[OT?] Uniscribe for Malayalam and Oriya

2004-12-21 Thread Marco Cimarosti
see the glyphs come out correctly. As not even *reordering* is done, I guess that my Uniscribe DLL does not support these scripts. Are they implemented in newer versions of Uniscribe? If yes, where can I get it? Thanks in advance for any help. -- Marco Cimarosti

[OT] The nice thing about standards...

2004-10-20 Thread Marco Cimarosti
Hallo everybody! I received this in the mail, and I thought it could be of interest for the Unicode mailing list: Aragonese - Lo geno d'as normas ye que aiga tantas entre ras que se puede eslexir. Asturian - Lo bono de les normes ye qu'hai munches onde escoyer. Basque - Arauen alderik onena da

RE: UTF to unicode conversion

2004-06-30 Thread Marco Cimarosti
Mike Ayers wrote: Side 1 (print and cut out): ++---+---+--+ | U+ | yy zz |Cima's UTF-8 Magic | Hex= | | U+007F | ! ! |Pocket Encoder | B-4 | | YZ | . . | | |
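The pocket encoder card quoted above maps a code point to UTF-8 bytes by hand. The same bit layout as a short C sketch. `utf8_encode` is an illustrative name, not an existing library function, and the sketch assumes it may reject surrogates and values above U+10FFFF by returning 0:

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 for surrogate code points
   and values above U+10FFFF, which have no UTF-8 form. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                     /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                    /* 110yyyyy 10zzzzzz */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates: not encodable */
        return 0;
    if (cp <= 0xFFFF) {                   /* 1110xxxx 10yyyyyy 10zzzzzz */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110www 10xxxxxx 10yyyyyy 10zzzzzz */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

For example, U+00A3 POUND SIGN comes out as C2 A3 and U+20AC EURO SIGN as E2 82 AC.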

RE: lines 05-08, version 4.7 of Roadmap to BMP and 'Hebrew extensions'

2004-06-28 Thread Marco Cimarosti
Rick McGowan wrote: I mistakenly thought Tifinagh was rtl. That's OK. It has been, and sometimes still is, written right to left, hence it was roadmapped in a right-to-left allocation block. However, in modern usage, and in the Moroccan national standard now being drafted, it is

RE: Bob Bemer, father of ASCII, has died

2004-06-25 Thread Marco Cimarosti
[\]{}

RE: Latin long vowels

2004-06-23 Thread Marco Cimarosti
Anto'nio Martins-Tuva'lkin wrote: On 2004.06.22, 16:20, Marco Cimarosti wrote: You can also compose them with the normal letter followed by character MODIFIER LETTER MACRON (code 02C9, decimal 713). Oops! You mean U+0304 : COMBINING MACRON (decimal: 772). Yes, right, sorry. (Hey

RE: Latin long vowels

2004-06-22 Thread Marco Cimarosti
Joe Speroni wrote: I apologize for a simple question, but after a few hours of research I don't seem to be able to find the characters needed. Funny: I see them in my Windows Character Map utility at the first hit on Page Down key... I'm trying to scan a Latin text that uses a bar over the

RE: [OT] Even viruses are now i18n!

2004-04-23 Thread Marco Cimarosti
Antoine Leca wrote: The virus cannot have any knowledge of a language code. And much less of the language used by its next victim... It sends e-mails to addresses stolen from the previous victim's address list, so it can analyze the top-level domain of these addresses (.it, .fr, etc.).

[OT] Even viruses are now i18n!

2004-04-22 Thread Marco Cimarosti
It seems that even the virus industry is getting global! F-Secure Virus Descriptions : NetSky.X [...] Netsky.X sends messages in several different languages: English, Swedish, Finnish, Polish, Norwegian, Portuguese, Italian, French, German and possibly the language of some small island called

RE: [OT] Even viruses are now i18n!

2004-04-22 Thread Marco Cimarosti
Peter Kirk wrote: mutlu etmek okumak belgili tanimlik belge. ... This is Turkish, of a sort. The virus writers have presumably confused .tc and .tk, as this Turkish is the first body listed and .tc is the first domain listed. Yes, and the translation was probably done translating word by

RE: help finding radical/stroke index at unicode.org

2004-04-15 Thread Marco Cimarosti
Gary P. Grosso wrote: Judging by what we saw in the back of the Unicode 2.0 book, we would tend to say that it is correct that (in an index) 21333 (0x5355) is sorting under 21313 (0x5341) instead of 20843 (0x516b). I am looking for some table of radicals that I can show our customer to help

RE: Unicode 4.0.1 Released

2004-03-31 Thread Marco Cimarosti
Rick McGowan wrote: Unicode 4.0.1 has been released! [...] The main new features in Unicode 4.0.1 are the following: [...] 3. Unicode Character Database: [...] * Changed: general category of U+200B ZERO WIDTH SPACE * Changed: bidi class of several characters (If I am asking a

RE: help needed with adding new character

2004-03-19 Thread Marco Cimarosti
Michael Everson wrote: What organization uses the ANARCHY SYMBOL? ;-) The anarchist movement. Why are you winking? Ciao. Marco

[OT] Freedom and organization (was RE: help needed with adding new character)

2004-03-19 Thread Marco Cimarosti
Kenneth Whistler wrote: Why is an Anarchist asking to standardize something? Why not!? Can you elaborate on this? Myself, I am an anarchist sympathizer, and I have been deeply interested in a character encoding standard for nearly ten years now... Anarchism is against imposing forms of

RE: help needed with adding new character

2004-03-19 Thread Marco Cimarosti
Jon Wilson wrote: I disagree that the anarchy symbol is not a character used in the representation of words. I can write a word beginning with A with either a simple LATIN CAPITAL LETTER A, or with an Anarchy symbol, or with an existing CIRCLED LATIN CAPITAL LETTER A. You can also write an

RE: [OT] Freedom and organization (was RE: help needed with adding new character)

2004-03-19 Thread Marco Cimarosti
Peter Kirk wrote: Come to think of it, a not very large group of them with a bit of money behind them could buy enough votes to outvote the corporations and destroy Unicode - Yes, right, interesting possibility! Not that much money either: a single punk rock concert would probably raise

RE: OT? Languages with letters that always take diacriticals

2004-03-16 Thread Marco Cimarosti
Curtis Clark wrote: Are there any languages that use letters with diacriticals, but *never* use the base letter without diacriticals? AFAIK, Thaana is such a case. Unlike Indic scripts, Thaana has no inherent vowel, so each consonant letter always takes either a vowel mark or the sukuun (=

RE: Web Form: Other Question: Etruscan, Sanscrit, Linear B on iBook G4

2004-01-28 Thread Marco Cimarosti
John Jenkins wrote: Anybody understand what he means by there is unicode gamma of characters but it is not complete? I guess unicode gamma of characters is Italinglish for Unicode character set. (Italian gamma means repertoire, range, scale, set.) _ Marco

RE: Unicode forms for internal storage - BOCU-1 speed

2004-01-23 Thread Marco Cimarosti
Jon Hanna wrote: I refuse to rename my UTF-81920! Doug, Shlomi, there's a new one out there! Jon, would you mind describing it? _ Marco

RE: Detecting encoding in Plain text

2004-01-13 Thread Marco Cimarosti
Peter Kirk wrote: This one also looks dangerous. What do you mean by dangerous? This is an heuristic algorithm, so it is only supposed to work always but only in some lucky cases. If lucky cases average to, say, 20% or less then it is a bad and useless algorithm; if they average to, say, 80% or

RE: Detecting encoding in Plain text

2004-01-13 Thread Marco Cimarosti
Jon Hanna wrote: False positives can be caused by the use of U+0000 (which is most often encoded as 0x00) which some applications do use in text files. I have never seen such a thing, can you give an example? I can't imagine any use for a NULL in a file apart from terminating records or strings

RE: Detecting encoding in Plain text

2004-01-13 Thread Marco Cimarosti
Peter Kirk wrote: What do you mean by dangerous? This is an heuristic algorithm, so it is only supposed to work always [...] (I meant: it is not supposed to work always) I would not consider an 80% algorithm to be very good - depending on the circumstances etc. But if for example 20% of my

RE: Chinese rod numerals

2004-01-13 Thread Marco Cimarosti
Christopher Cullen wrote: (2) The Unicode home page says: The Unicode Standard defines codes for characters used in all the major languages [...] mathematical symbols, technical symbols, [...]. I suggest that in an enterprise so universal and cross-cultural as Unicode, the definition of what

RE: Detecting encoding in Plain text

2004-01-12 Thread Marco Cimarosti
Doug Ewell wrote: In UTF-16 practically any sequence of bytes is valid, and since you can't assume you know the language, you can't employ distribution statistics. Twelve years ago, when most text was not Unicode and all Unicode text was UTF-16, Microsoft documentation suggested a heuristic
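A rough illustration of the kind of byte-distribution heuristic discussed in this thread, as a C sketch. It is not the heuristic from the Microsoft documentation Doug mentions; `guess_utf16` and its threshold are hypothetical. It trusts a BOM when present and otherwise looks for NUL bytes on only one side of each 16-bit unit, which works for mostly-Latin text and, as Doug points out, gives up on text such as CJK where the high bytes are rarely zero:

```c
#include <stddef.h>

typedef enum { GUESS_UNKNOWN, GUESS_UTF16LE, GUESS_UTF16BE } utf16_guess;

/* Hypothetical heuristic: trust a BOM if present (ignoring the UTF-32LE
   case, whose BOM also starts with FF FE); otherwise look at where the
   NUL bytes fall. Mostly-Latin text in UTF-16LE has NULs in the odd
   (high) bytes, UTF-16BE in the even ones. */
utf16_guess guess_utf16(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return GUESS_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return GUESS_UTF16BE;

    size_t units = len / 2, even_nuls = 0, odd_nuls = 0;
    if (units == 0)
        return GUESS_UNKNOWN;
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (buf[i] == 0)
            even_nuls++;
        if (buf[i + 1] == 0)
            odd_nuls++;
    }
    /* Arbitrary threshold: NULs in at least half of the 16-bit units,
       and only on one side. */
    if (odd_nuls * 2 >= units && even_nuls == 0)
        return GUESS_UTF16LE;
    if (even_nuls * 2 >= units && odd_nuls == 0)
        return GUESS_UTF16BE;
    return GUESS_UNKNOWN;
}
```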

RE: Punched tape (was: Re: American English translation of character names)

2004-01-07 Thread Marco Cimarosti
Anto'nio Martins-Tuva'lkin wrote: |O OoOO | |O oOOO | | OOo O O| |OO oOO | | O o OO| |OO o| |O Oo OO | |O o| | OOo OOO| |O OoO | | OOo OOO| |O o OOO| «\N5l#oVO7X7G»? _ Marco

RE: Unicode-ASCII approximate conversion

2003-12-19 Thread Marco Cimarosti
Hallvard B Furuseth wrote: I need a function which converts Latin Unicode characters to the closest equivalent ASCII characters, e.g. é -> e. Before I reinvent the wheel, does any public domain or GPL code for this already exist? I don't know, sorry. If not, for the most part I expect I
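A minimal C sketch of the table-driven approach, covering only Basic Latin and the Latin-1 Supplement. The table and the name `ascii_approx` are illustrative assumptions, and a caller would still need to decode UTF-8 into code points first. A broader alternative is to decompose to NFD (for example with ICU) and drop the combining marks, which handles many more accented letters but not Æ, Ø or ß:

```c
#include <stdint.h>

/* ASCII stand-ins for U+00C0..U+00FF, one entry per code point.
   '?' marks characters with no single-letter equivalent
   (Æ, ×, Þ, ß, æ, ÷, þ); a fuller version would map them to "AE", "ss", etc. */
static const char latin1_fold[] =
    "AAAAAA?CEEEEIIIIDNOOOOO?OUUUUY??"   /* U+00C0..U+00DF */
    "aaaaaa?ceeeeiiiidnooooo?ouuuuy?y";  /* U+00E0..U+00FF */

/* Returns an ASCII approximation of code point cp, or '?' if this
   sketch has none. */
char ascii_approx(uint32_t cp)
{
    if (cp < 0x80)
        return (char)cp;                 /* already ASCII */
    if (cp >= 0xC0 && cp <= 0xFF)
        return latin1_fold[cp - 0xC0];   /* accented Latin-1 letters */
    return '?';                          /* everything else */
}
```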

RE: American English translation of character names

2003-12-18 Thread Marco Cimarosti
John Cowan wrote: In the New York City subway system (of underground trains, that is, not underground pedestrian tunnels!), this letter has been consistently avoided since 1967, when the system of distinguishing trains by letter or number was instituted. The only other letters never used are

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Marco Cimarosti
Doug Ewell wrote: I'll go farther than that. It's always bothered me that speakers of European languages, including English but especially French, have seen fit to rename the cities and internal subdivisions of other countries. Rightly said! There is reason to rename Colonia to Köln, Augusta

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Marco Cimarosti
Michael Everson wrote: At 11:04 +0100 2003-12-17, Marco Cimarosti wrote: There is reason to rename Colonia to Köln, Augusta to Augsburg, Eboraco to York, Provincia to Provence, and so on. Nicely said. Subtle irony tends to go over some people's heads on this list though. Especially

RE: Arabic Presentation Forms-A

2003-12-17 Thread Marco Cimarosti
Philippe Verdy wrote: #code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?; # RIAL SIGN fdfc;;;<isolated> 0631 06cc 0627 0644; # ??; ?; ?; The Arial Unicode MS font does not have a glyph for the Rial currency sign so I won't comment lots about it, even if it's a special ligature of

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-15 Thread Marco Cimarosti
Doug Ewell wrote: This seems very misguided, if true. Alphabetical primacy can hardly be considered an effective measure of the relative power or importance of a nation. [...] Remember that in the time frame in question, the late '30s and early '40s, three of the major world powers were

[OT?] The C standard library and UTF's (was RE: Text Editors and Canonical Equivalence (was Coloured diacritics))

2003-12-12 Thread Marco Cimarosti
Tim Greenwood wrote: In my interpretation of the C standard (which I am reading from http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a valid wchar_t encoding if your execution character set contains characters outside the C0 controls and Basic Latin range, and UTF-16 is

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Marco Cimarosti
Hmm. Now here's some C++ source code (syntax colored as Philippe suggests, to imply that the text editor understands C++ at least well enough to color it) int n = wcslen(L"café"); (That's int n = wcslen(L"café"); for those without HTML email) The L prefix on a string literal makes it a

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: So, should n equal four or five? Why not six? ^^^ Errata: seven. If, in our C(++) compiler, type wchar_t is an alias for char, and wide character strings are encoded in UTF-8, and the é is decomposed, then n will be equal to 6

RE: Text Editors and Canonical Equivalence (was Coloured diacritics)

2003-12-09 Thread Marco Cimarosti
Peter Kirk wrote: So, should n equal four or five? The answer would appear to depend on whether or not the source file was saved in NFC or NFD format. No, surely not. If the wcslen() function is fully Unicode conformant, it should give the same output whatever the canonically
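To make the counts in this thread concrete: a C sketch that counts code points (roughly what `wcslen` does when `wchar_t` holds one code point per element) in the two canonically equivalent spellings of "café". The naive code point count differs, 4 versus 5, and so does the UTF-8 byte count, 5 versus 6; returning the same answer for both requires normalizing first. The helper name is illustrative:

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated, well-formed UTF-8 string by
   counting every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* "café" spelled two canonically equivalent ways. */
static const char cafe_nfc[] = "caf\xC3\xA9";   /* é as U+00E9 */
static const char cafe_nfd[] = "cafe\xCC\x81";  /* e + U+0301 COMBINING ACUTE ACCENT */
```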

RE: [OT]

2003-12-09 Thread Marco Cimarosti
[...] some greedy investors turned it into a scam just for a quick buck (for surely it will be quick!) Sorry, I had to get that off my chest. Hopefully someone with some pull in Ireland will read this and do something about it :-) Or simply flush Guinne$$ and drink Murphix. :-) Ciao.

[OT] GB 18030 certification

2003-11-25 Thread Marco Cimarosti
I was wondering: what exactly does GB-18030 certification consist of? I guess that some tests are done on the software, but what exactly? Also, where and who performs this certification? Does the Chinese government do it directly, or is it out-sourced to external agencies? Does this have to be in

RE: Problems encoding the spanish o

2003-11-17 Thread Marco Cimarosti
Pim Blokland wrote: Not only that, but the process making the mistake of thinking it is UTF-8 also makes the mistake of not generating an error for encountering malformed byte sequences, BTW, this process has a name: Internet Explorer. AND of outputting the result as two 16-bit numbers

RE: Tamil conjunct consonants (was: Encoding Tamil SRI)

2003-11-07 Thread Marco Cimarosti
Peter Jacobi wrote: IMHO this doesn't fit well actual Tamil use and raises a lot of practical problems. Either there must be an accepted list of these ligatures (but lists of archaic usage tend to grow), or one is bound to put a preemptive ZWNJ after every SHA VIRAMA in modern use, to

RE: Encoding Tamil SRI

2003-11-06 Thread Marco Cimarosti
Peter Constable wrote: Alternatives given were (0BB8)(0BCD)(0BB1)(0BC0) (0BB6)(0BCD)(0BB1)(0BC0) (if and when U+0BB6 becomes Unicode) (0B9A)(0BBF)(0BB1)(0BC0) Alternatives to what? The first and third sequence would have distinct appearances (see attached file), and would constitute

Re-distributing the files in http://www.unicode.org/Public/MAPPINGS/VENDORS

2003-11-05 Thread Marco Cimarosti
the Unicode Consortium official, stating whether or not I am allowed to re-distribute the above described files in a commercial application? Thank you in advance. Regards. Marco Cimarosti (S3, Italy, http://www.essetre.it)

RE: GDP by language

2003-10-23 Thread Marco Cimarosti
Mark Davis wrote: Marco, I certainly wouldn't draw that conclusion. This is not the appropriate forum for a political or ethical discussion, Of course. I just noticed that those numbers reflect a sad fact of life: that rich people get more than poor people. As this fact is so obvious to

RE: GDP by language

2003-10-22 Thread Marco Cimarosti
Mark Davis wrote: BTW, some time ago I had generated a pie chart of world GDP divided up by language. Those quotients are immoral. Of course, this immorality is not the fault of he who did the calculation: the immorality is out there, and those infamous numbers are just an arithmetical

RE: Line Separator and Paragraph Separator

2003-10-21 Thread Marco Cimarosti
Jill Ramonsky wrote: [...] I've even invented (and used) some 8-bit encodings which leave the whole of Latin-1 unchanged (apart from the C1s) and use C1 characters a bit like surrogate pairs to reach the rest. Doug, are you listening? It seems there's a new clone of UTF:-)Z waiting for

RE: Swahili Banthu

2003-10-20 Thread Marco Cimarosti
Peter Kirk wrote: Are we talking about a real non-Latin script, some kind of syllabary or logographic script, for Swahili and other Bantu languages? [...] Or did someone not notice that Marco's comments were about the word joke? Indeed. In the last few months, I have been relatively

RE: Swahili Banthu

2003-10-20 Thread Marco Cimarosti
Philippe Verdy wrote: As Africa has been influenced by many foreign invasions, there may in fact exist other scripts to represent this language [...] Yes: until the recent past, Swahili was also commonly written in the Arabic alphabet. _ Marco

RE: PUA

2003-10-20 Thread Marco Cimarosti
Chris Jacobs wrote: [...] Nevertheless I think if Unicode doesn't want to decide how the PUA is to be interpreted Please take notice of this interpreted: I'll come back to this soon. it should at the very least provide a mechanism by which a user of the PUA can specify which

RE: Klingons and their allies - Beyond 17 planes

2003-10-17 Thread Marco Cimarosti
John Cowan wrote: You persist in misunderstanding. Suppose I came along and told you I wanted to create a Unicode codepoint for each word in every language on Earth. Would you blithely allocate me a 24-billion-codepoint private space? Why? 200 million should be more than enough: that's

RE: Canonical equivalence in rendering: mandatory or recommended?

2003-10-15 Thread Marco Cimarosti
Jill Ramonsky wrote: In my experience, there is a performance hit. I had to write an API for my employer last year to handle some aspects of Unicode. We normalised everything to NFD, not NFC (but that's easier, not harder). Nonetheless, all the string handling routines were not allowed to

RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

2003-10-09 Thread Marco Cimarosti
Gautam Sengupta wrote: --- Marco Cimarosti wrote: OK but, then, your ZWJ becomes exactly what Unicode's VIRAMA has always been: [...] You are absolutely right. I am suggesting that the language-specific viramas be retained as script-specific *explicit* viramas that never disappear

RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

2003-10-08 Thread Marco Cimarosti
Gautam Sengupta wrote: Is there any reason (apart from trying to be ISCII-conformant) why the Bangla word /ki/ what cannot be encoded as [KA][ZWJ][I]? Do we really need combining forms of vowels to encode Indian scripts? Perhaps you are right that it *would* have been a cleaner design to have

RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

2003-10-08 Thread Marco Cimarosti
Peter Kirk wrote: I don't understand the specific issues here... But it does seem a rather strange design principle that we should expect a text to be displayed meaningfully even when the font lacks the glyphs required for proper display. The fact is that these glyphs are not necessarily

RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

2003-10-08 Thread Marco Cimarosti
Gautam Sengupta wrote: I am no programmer, but surely the rendering engine could be tweaked to display a halant/hashant in the aforementioned situations? I understand that it won't happen *automatically* if we were to use ZWJ instead of VIRAMA. But if you were to take the trouble to do the

Bogus UTF's are back! :-) (was RE: Non-ascii string processing?)

2003-10-07 Thread Marco Cimarosti
Doug Ewell wrote: [...] we'd all use UTF-336. Er? If only I had a bit more spare time, Jill. You do NOT want to get me started... :-) Go for it, Doug! :-) If I only had a bit of spare time myself, I'd be eager to run bits-per-character statistics for UTF:-)336 in various

RE: Unicode Public Review Issues update

2003-10-07 Thread Marco Cimarosti
Jony Rosenne wrote: I don't remember whether Hebrew Braille is written RTL or LTR. Braille is always LTR, even for Hebrew and Arabic. To be more precise, Braille is always LTR when you read it, but RTL when you write it manually (because it is engraved on the back side of the paper, using a

Braille is not bidi neutral! (was RE: Unicode Public Review Issues update)

2003-10-07 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: Jony Rosenne wrote: I don't remember whether Hebrew Braille is written RTL or LTR. Braille is always LTR, even for Hebrew and Arabic. Hwæt! I noticed only now that the Bidirectional Category of braille characters is ON - Other neutrals! AFAIK, that is completely

RE: What things are called (was Non-ascii string processing)

2003-10-07 Thread Marco Cimarosti
Jill Ramonsky wrote: Hey - the public will just have to get used to it! No, the public should not be bored with these technical details: in the user manual, a book will still be a book. The fact that, in the source code of the application, book means something else is of interest only to

RE: Non-ascii string processing?

2003-10-07 Thread Marco Cimarosti
Peter Kirk wrote:
    For i% = 1 to Len(utf8string$)
        c$ = Mid(utf8string$, i%, 1)
        Process c$
    Next i%
Such a loop would be more efficient in UTF-32 of course, but this is still a real need for working with character counts. If the string type and function of this Basic dialect is not

RE: Non-ascii string processing?

2003-10-07 Thread Marco Cimarosti
Elliotte Rusty Harold wrote: A W3C XML Schema Language validator needs a character based API to correctly implement the minLength and maxLength facets on xsd:string As far as I understand, xsd:string is a list of Character-s, and a Character is an integer which can hold any valid Unicode code

RE: Non-ascii string processing?

2003-10-06 Thread Marco Cimarosti
Doug Ewell wrote: Depends on what processing you are talking about. Just to cite the most obvious case, passing a non-ASCII, UTF-8 string to byte-oriented strlen() will fail dramatically. Why? The purpose of strlen() is counting the number of *bytes* needed to store a certain string, and this

RE: Non-ascii string processing?

2003-10-06 Thread Marco Cimarosti
Theodore H. Smith wrote: Hi lists, Hi, member. I'm wondering how people tend to do their non-ascii string processing. I think no one has been doing ASCII string processing for decades. :-) But I guess you meant non-SBCS (single byte character set) string processing. [...] So, I'm

RE: Non-ascii string processing?

2003-10-06 Thread Marco Cimarosti
Stephane Bortzmeyer wrote: On Mon, Oct 06, 2003 at 12:09:34PM +0200, Marco Cimarosti [EMAIL PROTECTED] wrote a message of 14 lines which said: What strlen() cannot do is counting the number of *characters* in a string. But who cares? I can imagine very few situations where someone

RE: Non-ascii string processing?

2003-10-06 Thread Marco Cimarosti
Stephane Bortzmeyer wrote: OK. But the length in characters of a string is not character semantics: it's plain nonsense, IMHO. I disagree. Feel free. But I still don't see any use in knowing how many characters are in a UTF-8 string, apart from the use that I already mentioned: allocating a
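The one use Marco grants, sizing a buffer for a UTF-8 to UTF-32 conversion, as a two-pass C sketch: the first pass only counts, the second decodes. It assumes well-formed input and performs no error checking; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decodes well-formed UTF-8 into UTF-32 and returns the number of code
   points. If out is NULL, it only counts. Malformed input is NOT
   detected: this sketch assumes the bytes were validated beforehand. */
size_t utf8_to_utf32(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra;                      /* continuation bytes to follow */
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        for (size_t k = 1; k <= extra; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);
        if (out)
            out[n] = cp;
        n++;
        i += extra + 1;
    }
    return n;
}

/* First pass sizes the buffer, second pass fills it.
   Stores the count in *count; the caller frees the result. */
uint32_t *utf8_to_utf32_alloc(const unsigned char *s, size_t len, size_t *count)
{
    size_t n = utf8_to_utf32(s, len, NULL);
    uint32_t *buf = malloc((n ? n : 1) * sizeof *buf);
    if (buf)
        utf8_to_utf32(s, len, buf);
    *count = n;
    return buf;
}
```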

RE: Non-ascii string processing?

2003-10-06 Thread Marco Cimarosti
Edward H. Trager wrote: But I still don't see any use in knowing how many characters are in a UTF-8 string, apart from the use that I already mentioned: allocating a buffer for a UTF-8 to UTF-32 conversion. Well, I know a good use for it: a console or terminal-based application which

RE: FW: Web Form: Other Question: British pound sign - U+00A3

2003-10-03 Thread Marco Cimarosti
This (Peter's) answer is, in my understanding, the nearest to the truth. He made the same assumption I did: you declared that your file was UTF-8 but actually it wasn't. :-) Here is the problem: How do I make my keyboard which only produces 8-bit [...] The keyboard has nothing to do with
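A quick way to confirm this diagnosis is to check whether the bytes really are UTF-8. In ISO 8859-1 and windows-1252 the pound sign is the single byte A3, which can never appear on its own in well-formed UTF-8, where U+00A3 is C2 A3. A minimal C validator sketch (the name `is_valid_utf8` is illustrative) that also rejects overlong forms, surrogates and values above U+10FFFF:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t n;                 /* number of continuation bytes */
        unsigned long cp;
        if (b < 0x80) { i++; continue; }                  /* ASCII */
        else if (b >= 0xC2 && b <= 0xDF) { n = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { n = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { n = 3; cp = b & 0x07; }
        else return false;        /* 80..C1 and F5..FF never start a sequence */
        if (len - i <= n)
            return false;         /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;     /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (n == 2 && cp < 0x800)
            return false;         /* overlong 3-byte form */
        if (n == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return false;         /* overlong or beyond U+10FFFF */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;         /* surrogate code point */
        i += n + 1;
    }
    return true;
}
```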

RE: Web Form: Other Question: British pound sign - U+00A3

2003-10-01 Thread Marco Cimarosti
[EMAIL PROTECTED] wrote (through Magda Danish): [...] Our problem is the representation of the £ sign (British pound sign - U+00A3). When we type this character into our pages and then set the character encoding in our pages to Unicode (UTF-8) (either by setting it directly in the HTTP

RE: Internal Representation of Unicode

2003-09-26 Thread Marco Cimarosti
[EMAIL PROTECTED] wrote: In a plain text environment, there is often a need to encode more than just the plain character. A console, or terminal emulator, is such an environment. Therefore I propose the following as a technical report for internal encoding of unicode characters; with one

RE: About that alphabetician...

2003-09-25 Thread Marco Cimarosti
Michael Everson wrote: At 08:33 -0700 2003-09-25, John Hudson wrote: Unicode is an encoding standard for text on computers that allows documents in any script and language to be entered, stored, edited and exchanged. blank stare from layman Unicode is a code in which every letter of

RE: [OT?] QBCS

2003-08-29 Thread Marco Cimarosti
Doug Ewell wrote: [...] (BTW, pet peeve: The word acronym should only be used to mean a pronounceable WORD (nym) formed from the initials of other words. Classic examples are scuba and radar. If you can figure out how to pronounce qbcs, more power to you, but to me it's just an

[OT?] QBCS

2003-08-28 Thread Marco Cimarosti
It seems that the IT world has a new acronym: QBCS. I understand that it stands for quadra-byte character set, and I heard it used to refer to GB 18030. My question is: is it just a fancy synonym for GB 18030 or can it also refer to Unicode or other encodings? Thanks in advance. _ Marco

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-26 Thread Marco Cimarosti
+ comma + space + identifier bar). Regards. Marco Cimarosti ([EMAIL PROTECTED]) Feedback on UTR#31 (draft 1): Non-Latin Punctuation. I suggest that a small set of non-Latin punctuation marks be added in class Pattern_Syntax. Each one of the punctuation marks that I am suggesting to include

[OT?] ICU training offerings anyone?

2003-08-26 Thread Marco Cimarosti
Dear Unicoders, Does any company offer training on ICU programming? I am more interested in courses located in Europe, but I'd also be glad to know about courses in North America or elsewhere. If you feel that this information is not appropriate for the public list, please feel free to reply

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-25 Thread Marco Cimarosti
Peter Kirk wrote: Similarly, Hebrew geresh and gershayim look like quotation marks and are used interchangeably in legacy encodings, the same with maqaf and hyphen - maqaf is very much the cultural equivalent of hyphen, and I have seen recent discussion about whether the hyphen key on a

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-25 Thread Marco Cimarosti
Peter Kirk wrote: Well, the situation with Hebrew sof pasuq is almost identical to that for Greek and Arabic question marks, except that it is functionally a full stop not a question mark, so I can't see any reason other than prejudice for omitting it from the list. Well, I had a much

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-25 Thread Marco Cimarosti
Peter Kirk wrote: But the other way round is less of a problem. So I am suggesting that for now we define all punctuation characters except for those with specifically defined operator functions, also all undefined characters, as giving a syntax error. This makes it possible to define

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-22 Thread Marco Cimarosti
Rick McGowan wrote: the process as possible so that it can be considered The draft is found at http://www.unicode.org/reports/tr31/ and feedback can be submitted as described there. (Before submitting official feedback, I'd like to discuss my comments here. BTW, which Type of Message should I

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-22 Thread Marco Cimarosti
Jill Ramonsky wrote: Damn. I guess you guys are all going to hate me for asking this, but ... what exactly is a mathematical space? A compatibility space character used only in typesetting mathematics: 205F;MEDIUM MATHEMATICAL SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;; PS. I'm going to

RE: Proposed Draft UTR #31 - Syntax Characters

2003-08-22 Thread Marco Cimarosti
Mark Davis wrote: Technical Report issues would be fine. I think #1 is worth considering. For #2, see other message to Peter Kirk. I agree with your statement: The purpose of the Pattern Syntax characters is *not* to list everything that is a symbol or punctuation mark. But that is what

RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-20 Thread Marco Cimarosti
Peter Kirk wrote: [...] I guess English legs tended to be longer than Roman ones. Well, if by English you mean those Germanic barbarians who invaded Britannia, I guess that the British mile existed way before they set their feet on the island... _ Marco

RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Marco Cimarosti
Doug Ewell wrote: Shouldn't a pint of beer be administratively fixed at 500 mL, just as a fifth of liquor in America is now officially 750 mL? Seems like a good task for an ISO working group. You could generalize it a bit: Alignment Of Metric And Imperial Units Whose Difference Is So Small

RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Marco Cimarosti
Pim Blokland wrote: It must be a really urgent need if one cares about those 3.28 metres... 4.28 actually. Ooops. But are you serious about lengthening the yard to be the same size as the meter? I was just joking... Ha! Fat chance! You might as well suggest we abolish the yard

RE: Handwritten EURO sign (off topic?)

2003-08-14 Thread Marco Cimarosti
Anto'nio Martins-Tuva'lkin wrote: On 2003.08.06, 11:12, Philippe Verdy [EMAIL PROTECTED] wrote: the placement of the currency unit symbol or multiple is language dependent, and the same local practices are used with the euro as the ones used for pre-euro currencies. You mean that

RE: Arabic script web site hosting solution for all platforms

2003-06-18 Thread Marco Cimarosti
Philippe Verdy wrote: Excessive cross-posting to multiple newsgroups, forums and list servers is considered bulk (and also opposed to the netiquette). As this message is targeting too large an audience and is off topic, and is also a commercial ad, I can say that bulk+unsolicited makes it

RE: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK

2003-06-06 Thread Marco Cimarosti
Rob Mount Q1: Can a character be both alphabetic and diacritic? I would say yes. My understanding of the Lm general category is: a diacritic letter. Q2: Is there a definitive answer as to whether this is an alphabetic character? Strictly speaking, as katakana and hiragana are not alphabets,

RE: Tamazight/berber language : How to send mail, write word documents ....

2003-06-06 Thread Marco Cimarosti
Chris Jacobs wrote: Depends on how much text you need. If it is just a few words then getting an unipad from http://www.unipad.org/ would be enough. You can copy and paste the chars from it. If this is not enough, then have a look at http://www.tavultesoft.com/keyman/ BTW, Unipad also

RE: Tamazight/berber language : How to send mail, write word documents ....

2003-06-06 Thread Marco Cimarosti
Philippe Verdy wrote: However the interesting part of your question for discussion in this list is: - Which Unicode character should be used to encode the spacing ring? (may conflict with the degree sign, or a superscript small letter O) - Should you use a Greek Gamma or a Latin Gamma, and a

PUA usage (was RE: Announcement: New Unicode Savvy Logo)

2003-05-31 Thread Marco Cimarosti
[OOOPS! This works better if I set the proper MIME encoding... Sorry] Philippe Verdy wrote: This contrasts a lot with the Unicode codepoints assigned to abstract characters, that are processable out of any contextual stylesheet, font or markup system, where its only semantic is in that

PUA usage (was RE: Announcement: New Unicode Savvy Logo)

2003-05-31 Thread Marco Cimarosti
Philippe Verdy wrote: This contrasts a lot with the Unicode codepoints assigned to abstract characters, that are processable out of any contextual stylesheet, font or markup system, where its only semantic is in that case private use with no linguistic semantic and no abstract character

RE: The role of country codes/Not snazzy

2003-05-30 Thread Marco Cimarosti
Brian Doyle wrote: on 5/29/03 9:15 AM, Marion Gunn at [EMAIL PROTECTED] wrote: When a reference to using embryonic ISO 639-3 to 'legitimize' SIL's flawed Ethnologue is let pass with no comment Why is Ethnologue flawed? And how is this more on-topic on a mailing list called Unicode

RE: Not snazzy (was: New Unicode Savvy Logo)

2003-05-29 Thread Marco Cimarosti
Philippe Verdy wrote: Savvy is better understood in this context as aware, than archaic or informal in your English-Italian dictionary. No, archaic, American and informal are usage labels, not translations. The translation is buon senso (common sense). (BTW, it is: Dizionario Garzanti di inglese, Garzanti

RE: Not snazzy (was: New Unicode Savvy Logo)

2003-05-29 Thread Marco Cimarosti
Rick McGowan wrote: 2. It is unlikely that the Unicode *logo* itself (i.e. the thing at http://www.unicode.org/webscripts/logo60s2.gif) will be incorporated directly in any image that people are allowed to put on their websites, because to put the Unicode logo on a product or whatever

RE: Not snazzy (was: New Unicode Savvy Logo)

2003-05-28 Thread Marco Cimarosti
Andrew C. West wrote: I agree with Philippe on this one. A sensible, and easily understandable, motto like The world speaks Unicode would be much better. The word savvy just sends a shiver of embarrassment down my spine. Not only is savvy not a word that is probably high in the vocabulary

RE: Exciting new software release!

2003-04-01 Thread Marco Cimarosti
Doug Ewell wrote: Drop everything and check out a kewl new Windows program available at: http://users.adelphia.net/~dewell/mathtext.html 𝔬𝔱𝔣𝔩! _ Marco

RE: Several BOMs in the same file

2003-03-25 Thread Marco Cimarosti
Stefan Persson wrote: Let's say that I have two files, namely file1 and file2, in any Unicode encoding, both starting with a BOM, and I compile them into one by using cat file1 file2 > file3 in Unix or copy file1 + file2 file3 in MS-DOS, file3 will have the following contents: BOM

RE: Several BOMs in the same file

2003-03-25 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: As a minimum, option -v must know the semantics of NL and LF control codes, of the digits, and of white space. Sorry, I meant: option -n. _ Marco

RE: Several BOMs in the same file

2003-03-25 Thread Marco Cimarosti
Kent Karlsson wrote: I'm not going into the implementation part; just pointing out that this issue is not something an operating system can ignore. cat and cp can and shall ignore it. They are octet-level file operations, attaching no semantics to the octets. Try iconv. This byte-level
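As Marco notes, `cat` and `copy` are octet-level and leave every BOM where it was. A text-aware tool could instead drop the BOM of every part after the first. A minimal C sketch for UTF-8 only; the helper names are hypothetical, and UTF-16 or UTF-32 input would need its own handling:

```c
#include <stddef.h>
#include <string.h>

/* Length of a leading UTF-8 BOM (EF BB BF), or 0 if there is none. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    return (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) ? 3 : 0;
}

/* Appends src to dst (which must have room for the result) and returns
   the new length of dst. When dst already holds text, a leading BOM in
   src is dropped, so the result has at most one BOM, at the very start. */
size_t append_utf8_text(unsigned char *dst, size_t dst_len,
                        const unsigned char *src, size_t src_len)
{
    size_t skip = dst_len > 0 ? utf8_bom_length(src, src_len) : 0;
    memcpy(dst + dst_len, src + skip, src_len - skip);
    return dst_len + (src_len - skip);
}
```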

RE: List of ligatures for languages of the Indian subcontinent.

2003-03-18 Thread Marco Cimarosti
Kenneth Whistler wrote: Dream on. The information needed exists in books and other reference sources in libraries, book shops, and other collections across India -- and, for that matter, around the world. It is merely a matter of collecting the relevant information and distilling it into

RE: Need encoding conversion routines

2003-03-14 Thread Marco Cimarosti
askq1 askq1 wrote: From: Pim Blokland [EMAIL PROTECTED] However, you have said this is not what you want! So what is it that you do want? I want c/c++ code that will give me UTF8 byte sequence representing a given code-point, UTF16 16 bits sequence representing a given code-point,
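For the UTF-16 part of the request, a C sketch that encodes a single code point, including the surrogate-pair case. `utf16_encode` is an illustrative name; for real use, an established library such as ICU is the safer choice:

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode scalar value as UTF-16 code units in out[0..1].
   Returns the number of units written (1 or 2), or 0 for surrogate
   code points and values above U+10FFFF. UCS-2 corresponds to the
   1-unit case only: it cannot represent anything above U+FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;                  /* 20 significant bits */
        out[0] = (uint16_t)(0xD800 | (v >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```

For example, U+1F600 becomes the pair D83D DE00.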

RE: Need encoding conversion routines

2003-03-12 Thread Marco Cimarosti
askq1 askq1 wrote: I want c/c++ functions/routines that will convert Unicode to UTF8/UTF16/UCS2 encodings and vice-versa. Can some-one point me where can I get these code routines? Unicode's reference implementation is here, but I don't know how up-to-date it is with some tiny changes in

RE: Encoding: Unicode Quarterly Newsletter

2003-03-11 Thread Marco Cimarosti
Kenneth Whistler wrote: [...] Of course, further weight corrections need to be applied if reading the standard *below* sea level or in a deep cave. I hope it will not be considered pedantic to observe that the mass or weight of a book does not change depending on whether someone is reading it or
