Re: Are these characters encoded?
On Sun, 2 Dec 2001 [EMAIL PROTECTED] wrote: [...] (cf. GREEK QUESTION MARK). [...] This would be like using U+003B at the end of a Greek question. Sorry, but U+037E GREEK QUESTION MARK is canonically equivalent to U+003B SEMICOLON. I guess it is there only because ISO 8859-7 wanted to disunify them. -- Note: If you want me to read a message, please make sure you include my address in To or CC fields. I may not be able to follow all the discussions on the mailing lists I subscribe to. Sorry. (No, there's no problem to receive duplicates.) --roozbeh
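The canonical equivalence mentioned in this message can be checked with Python's standard unicodedata module. This is only a minimal sketch; the variable names are illustrative and not part of the original post.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"
SEMICOLON = "\u003b"

# U+037E carries a singleton canonical decomposition to U+003B,
# so every normalization form replaces it with the plain semicolon.
decomposition = unicodedata.decomposition(GREEK_QUESTION_MARK)
nfc = unicodedata.normalize("NFC", GREEK_QUESTION_MARK)
nfd = unicodedata.normalize("NFD", GREEK_QUESTION_MARK)

print(decomposition)
print(nfc == SEMICOLON, nfd == SEMICOLON)
```

One practical consequence is that normalized text cannot preserve the distinction between the two characters.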
RE: Are these characters encoded?
Asmus Freytag wrote: Overloading the existing 00BA º is tempting, but would likely result in incorrect output unless special purpose (read private use) fonts are used, or unless it became common to have a Swedish glyph overrides in fonts and rendering engines that applied them. Since the usage and typographic convention for 'och' and the raised o for numbering are not related, this unification smells more of shoehorning than encoding. Perhaps there is also a logical difference. The Swedish o represents the *first* letter of a word (och), and can thus be interpreted as o. (o *followed* by a dot); 00BA represents the *last* letter of a word (it abbreviates ordinal adjectives like primero, segundo, tercero... primo, secondo, terzo...), so it may logically be interpreted as .o (o *preceded* by a dot). _ Marco
Re: Indic editing (was: RE: The real solution)
Hi Everybody The statement by Mr. John Hudson that phonetic keyboarding, while the norm for the Indian publishing and typesetting industries, was not the norm for typewriters is not entirely correct. It was not the norm earlier, but it has been the norm for many years now. Moreover, the concept of la = half la + danda may be natural for people who are used to typewriters and typography. Which is, some of the people who are more likely to switch to computers. I fully agree with Mr. Marco Cimarosti in this regard. This is the point on which I really wanted everybody to focus, i.e. the problem of encoding as well as display. Yes, there are many easy solutions. The fact is that these are worth nothing until Unicode officially adopts one of them. This is the ultimate truth and this was the main point with which I initiated this discussion. With Regards Arjun Aggarwal [EMAIL PROTECTED]
Re: Indic editing (was: RE: The real solution)
From: Arjun Aggarwal [EMAIL PROTECTED] Moreover, the concept of la = half la + danda may be natural for people who are used to typewriters and typography. Which is, some of the people who are more likely to switch to computers. I fully agree with Mr. Marco Cimarosti in this regard. This is the point on which I really wanted everybody to focus, i.e. the problem of encoding as well as display. Well, you do need to understand that you could actually create input methods that would allow people who wish to type this way to do so -- and the underlying data could still be stored using the current encoding. The needs of those who wish to keep their keyboards can be met without trying to undo all the implementations that have been done. -- MichKa Michael Kaplan Trigeminal Software, Inc. -- http://www.trigeminal.com/
RE: Indic editing (was: RE: The real solution)
Arjun Aggarwal wrote: Moreover, the concept of la = half la + danda may be natural for people who are used to typewriters and typography. Which is, some of the people who are more likely to switch to computers. I fully agree with Mr. Marco Cimarosti in this regard. This is the point on which I really wanted everybody to focus, i.e. the problem of encoding as well as display. Therefore, you don't fully agree with me. My opinion is that the encoding is OK as it is in ISCII and Unicode. I take into consideration your way of splitting the graphemes *only* at the editing level. _ Marco
RE: Are these characters encoded?
Summary answer to the question in the subject line: yes. What I tried to express as succinctly as possible before is that:
1) & and o̲ (underlined o, sometimes used as an abbreviation for 'och', as is 'o.' (dictionaries) and 'o', and even 'å') is definitely not a glyph variant issue; they are not interchangeable, even though the meaning is the same. Asmus gave an example. Further, one can use & without spaces around it (since the ligature is so highly ligated), but for o̲ there should always be spaces around it. B.t.w. & is called et-tecken in Swedish. Getting et-tecken rendered as o̲ (underlined o) would be surprising indeed.
2) o̲ (underlined o; it even displays fair, but not good, in the font I'm using right now) is already perfectly well available in Unicode. There is no need to encode it again. Raising it a little bit (not much) over the baseline (that some do in handwriting) would be fine tuning that is not appropriate for a character encoding, but might be for a handwriting imitating font, or for typographic fine tuning markup.
3) The following ones are all inappropriate:
00B0;DEGREE SIGN;So;0;ET;N;
00BA;MASCULINE ORDINAL INDICATOR;Ll;0;L;<super> 006F;N;
2070;SUPERSCRIPT ZERO;No;0;EN;<super> 0030;0;0;0;N;SUPERSCRIPT DIGIT ZERO
The first and last are obviously(?) wrong. Why not 00BA? There are two reasons: the glyph for 00BA is not always underlined (even though a plain o can be used for 'och' in sloppy handwriting or (rare) "spell as you speak" texts), and the glyph for 00BA is (always) raised too much for the o̲ (underlined o for 'och') usage. (But for "numero", which is also used here, I would use Nº (004E, 00BA) rather than № (2116) or No̲ (004E, 006F, 0332).) Kind regards /kent k
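The contrast drawn in this message between the combining sequence for the underlined o and U+00BA can be inspected with Python's standard unicodedata module. This is a rough sketch only; the variable names are illustrative, and the results reflect whatever Unicode version the running Python ships with.

```python
import unicodedata

underlined_o = "o\u0332"   # the sequence 006F, 0332 named in the message
ordinal = "\u00ba"         # MASCULINE ORDINAL INDICATOR

# Names of the two code points that make up the underlined o.
names = [unicodedata.name(ch) for ch in underlined_o]

# 00BA has only a compatibility decomposition (<super> 006F), so NFC
# keeps it distinct while NFKC folds it to a plain "o".
ordinal_decomposition = unicodedata.decomposition(ordinal)
ordinal_nfc = unicodedata.normalize("NFC", ordinal)
ordinal_nfkc = unicodedata.normalize("NFKC", ordinal)

print(names)
print(ordinal_decomposition, ordinal_nfc == ordinal, ordinal_nfkc)
```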
RE: Indic editing (was: RE: The real solution)
Oh, by the way, I forgot this... Arjun Aggarwal wrote: Yes, there are many easy solutions. The fact is that these are worth nothing until Unicode officially adopts one of them. This is the ultimate truth and this was the main point with which I initiated this discussion. Almost every sentence may become the ultimate truth, if you remove enough context to make it meaningless. I can say a lot of stupid things on my own, and I don't need anybody's help to put more stupid things in my mouth. Thanks. My sentence above referred to a very specific problem: finding a way of mapping the ISCII sequence RA + HALANT + INV to Unicode. Here is the sentence in its original context: Marco Cimarosti wrote: Dhrubajyoti Banerjee wrote: [...] Marco Cimarosti wrote: [...] I am talking again about REPHA IN ISOLATION: ISCII has a way of representing it, but Unicode does not. This is needed, even only for encoding didactic texts, and a solution to encode it (with ZWJ, probably) should be found. I think the same way it is done in ISCII would be quite okay. In ISCII you get it by typing the INV character after ra virama. A similar solution may be provided for, in Unicode, by using ZW(N)J. Yes, there are many easy solutions. The fact is that these are worth nothing until Unicode officially adopts one of them. _ Marco
Re: Are these characters encoded?
- Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: den 3 december 2001 02:35 Subject: Re: Are these characters encoded? Perhaps they should be. Er... So 3 and 三 are the same character...? I wonder: When transcribing a foreign name (like a business name) that includes the ampersand, would a Swede use the och sign? Sometimes yes, sometimes no. In other words, does there exist a case where the ampersand and the och sign are not interchangeable? No. At least not if the text is in Swedish. Stefan
Re: Suggestions for next print edition
You can always search the big Unihan.txt file on the kJapaneseKun and kJapaneseOn fields, which provide whatever information we have on pronunciation of the characters in Japanese. If you are just stuck looking up stuff because it isn't marked up for Japanese, try getting Sanseido's Unicode Kanji Information Dictionary, which has the first 20,902 kanji in Unicode (the most useful set) all marked up with all the Japanese pronunciations (where they have any). The first suggestion is useless. The file is too freaking big so maybe I'll go with the second. Thanks.
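For what it is worth, the size of Unihan.txt need not be an obstacle if the file is read one line at a time. The sketch below assumes the tab-separated layout "U+XXXX, field name, value" with "#" comment lines; the function name and the file path in the usage note are hypothetical.

```python
from typing import Dict, Iterable

# The two fields named in the message above.
FIELDS = ("kJapaneseKun", "kJapaneseOn")


def japanese_readings(lines: Iterable[str], char: str) -> Dict[str, str]:
    """Collect the Japanese reading fields for one character.

    `lines` can be an open file object, so the data is streamed
    line by line rather than loaded into memory at once.
    """
    target = f"U+{ord(char):04X}"
    found: Dict[str, str] = {}
    for line in lines:
        # Comment lines and other characters fail this prefix check.
        if not line.startswith(target + "\t"):
            continue
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3 and parts[1] in FIELDS:
            found[parts[1]] = parts[2]
    return found
```

Usage would look something like `with open("Unihan.txt", encoding="utf-8") as f: print(japanese_readings(f, "\u4e00"))`, assuming a local copy of the file under that name.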
Re: Are these characters encoded?
When I've seen the c-underbar in print, it has always meant circa, as in circa 1800. Jim At 10:14 PM 2001-12-01 + Saturday, Michael Everson wrote: (As a side note, this o-underbar form reminds me of the c-underbar which is sometimes used in handwritten English to mean with. Does anyone know the origin of this symbol? Is it possibly derived from the Latin word cum, meaning with? Does it have any claim to being a character in its own right?) Perhaps a corruption of c-overbar, which is a medical abbreviation for with, sometimes used by nurses, doctors, and pharmacies?
FW: Question about some MS IE options
-Original Message- From: Robert M. Gerlach [mailto:[EMAIL PROTECTED]] Sent: Monday, December 03, 2001 3:24 PM To: [EMAIL PROTECTED] Subject: Question Hi, When saving a webpage from within Microsoft Internet Explorer, there are a few notable options... and I'm really unsure as to what the differences are, which is "better," etc. I know you're not Microsoft or technical support fTM, but I'm betting that you guys would know better than they would [...] Here they are: Unicode Unicode (UTF-8) Western European (ISO) Western European (Windows) Thanks a million! -Rob :)
Unicode/Customizable Typing Tutors Apps?
I'm just curious if anyone out there has come across a typing tutor app (web based or installed) that is customizable and Unicode savvy? It doesn't have to be very complex so long as it can handle different Unicode scripts. Thanks, -Gavin
Unicode 1.0 names for control characters
I am surprised and puzzled by the Unicode 1.0 Name changes for some of the ASCII and Latin-1 control characters that were introduced in the latest beta version of the Unicode 3.2 data file (UnicodeData-3.2.0d5.txt):
U+0009 HORIZONTAL TABULATION == CHARACTER TABULATION
U+000B VERTICAL TABULATION == LINE TABULATION
U+001C FILE SEPARATOR == INFORMATION SEPARATOR FOUR
U+001D GROUP SEPARATOR == INFORMATION SEPARATOR THREE
U+001E RECORD SEPARATOR == INFORMATION SEPARATOR TWO
U+001F UNIT SEPARATOR == INFORMATION SEPARATOR ONE
U+008B PARTIAL LINE DOWN == PARTIAL LINE FORWARD
U+008C PARTIAL LINE UP == PARTIAL LINE BACKWARD
Were these new names (e.g. CHARACTER TABULATION) really the original Unicode 1.0 names? I don't have my 1.0 book close at hand, but I know that they were *not* the names used in 1.1, according to the file namesall.lst from that version. (Aha, didn't think anyone still had that dusty old thing lying around?) IMHO, the new names CHARACTER TABULATION and LINE TABULATION are much less intuitive than HORIZONTAL TABULATION and VERTICAL TABULATION. Sometimes you even see the abbreviations HT and VT for these two characters. The new names appear to have been invented by someone who imagined a lack of clarity in the old names. I have seen the names IS4, IS3, IS2, and IS1 before, but they do not convey the same information as FS, GS, RS, and US. The latter names are more specific. The old names for these six control characters were used as far back as the original 1963 version of ASCII, according to Mackenzie (pp. 245-247). I don't know about the history of U+008B and U+008C, but again it seems strange that the Unicode 1.0 name for these characters is being changed at this late date. I know this 1.0 name field is not subject to the same rule of "no changes, ever" that applies to the regular Character Name field, but why should these names be changed at all?
On this same topic, parenthesized abbreviations have been added to the 1.0 names for U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), and U+0085 NEXT LINE (NEL). Does the addition of these abbreviations mean that they are now part of the official 1.0 name, and if so, why? Other characters typically don't have abbreviations as part of their names, even if they are as meaningful and as commonly used as these, and again it is a change from the 1.0 name we have seen for a decade. Perhaps I've been checking the beta files a bit TOO carefully. -Doug Ewell Fullerton, California
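To make the comparison concrete, the Unicode 1.0 name sits in the eleventh semicolon-separated field (index 10) of a UnicodeData.txt record, while control characters carry the placeholder `<control>` in the regular name field. The sketch below assumes that layout; the two sample records are illustrative reconstructions of the U+0009 entry with the old and new 1.0 names, not lines copied from the beta file, and the helper names are hypothetical.

```python
from typing import NamedTuple


class ControlRecord(NamedTuple):
    code_point: int
    name: str            # "<control>" for control characters
    unicode_1_name: str  # the field discussed in the message above


def parse_record(record: str) -> ControlRecord:
    """Split one UnicodeData.txt-style record into the fields of interest."""
    fields = record.rstrip("\n").split(";")
    return ControlRecord(int(fields[0], 16), fields[1], fields[10])


# Illustrative records in the UnicodeData.txt layout (assumed, see lead-in).
OLD_RECORD = "0009;<control>;Cc;0;S;;;;;N;HORIZONTAL TABULATION;;;;"
NEW_RECORD = "0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;"

old = parse_record(OLD_RECORD)
new = parse_record(NEW_RECORD)
name_changed = old.unicode_1_name != new.unicode_1_name

print(f"U+{old.code_point:04X}: {old.unicode_1_name} -> {new.unicode_1_name}")
```

Running the same parser over two full versions of the data file and diffing the `unicode_1_name` values would list every change of the kind described here.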
Re: Are these characters encoded?
In a message dated 2001-12-03 12:20:46 Pacific Standard Time, [EMAIL PROTECTED] writes: Perhaps a corruption of c-overbar, which is a medical abbreviation for with, sometimes used by nurses, doctors, and pharmacies? Thanks to everyone who, directly or indirectly, corrected me on this character. Yes, you are all right: the character used in (as it turns out) the medical field to mean with is, in fact, c-overbar and not c-underbar. In Unicode we would say U+0063 U+0305. So to get back to my original questions about this thing, (a) is it a character in its own right, (b) if so, is there any justification in encoding it separately rather than using a combining sequence, and (c) is this not *exactly* the same set of issues as the question of encoding the Swedish o-underbar?-Doug Ewell Fullerton, California
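As a small aid to question (c), one can check whether a base letter plus combining mark has a precomposed form by seeing whether NFC collapses the pair into a single code point. This is a minimal sketch using Python's standard unicodedata module; the helper name is hypothetical, and the answers depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def has_precomposed_form(base: str, mark: str) -> bool:
    """True if NFC composes base + mark into one code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


c_overbar = has_precomposed_form("c", "\u0305")   # U+0063 U+0305
o_underbar = has_precomposed_form("o", "\u0332")  # U+006F U+0332
e_acute = has_precomposed_form("e", "\u0301")     # composes to U+00E9

print(c_overbar, o_underbar, e_acute)
```

In this respect the c-overbar and the Swedish o-underbar behave alike: both exist only as combining sequences, unlike a letter such as e with acute.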