RE: Suggestions in Unicode Indic FAQ
--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > No fallback rendering is coming into picture with your explanation.
>
> Yes, there is. A character sequence (say)
> is very unlikely to have a ligature, specially adapted (and fitting)
> adjustment points, or similar. The rendering would in that sense
> need to use a fallback mechanism that renders an "approximation"
> for this rare combination.

Do you mean to say that an application's fallback mechanism has to handle the combination of every other Unicode character with each combining mark in order to produce such an approximation? Have you counted the combinations? They could run into the millions!

- Keyur
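For a rough sense of scale, here is a minimal sketch that counts letter/combining-mark pairs using the Unicode data bundled with Python. Treating only general category L* as "base characters" and M* as "combining marks" is an assumption made for illustration; the thread does not say which characters should count, and the totals depend on the Unicode version your Python ships with.

```python
import sys
import unicodedata


def count_base_mark_pairs():
    """Return (letters, marks, letters * marks) for this Python's Unicode data.

    Assumption: a "base" is any code point with general category L*,
    and a "combining mark" is any code point with general category M*.
    """
    letters = 0
    marks = 0
    for cp in range(sys.maxunicode + 1):
        cat = unicodedata.category(chr(cp))
        if cat.startswith("L"):      # Lu, Ll, Lt, Lm, Lo
            letters += 1
        elif cat.startswith("M"):    # Mn, Mc, Me
            marks += 1
    return letters, marks, letters * marks


if __name__ == "__main__":
    letters, marks, pairs = count_base_mark_pairs()
    print(f"{letters} letters x {marks} marks = {pairs} pairs")
```

The count only shows how many pairs exist; it does not by itself say how much per-pair work a renderer would need.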
RE: Suggestions in Unicode Indic FAQ
--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > Without that dotted circle appearing, the e-matra would appear to
> > have been properly encoded,
>
> No, with proper reordering (and "normal" display mode), the e-matra at
> the beginning of the second word would appear to be last glyph of the
> first "word". Similarly, for the second case, the e-matra glyph would
> have come to the left of the pa. The fluent reader (ok, not me...)
> would then see those errors anyway, just like I can find spelling
> errors in Swedish, most often without any kind of special marking. (I'm
> assuming throughout that reordrant combining characters are reordered.)

Illegal sequences are not reordered as you indicated. Also, as far as I know, the Unicode standard makes no mention of reordering an illegal input sequence (or an invalid combining mark).

Consider the last set of glyphs (left-to-right, top-to-bottom) in the attached image. It shows how the illegal input sequence "Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka" [U+0915] is rendered without any dotted circle. As you may know, the correct input sequence is U+0915 followed by U+093F, and that sequence would render much like what appears here (though a more sophisticated font or application may want to substitute another glyph with a proper attachment point for the U+093F glyph). So without the dotted circle the user has no way to identify this illegal input sequence. In the worst case, the rendered glyph even attaches to a character from a class (for example, the consonant cluster "Ka" "Virama" "Ma") that the glyph was designed to be rendered with. In such a case even a fluent reader cannot identify the error.

> There are spelling errors, yes. But there are other ways of indicating
> spelling errors, that are (by now) fairly conventional for any language
> (as long as there is an appropriate dictionary installed), and that also
> are more general (in catching more spelling errors) and less obtrusive
> (the author really wants to write it that way, for some reason).
>
> > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > he embedded the fonts into the page. As long as you are looking
> > at the page on-line, with the embedded fonts, these errors are
> > invisible.
> >
> > It may be typographically horrible. It *should* be typographically
> > horrible in order to illustrate bad sequences clearly.
>
> I'd prefer little red wiggly lines under the word, or yellow background
> or some such (just for screen display, not for printing; screen grabs
> not counted). And that for any spelling "error".

Spelling mistakes fall into two classes. One arises from an illegal input sequence (e.g., Vowel Sign E as the first character in a word); the other is a legal input sequence that has no meaning in the dictionary. The second type of mistake is generally flagged only by sophisticated applications such as word processors, but everyone wants to know about the first kind. By your explanation, it seems that even a plain text editor would be of no use in identifying such common typing mistakes!

- Keyur
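To make the first class of mistake concrete, here is a minimal sketch of how a plain-text tool might flag a combining mark that has no base before it by inserting U+25CC DOTTED CIRCLE. The function name mark_orphans and the rule "a mark is orphaned unless it follows a letter, digit, or another mark" are assumptions for illustration only; real Indic shaping engines validate full syllable structure, which this does not attempt.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def _can_carry_mark(ch):
    """Crude assumption: letters, numbers and other marks can precede a mark."""
    return unicodedata.category(ch)[0] in "LMN"


def mark_orphans(text):
    """Insert a dotted circle before any combining mark that lacks a base."""
    out = []
    prev = None
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and (prev is None or not _can_carry_mark(prev)):
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    illegal = "\u093F\u0915"   # Vowel Sign I + Ka (mark comes first)
    legal = "\u0915\u093F"     # Ka + Vowel Sign I
    for s in (illegal, legal):
        shown = mark_orphans(s)
        print([f"U+{ord(c):04X}" for c in shown])
```

With this rule the legal sequence U+0915 U+093F passes through unchanged, while U+093F U+0915 gains a visible dotted circle in front of the vowel sign.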
RE: 4701
And the Boston Globe has it as the year of the ghost Stolen from Mathews, as it happens. On Google, "year of the goat" has the lead.
Re: LATIN LETTER N WITH DIAERESIS?
> All characters are now mapped to Unicode characters or character sequences
> where I felt that this was possible. If there are obvious errors, please
> point them out and I'll update the listing.
>
> However, there are some unidentified characters, or ones that could be
> considered missing from Unicode 4.0, or which have mappings that for one
> or the other reason could be considered not ideal. These have been
> highlighted. I welcome suggestions for additions to or subtractions from
> this list, plus any help anyone could provide in identifying the characters
> or in locating places they are used.

Your F725 Unknown-2, to me, looks like a German SCRIPT CAPITAL S (compare with U+2112;SCRIPT CAPITAL L). Yes, we were taught to write an S like this in school. Perhaps it's used somewhere in mathematics? Your F7AA Unknown-8 could then be a SCRIPT CAPITAL C.

Your F747, spacing left hook below: doesn't it look very much like the palatalization hooks used elsewhere in the list (which you mapped to U+0321)?

Your combinations "with latin small letter dotless i" (e.g. F704, F731, F77A) seem to be designed for use in phonetic transcriptions, and hence are probably intended as IPA U+026A;LATIN LETTER SMALL CAPITAL I.

F737: the description in your list doesn't match the glyph shown, which is "with triangular colon".

F70F "Latin small letter a with colon" shows a triangular colon glyph and should hence be mapped to U+02D0, not U+003A.

F70E "Latin small letter a with tilde with modifier letter triangular colon" shows a U+0251 "Latin small letter alpha" glyph.

F750 "Latin small letter i with palatalized hook below" shows an inverted breve glyph, not a hook.

F751 "Latin small letter i with tilde with tilde" shows a macron and a tilde.

F754 and F755 "Latin small letter J with..." show i, not j glyphs.

F79B "Latin small letter S with retroflex hook below" shows not a retroflex hook, but something more like an ogonek. A retroflex hook should be attached to the left side of the S, not in the middle below, and s with retroflex hook has its own precomposed IPA code point, U+0282.

F7AC "Latin small letter u with dot below with diaeresis" shows an acute, not a diaeresis.

Lukas
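A quick way to double-check the code points cited above is to look up their names in the Unicode character database. The sketch below uses Python's unicodedata module; the helper describe and the one-entry table SUGGESTED (keyed by the private-use value F70F from the listing) are hypothetical names introduced for illustration, not part of anyone's actual mapping file.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Hypothetical table: the listing's private-use value mapped to the sequence
# suggested in the message above (a + triangular colon, not a + U+003A).
SUGGESTED = {
    "F70F": "a\u02D0",
}

# Standard code points mentioned in the message.
CITED = "\u2112\u0321\u026A\u02D0\u0251\u0282"

if __name__ == "__main__":
    for pua, seq in SUGGESTED.items():
        print(pua, "->", describe(seq))
    for line in describe(CITED):
        print(line)
```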
Re: LATIN LETTER N WITH DIAERESIS?
Asmus Freytag scripsit:
> However, there are some unidentified characters, or ones that could be
> considered missing from Unicode 4.0, or which have mappings that for one
> or the other reason could be considered not ideal. These have been
> highlighted. I welcome suggestions for additions to or subtractions from
> this list, plus any help anyone could provide in identifying the characters
> or in locating places they are used.

I strongly suspect that your various DIGRAPHS WITH BREVE BELOW are actually underties.

In addition, U+F7A1 looks like a glyph variant of the glyph often used in American dictionaries to represent edh, though I have more often seen it with the stroke passing through both legs of the "h" portion.

U+F776 and U+F777 are probably also American dictionary characters representing the so-called "short" and "long" sounds of English "oo", though I have more often seen them without ligaturing.
Re: OT: Haikus for Unicode-Haters
You're right, but neither Mongolian nor Indic fits the 5-7-5 syllable constraint of haiku. Ben-ga-li-Sha-ping maybe? :-)

But anyway, as I've been reading in Thomas Milo's (Decotype) paper on Arabic, recently referred to here, Arabic typography isn't so simple once you get out of the simplified printing-Arabic paradigm. I have been using Arabic on computers since 1993, on Accent Software's word processor Dagesh (a multiscript word processor for Windows 3.x). The shaping mechanism for Arabic hasn't changed since. And I read that this implementation goes back to the Apple Mac Arabic word processor "Al-Kaatib Ad-Dawli", in the late 1980s.

ST
Re: OT: Haikus for Unicode-Haters
On Sun, 2 Feb 2003, Shlomi Tal wrote:
> Arabic shaping
> Difficult to implement
> It's a complex script.

I can't understand how Arabic suddenly appears in your list. The complexities are in the script itself, and not in Unicode. I have yet to see any sound standard for Arabic information interchange that doesn't use the same model Unicode uses for Arabic. ISO 8859-6 does, CP1256 does, and ISIRI 3342 does. Even the weird UZT standard almost uses the same model.

If you only wanted some complex script, use something more complex next time. Mongolian, for example...

roozbeh
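The shared model can be seen from Python, as a rough sketch: both ISO 8859-6 and CP1256 store one byte per abstract letter, with no separate codes for initial, medial, or final shapes, and Unicode's compatibility presentation forms fold back to the same abstract letter under NFKC. The sample word and the helper names are assumptions for illustration; ISIRI 3342 and UZT are left out because the standard library has no codecs for them that I know of.

```python
import unicodedata

# Sample word: kaf, teh, alef, beh (four abstract letters).
WORD = "\u0643\u062A\u0627\u0628"

# Isolated, final, initial and medial presentation forms of beh.
BEH_FORMS = "\uFE8F\uFE90\uFE91\uFE92"


def encoded_lengths(text):
    """Byte length of text in two legacy Arabic encodings."""
    return {name: len(text.encode(name)) for name in ("iso8859_6", "cp1256")}


def to_abstract_letters(text):
    """Fold presentation forms to the letters they are shapes of (NFKC)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(encoded_lengths(WORD))
    print([f"U+{ord(c):04X}" for c in to_abstract_letters(BEH_FORMS)])
```

In all three cases the choice of shape is left to the rendering side, which is the point being made above.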
OT: Haikus for Unicode-Haters
Unicode is shit!
What a dreadful encoding.
Who thought up this crap?

UTF-16
Has those pesky surrogates
Very bad design.

Arabic shaping
Difficult to implement
It's a complex script.

One should circumvent
Endian related issues.
UTF-8 does.
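For anyone who wants to see what the second and fourth haiku are complaining about, here is a small sketch using only Python's built-in codecs. The helper name encodings and the choice of U+10400 as the sample character are assumptions for illustration.

```python
def encodings(ch):
    """Hex bytes of one character in UTF-16 (both byte orders) and UTF-8."""
    return {
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-16-le": ch.encode("utf-16-le").hex(),
        "utf-8": ch.encode("utf-8").hex(),
    }


if __name__ == "__main__":
    # U+10400 lies outside the BMP, so UTF-16 needs a surrogate pair
    # (D801 DC00), and the byte order of each 16-bit unit matters.
    for name, hexbytes in encodings("\U00010400").items():
        print(f"{name:10} {hexbytes}")
```

The two UTF-16 results differ only in byte order, while UTF-8 is a single byte sequence with no byte-order variants.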