RE: Suggestions in Unicode Indic FAQ

2003-02-02 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > 
> > No fallback rendering comes into the picture with your explanation.
> 
> Yes, there is.  A character sequence  (say)
> is very unlikely to have a ligature, specially adapted (and fitting)
> adjustment points, or similar.  The rendering would in that sense
> need to use a fallback mechanism that renders an "approximation"
> for this rare combination.

Do you mean that an application's fallback mechanism has to handle every
other Unicode character combined with each combining mark to produce such
an approximation? Have you counted the number of combinations? It may run
into the millions!
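Kent's point can be pictured as a tiered lookup rather than an enumeration of pairs. In this Python sketch the font data (`FONT_GLYPHS`, `TUNED_ANCHORS`, and the offset values) and the function `render_plan` are hypothetical, invented only for illustration. The renderer tries a precomposed glyph first, then a designer-tuned attachment point, and sends every remaining base + mark pair through one generic placement rule, so nothing has to scale with the number of possible combinations.

```python
import unicodedata

# Hypothetical font data (illustrative only): the glyphs the font contains
# and the few base + mark pairs whose mark position a designer tuned by hand.
FONT_GLYPHS = {"a", "e", "x", "\u00E1", "\u0301"}
TUNED_ANCHORS = {("e", "\u0301"): (40, 520)}  # (x, y) offsets in font units


def render_plan(base: str, mark: str):
    """Decide how to draw one base character followed by one combining mark."""
    # Tier 1: the pair composes to a single character the font has a glyph for.
    composed = unicodedata.normalize("NFC", base + mark)
    if len(composed) == 1 and composed in FONT_GLYPHS:
        return ("precomposed", composed)
    # Tier 2: the designer supplied a specific attachment point for this pair.
    if (base, mark) in TUNED_ANCHORS:
        return ("anchored", TUNED_ANCHORS[(base, mark)])
    # Tier 3: one generic rule covers every remaining pair, so no table of
    # all base x mark combinations is needed. This is the "approximation".
    return ("generic", "center the mark over the base's bounding box")


for base in ("a", "e", "x"):
    print(base, render_plan(base, "\u0301")[0])
```

Running it prints `a precomposed`, `e anchored`, and `x generic`: only the first two cases need any per-pair data.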

- Keyur


__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com




RE: Suggestions in Unicode Indic FAQ

2003-02-02 Thread Keyur Shroff

--- Kent Karlsson <[EMAIL PROTECTED]> wrote:

> > 
> > Without that dotted circle appearing, the e-matra would appear to
> > have been properly encoded, 
> 
> No, with proper reordering (and "normal" display mode), the e-matra at
> the beginning of the second word would appear to be the last glyph of the
> first "word".  Similarly, in the second case, the e-matra glyph would
> have come to the left of the pa.  The fluent reader (ok, not me...)
> would then see those errors anyway, just as I can find spelling
> errors in Swedish, most often without any kind of special marking. (I'm
> assuming throughout that reordrant combining characters are reordered.)

Illegal sequences are not reordered in the way you describe. Also, as far
as I know, the Unicode Standard says nothing about reordering an illegal
input sequence (or an invalid combining mark).

Consider the last set of glyphs (left-to-right, top-to-bottom) in the
attached image. It shows how the illegal input sequence "Devanagari Vowel
Sign I" [U+093F] + "Devanagari Letter Ka" [U+0915] is rendered without any
dotted circle. As you may know, the correct input sequence is U+0915
followed by U+093F, and that sequence would have rendered much like what
appears now. (A more sophisticated font or application might substitute a
different glyph for U+093F, one with a proper attachment point.) Without a
dotted circle, the user has no way to identify this illegal input sequence.
In the worst case, the misplaced glyph attaches to a character of exactly
the class it was designed for (for example, the consonant cluster "Ka"
"Virama" "Ma"), and then even a fluent reader cannot identify the error.
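Making such a sequence visible is cheap. A common renderer convention is to supply U+25CC DOTTED CIRCLE as a stand-in base for a combining mark that has none. The Python sketch below applies that idea to dependent vowel signs only, under simplified assumptions (Devanagari and Bengali, consonants as the KA..HA ranges, vowel signs recognised by their character names) and with a hypothetical function `mark_illegal`.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
NUKTA = {"\u093C", "\u09BC"}


def is_consonant(ch: str) -> bool:
    # Simplified (assumption): Devanagari KA..HA and Bengali KA..HA only.
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0995 <= cp <= 0x09B9


def is_vowel_sign(ch: str) -> bool:
    # e.g. "DEVANAGARI VOWEL SIGN I", "BENGALI VOWEL SIGN E"
    return " VOWEL SIGN " in unicodedata.name(ch, "")


def mark_illegal(text: str) -> str:
    """Insert U+25CC before every dependent vowel sign that has no consonant base."""
    out = []
    for i, ch in enumerate(text):
        if is_vowel_sign(ch):
            j = i - 1
            if j >= 0 and text[j] in NUKTA:
                j -= 1  # a nukta may sit between the consonant and the sign
            if j < 0 or not is_consonant(text[j]):
                out.append(DOTTED_CIRCLE)
        out.append(ch)
    return "".join(out)
```

With this, `"\u093F\u0915"` becomes `"\u25CC\u093F\u0915"`, while the correct `"\u0915\u093F"` and the cluster `"\u0915\u094D\u092E\u093F"` pass through unchanged.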

> 
> There are spelling errors, yes.  But there are other ways of indicating
> spelling errors, that are (by now) fairly conventional for any language
> (as long as there is an appropriate dictionary installed), and that also
> are more general (in catching more spelling errors) and less obtrusive
> (the author really wants to write it that way, for some reason).
> 
> > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > he embedded the fonts into the page.  As long as you are looking
> > at the page on-line, with the embedded fonts, these errors are
> > invisible.  
> > 
> > It may be typographically horrible.  It *should* be typographically
> > horrible in order to illustrate bad sequences clearly.
> 
> I'd prefer little red wiggly lines under the word, or yellow background
> or some such (just for screen display, not for printing; screen grabs
> not counted).  And that for any spelling "error".

Spelling mistakes fall into two classes. The first arises from an illegal
input sequence (e.g., Vowel Sign E as the first character in a word); the
second is a legal input sequence that does not form a word found in the
dictionary. While the second kind of mistake is generally flagged only by
sophisticated applications such as word processors, everyone wants to know
about the first kind. By your explanation, even a plain text editor would be
of no use in identifying such common typing mistakes!
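The two classes can be separated in code: the first needs only character properties, the second needs a word list. A hedged Python sketch, assuming a simplified consonant test (Devanagari and Bengali KA..HA ranges), vowel signs and nukta recognised by character name, and a hypothetical `dictionary` set; `classify` reports the first class without any dictionary and consults the word list only for the second.

```python
import unicodedata


def is_consonant(ch: str) -> bool:
    # Simplified (assumption): Devanagari KA..HA and Bengali KA..HA only.
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0995 <= cp <= 0x09B9


def has_illegal_sequence(word: str) -> bool:
    """Class 1: a dependent vowel sign with no consonant (or consonant + nukta) before it."""
    for i, ch in enumerate(word):
        if " VOWEL SIGN " in unicodedata.name(ch, ""):
            j = i - 1
            if j >= 0 and unicodedata.name(word[j], "").endswith(" SIGN NUKTA"):
                j -= 1
            if j < 0 or not is_consonant(word[j]):
                return True
    return False


def classify(word: str, dictionary: set) -> str:
    if has_illegal_sequence(word):
        return "illegal sequence"   # detectable with no dictionary at all
    if word not in dictionary:
        return "not in dictionary"  # needs a word list, as in a word processor
    return "ok"
```

For example, `classify("\u09C7\u09AA", set())` returns `"illegal sequence"` even with an empty dictionary.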

- Keyur



RE: 4701

2003-02-02 Thread Greenwood, Timothy

And the Boston Globe has it as the year of the ghost.

Stolen from Mathews, as it happens.

On Google, "year of the goat" has the lead.

Re: LATIN LETTER N WITH DIAERESIS?

2003-02-02 Thread Lukas Pietsch
> All characters are now mapped to Unicode characters or character sequences
> where I felt that this was possible. If there are obvious errors, please
> point them out and I'll update the listing.
>
> However, there are some unidentified characters, or ones that could be
> considered missing from Unicode 4.0, or which have mappings that for one
> or the other reason could be considered not ideal. These have been
> highlighted. I welcome suggestions for additions to or subtractions from
> this list, plus any help anyone could provide in identifying the characters
> or in locating places they are used.

To me, your F725 Unknown-2 looks like a German SCRIPT CAPITAL S
(compare U+2112;SCRIPT CAPITAL L). Yes, we were taught to write an
S like this in school. Perhaps it's used somewhere in mathematics?

Your F7AA Unknown-8 could then be a SCRIPT CAPITAL C.

Your F747, spacing left hook below - doesn't it look very much like the
palatalization hooks used elsewhere in the list (which you mapped to
U+0321)?

Your combinations "with latin small letter dotless i" (e.g. F704, F731,
F77A) seem to be designed for use in phonetic transcriptions, and hence
are probably intended as IPA U+026A;LATIN LETTER SMALL CAPITAL I.

F737: the description in your list doesn't match the glyph shown, which
is "with triangular colon".

F70F "Latin small letter a with colon" shows a triangular colon glyph
and should hence be mapped to U+02D0, not U+003A.

F70E "Latin small letter a with tilde with modifier letter triangular
colon" shows a U+0251 "Latin small letter alpha" glyph.

F750 "Latin small letter i with palatalized hook below" shows an
inverted breve glyph, not a hook.

F751 "Latin small letter i with tilde with tilde" shows a macron and a
tilde.

F754 and F755 "Latin small letter J with..." show i, not j glyphs.

F79B "Latin small letter S with retroflex hook below" shows not a
retroflex hook, but something more like an ogonek. A retroflex hook
should be attached to the left side of the S, not in the middle below,
and has its own precomposed IPA codepoint U+0282.

F7AC "Latin small letter u with dot below with diaeresis" shows an
acute, not a diaeresis.
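Several of these corrections pin down exact replacement sequences, so they can be written as a mapping table. A sketch in Python: the table name `SUGGESTED` and the helper `describe` are hypothetical, and the F70E sequence (alpha, combining tilde, triangular colon) is an inference from the glyph description, not a confirmed mapping.

```python
import unicodedata

# Replacement sequences suggested above for three PUA code points from the
# listing under discussion. F70E is this sketch's reading of the glyph.
SUGGESTED = {
    0xF70F: "a\u02D0",             # a + MODIFIER LETTER TRIANGULAR COLON, not U+003A
    0xF70E: "\u0251\u0303\u02D0",  # alpha + COMBINING TILDE + triangular colon
    0xF79B: "\u0282",              # precomposed IPA S WITH HOOK (retroflex)
}


def describe(pua: int) -> str:
    """Spell out a suggested mapping using the Unicode character names."""
    names = " + ".join(unicodedata.name(ch) for ch in SUGGESTED[pua])
    return f"U+{pua:04X} -> {names}"


for pua in SUGGESTED:
    print(describe(pua))
```

The first line printed is `U+F70F -> LATIN SMALL LETTER A + MODIFIER LETTER TRIANGULAR COLON`.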

Lukas






Re: LATIN LETTER N WITH DIAERESIS?

2003-02-02 Thread jcowan
Asmus Freytag scripsit:

> However, there are some unidentified characters, or ones that could be 
> considered missing from Unicode 4.0, or which have mappings that for one
> or the other reason could be considered not ideal. These have been 
> highlighted. I welcome suggestions for additions to or subtractions from 
> this list, plus any help anyone could provide in identifying the characters 
> or in locating places they are used.

I strongly suspect that your various DIGRAPHS WITH BREVE BELOW are
actually underties.  In addition, U+F7A1 looks like a glyph variant
of the glyph often used in American dictionaries to represent edh,
though I have more often seen it with the stroke passing through both
legs of the "h" portion.  U+F776 and U+F777 are probably also American
dictionary characters representing the so-called "short" and "long"
sounds of English "oo", though I have more often seen them without ligaturing.




Re: OT: Haikus for Unicode-Haters

2003-02-02 Thread Shlomi Tal
You're right, but neither Mongolian nor Indic fits the 5-7-5 syllable
constraint of haiku. Ben-ga-li-Sha-ping maybe? :-)

But anyway, as I've been reading in Thomas Milo's (Decotype) paper on Arabic,
recently referred to here, Arabic typography isn't so simple once you step
outside the simplified printing-Arabic paradigm.

I have been using Arabic on computers since 1993, on Accent Software's word
processor Dagesh (a multiscript word processor for Windows 3.x). The Arabic
shaping mechanism hasn't changed since then, and I have read that this
implementation goes back to the Apple Mac Arabic word processor
"Al-Kaatib Ad-Dawli" of the late 1980s.

ST

_
Tired of spam? Get advanced junk mail protection with MSN 8. 
http://join.msn.com/?page=features/junkmail




Re: OT: Haikus for Unicode-Haters

2003-02-02 Thread Roozbeh Pournader
On Sun, 2 Feb 2003, Shlomi Tal wrote:

> Arabic shaping
> Difficult to implement
> It's a complex script.

I don't understand why Arabic appears in your list. The complexities are
in the script itself, not in Unicode. I have yet to see any sound standard
for Arabic information interchange that doesn't use the same model Unicode
uses for Arabic. ISO 8859-6 does, CP1256 does, and ISIRI 3342 does. Even
the weird UZT standard uses almost the same model.
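The shared model stores only the nominal letters in logical order and leaves the choice of contextual form to the renderer. A minimal Python sketch of that choice, under loudly simplified assumptions: a hypothetical, hand-picked joining-type table for ten letters (a real shaper uses the Joining_Type property for every character); output mapped to the Arabic Presentation Forms-B characters via `unicodedata.lookup` purely for illustration (real renderers select font glyphs); and no handling of the mandatory lam-alef ligature or of transparent marks such as harakat.

```python
import unicodedata

# Hypothetical, simplified joining-type table.
DUAL_JOINING = set("\u0628\u062A\u0633\u0644\u0645\u0646")  # BEH TEH SEEN LAM MEEM NOON
RIGHT_JOINING = set("\u0627\u062F\u0631\u0648")              # ALEF DAL REH WAW


def joins_to_next(ch: str) -> bool:
    # Only dual-joining letters connect to the letter that follows them.
    return ch in DUAL_JOINING


def joins_to_prev(ch: str) -> bool:
    # Dual- and right-joining letters connect to the letter before them.
    return ch in DUAL_JOINING or ch in RIGHT_JOINING


def shape(text: str) -> str:
    """Replace each known letter with its isolated/initial/medial/final form."""
    out = []
    for i, ch in enumerate(text):
        if not joins_to_prev(ch):
            out.append(ch)  # not in the table (e.g. a space): leave unchanged
            continue
        after_joiner = i > 0 and joins_to_next(text[i - 1])
        before_joiner = (i + 1 < len(text) and joins_to_next(ch)
                         and joins_to_prev(text[i + 1]))
        if after_joiner and before_joiner:
            form = "MEDIAL"
        elif after_joiner:
            form = "FINAL"
        elif before_joiner:
            form = "INITIAL"
        else:
            form = "ISOLATED"
        # e.g. "ARABIC LETTER BEH" -> "ARABIC LETTER BEH INITIAL FORM" (U+FE91)
        out.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM"))
    return "".join(out)
```

For BEH ALEF BEH, `shape` yields U+FE91 U+FE8E U+FE8F: the alef takes its final form but cannot join forward, so the last beh stands isolated.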

If you just wanted a complex script, pick something more complex next
time. Mongolian, for example...

roozbeh





OT: Haikus for Unicode-Haters

2003-02-02 Thread Shlomi Tal
Unicode is shit!
What a dreadful encoding.
Who thought up this crap?

UTF-16
Has those pesky surrogates
Very bad design.

Arabic shaping
Difficult to implement
It's a complex script.

One should circumvent
Endian related issues.
UTF-8 does.
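The two technical complaints in these haikus are easy to see concretely: a supplementary code point becomes two 16-bit surrogates, and UTF-16 comes in two byte orders while UTF-8 has a single byte sequence. A minimal Python sketch; `utf16_surrogates` is a hypothetical helper, while the `encode` calls use Python's standard codecs.

```python
def utf16_surrogates(cp: int) -> tuple:
    """Split a supplementary code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000  # 20 bits remain: 10 for each surrogate
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
print([hex(u) for u in utf16_surrogates(ord(clef))])  # ['0xd834', '0xdd1e']
# UTF-16 has two byte orders for the same text...
print(clef.encode("utf-16-be").hex(), clef.encode("utf-16-le").hex())  # d834dd1e 34d81edd
# ...while UTF-8 has exactly one byte sequence.
print(clef.encode("utf-8").hex())  # f09d849e
```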

_
STOP MORE SPAM with the new MSN 8 and get 2 months FREE* 
http://join.msn.com/?page=features/junkmail