Re: Arabic Presentation Forms
At 09:56 PM 1/30/2003, Mete Kural wrote:

> I need to figure out a method to convert Arabic Unicode text encoded in its normal form to Arabic Unicode text encoded in Arabic presentation forms.

May I ask why you want to do this?

John Hudson
Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED]

A book is a visitor whose visits may be rare, or frequent, or so continual that it haunts you like your shadow and becomes a part of you. - al-Jahiz, The Book of Animals
Re: Arabic Presentation Forms
On 31/01/2003 05:56:55 Mete Kural wrote:

> I need to figure out a method to convert Arabic Unicode text encoded in its normal form to Arabic Unicode text encoded in Arabic presentation forms.

Are you aware that the presentation forms are incomplete? That is, there are Arabic letters in the U+06xx block for which there isn't a complete set of presentation forms defined in the Arabic presentation forms areas. You'll need to decide what to do with such...

Bob
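Bob's point can be checked against the Unicode Character Database. What follows is a minimal sketch, not part of the original message. It assumes Python's standard `unicodedata` module and assumes that single-letter presentation forms can be recognised by their compatibility decomposition tags (`<isolated>`, `<final>`, `<initial>`, `<medial>`). The function name `presentation_forms` is made up for this illustration.

```python
import unicodedata
from collections import defaultdict

# Decomposition tags that mark positional presentation forms.
FORM_TAGS = {"<isolated>", "<final>", "<initial>", "<medial>"}

def presentation_forms():
    """Map each nominal letter to the positional forms encoded for it."""
    table = defaultdict(dict)
    # Arabic Presentation Forms-A (FB50..FDFF) and Forms-B (FE70..FEFF).
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep "<tag> XXXX" entries only; ligatures list several code points.
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            base = chr(int(parts[1], 16))
            table[base][parts[0].strip("<>")] = chr(cp)
    return table

forms = presentation_forms()
for letter in ("\u0628", "\u0627", "\u0620"):  # BEH, ALEF, KASHMIRI YEH
    print(f"U+{ord(letter):04X}", sorted(forms.get(letter, {})))
```

A letter that is missing from the table, or that has fewer forms than its joining behaviour calls for, is one of the cases where a converter has to decide on a fallback.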
Re: Arabic Presentation Forms
> Do you have any suggestions on how I could convert a piece of Unicode text in this manner? Are there any programs that could do this?

Roman Czyborra's arabjoin (a Perl script): http://czyborra.com/arabjoin/

It does the conversion to Arabic Presentation Forms. It also converts logically ordered Arabic to visual order, which may not be what you need; this is intended for display on systems that support neither BiDi nor Arabic shaping.

ST
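To make the idea of such a conversion concrete, here is a rough sketch in Python. It is not arabjoin's algorithm and is not taken from that script. It assumes the positional forms can be read from `unicodedata` decomposition tags, infers joining behaviour from which forms exist (a letter with an initial form is treated as joining on both sides), and ignores diacritics, lam-alef ligatures, and BiDi reordering. The names `build_table` and `to_presentation_forms` are hypothetical.

```python
import unicodedata

TAGS = ("isolated", "final", "initial", "medial")

def build_table():
    """Map nominal letters to their single-letter presentation forms."""
    table = {}
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Skip ligatures, whose decompositions list several code points.
        if len(parts) == 2 and parts[0].strip("<>") in TAGS:
            base = chr(int(parts[1], 16))
            table.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return table

TABLE = build_table()

def to_presentation_forms(text):
    """Replace letters in logical order by positional presentation forms."""
    out = []
    for i, ch in enumerate(text):
        forms = TABLE.get(ch)
        if not forms:
            out.append(ch)  # not a letter we know how to shape
            continue
        prev_forms = TABLE.get(text[i - 1], {}) if i > 0 else {}
        next_forms = TABLE.get(text[i + 1], {}) if i + 1 < len(text) else {}
        # Assumption: a letter joins forward only if it has an initial form.
        joins_prev = "initial" in prev_forms and "final" in forms
        joins_next = "initial" in forms and "final" in next_forms
        if joins_prev and joins_next:
            want = "medial"
        elif joins_prev:
            want = "final"
        elif joins_next:
            want = "initial"
        else:
            want = "isolated"
        # Fall back to the nominal letter when the form is not encoded.
        out.append(forms.get(want, ch))
    return "".join(out)

word = "\u0643\u062A\u0627\u0628"  # KAF TEH ALEF BEH
print([f"U+{ord(c):04X}" for c in to_presentation_forms(word)])
```

Letters whose presentation-form set is incomplete fall through to the nominal character here, which is one possible answer to the question of what to do with them.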
RE: Suggestions in Unicode Indic FAQ
Keyur Shroff wrote:

> ... No fallback rendering is coming into picture with your explanation.

Yes, there is. A character sequence FULL STOP, VOWEL SIGN E (say) is very unlikely to have a ligature, specially adapted (and fitting) adjustment points, or similar. The rendering would in that sense need to use a fallback mechanism that renders an approximation for this rare combination.

> ... Here is the para you are talking about. [Quote] [...] should be rendered as if they had a space as a base character. [/Quote] In the text there is no mention of explicitly inputting a space character before any combining mark that is a defective combining character.

The text says "as if". Which I also emphasised before.

> Also, the wording "should be rendered" implies that it is a recommendation.

Yes. A rather good one.

By removing that particular fallback mechanism from implementations [inserting dotted circle glyphs for allegedly invalid combinations] as well as the TUS text! (I'm serious!) This particular fallback mechanism is NOT recommended as it stands.

> Note that the text has been written in the section Implementation Guidelines. Can't it be considered as a recommendation?

That particular one, no. Just an example [that isn't very good, outside of a general "show invisibles" mode]. But since its mention is erroneously taken as a recommendation, I'd suggest removing its mention as well.

> This is disastrous! What will happen to the systems which already implemented this recommendation!?

It's not a recommendation.

> Will they be considered invalid implementations afterwards? What about stability?

They are ugly implementations as they are. And will stay ugly implementations. Stability is good ;-).

/Kent K
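The "as if they had a space as a base character" behaviour can be sketched for display purposes. This is only an illustration under stated assumptions: it treats a combining mark as lacking a base when it starts the text or follows a control character or a line or paragraph separator, and it builds a display-only copy rather than changing the stored text. The function name `add_fallback_bases` is hypothetical.

```python
import unicodedata

def add_fallback_bases(text, base=" "):
    """Return a display copy in which marks without a base get one."""
    out = []
    prev = None
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        # Assumption: only text start, controls, and line or paragraph
        # separators leave a following mark without a base.
        no_base = prev is None or unicodedata.category(prev) in ("Cc", "Zl", "Zp")
        if is_mark and no_base:
            out.append(base)
        out.append(ch)
        prev = ch
    return "".join(out)

# DEVANAGARI VOWEL SIGN E at the start of the text gets a space base.
print(repr(add_fallback_bases("\u0947\u0915")))
# FULL STOP followed by VOWEL SIGN E already has a base and is left alone.
print(repr(add_fallback_bases(".\u0947")))
```

Note that no dotted circle is inserted anywhere; a rare but well-formed combination such as FULL STOP plus VOWEL SIGN E is passed through unchanged and left to the renderer's approximation.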
How is glyph shaping done?
Hello,

After one of the replies that I received for my previous question, I thought of a more general question about how glyph shaping is done. I'm just wondering, when a Unicode rendering program is doing glyph shaping for Arabic (or any other language with similar properties), would the program first convert all Unicode Arabic characters in the 06XX domain into Arabic presentation forms in the FXXX domain, and then render each one of these presentation forms one by one and join them together? Or are there other possible ways to do glyph shaping in Unicode?

So does this mean that every character rendered on the screen in a Unicode-enabled program, such as Internet Explorer or some editor, has a corresponding Unicode presentation form associated with it?

Thanks,
Mete
Re: How is glyph shaping done?
Mete Kural asked:

> when a Unicode rendering program is doing glyph shaping for Arabic (or any other language with similar properties), would the program first convert all Unicode Arabic characters in the 06XX domain into Arabic presentation forms in the FXXX domain, and then render each one of these presentation forms one by one and join them together?

Probably not. These days, Arabic shaping is typically done by the low-level rendering system built into your OS, in conjunction with data tables available in smart fonts.

If you haven't already looked into some of the literature about how Arabic is encoded and how it is supported on various platforms, you should probably do so. Please see the Unicode standard online, http://www.unicode.org/uni2book/u2.html chapter 8, and the technical report on Bidi, http://www.unicode.org/reports/tr9/ . There are probably also some relevant questions in the FAQ, http://www.unicode.org/faq , and please also see one or more of the presentations by Thomas Milo on Arabic here: http://www.tradigital.de/specials/casestudies.htm

That would save you a lot of work. Most platforms these days already support Arabic rendering, so you don't need to worry about this level of detail, unless you are planning to implement a new system from scratch. I would expect the Microsoft developer web site to also have some info on their Arabic implementation...

> So does this mean that every character rendered on the screen in a Unicode-enabled program, such as Internet Explorer or some editor, has a corresponding Unicode presentation form associated with it?

No. It means that the fonts have appropriate tables and the rendering engine, Uniscribe or whatever, knows how to handle the font to do correct shaping when the text is rendered. You should not be using any presentation form characters in your text, just nominal forms from the 0600 block.

Rick
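If text already contains presentation-form characters, one way to get back to nominal forms is compatibility normalization. A minimal sketch, assuming Python's standard `unicodedata` module; the helper name `to_nominal` is made up here. Be aware that NFKC also folds every other compatibility character in the string, so it may change more than the Arabic forms.

```python
import unicodedata

def to_nominal(text):
    """Fold presentation forms back to nominal letters via NFKC."""
    return unicodedata.normalize("NFKC", text)

shaped = "\uFEDB\uFE98\uFE8E\uFE8F"  # KAF initial, TEH medial, ALEF final, BEH isolated
print([f"U+{ord(c):04X}" for c in to_nominal(shaped)])
```

The result is plain text from the 0600 block, which is what a shaping engine and a smart font expect as input.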
Re: How is glyph shaping done?
At 09:28 AM 1/31/2003, Mete Kural wrote:

> So does this mean that every character rendered on the screen in a Unicode-enabled program, such as Internet Explorer or some editor, has a corresponding Unicode presentation form associated with it?

No. Most complex script shaping is now handled by a combination of shaping engine and font lookups. The shaping engine analyses the text strings, performs any character level pre-processing (e.g. re-ordering for Indic scripts), and then implements specific lookups in the font for glyph substitution and positioning. This means that there is no need for the various contextual forms of Arabic letters to be encoded in a font's character-to-glyph mapping data at all.

On Windows, the shaping engines for complex scripts are part of Uniscribe (usp10.dll) and make use of OpenType font technology. An Arabic OpenType font will contain layout features for Initial (init), Medial (medi) and Final (fini) substitutions (and possibly Isolated (isol), e.g. to handle contextual variation of the letter heh). Uniscribe analyses strings of Arabic text, keeps track of the position of letters and their neighbours, and implements the appropriate layout feature for each letter.

For more information, see http://www.microsoft.com/typography/developers/opentype/default.htm, and the MS Arabic font specification at http://www.microsoft.com/typography/specs/default.htm

John Hudson
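The analysis step described above, deciding which layout feature applies to each letter, can be sketched roughly as follows. This is not Uniscribe's code. It is an illustration in Python that assumes a tiny hand-written joining-type table (R for letters that join only to the preceding letter, D for letters that join on both sides) covering a few letters only; `JOINING_TYPE` and `feature_tags` are hypothetical names.

```python
# Illustrative subset only; a real engine uses the full joining data.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    "\u0628": "D",  # BEH
    "\u062A": "D",  # TEH
    "\u0643": "D",  # KAF
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
}

def feature_tags(text):
    """Return the OpenType feature tag chosen for each character, or None."""
    tags = []
    for i, ch in enumerate(text):
        jt = JOINING_TYPE.get(ch)
        if jt is None:
            tags.append(None)  # character outside the sketch's table
            continue
        prev_jt = JOINING_TYPE.get(text[i - 1]) if i > 0 else None
        next_jt = JOINING_TYPE.get(text[i + 1]) if i + 1 < len(text) else None
        joins_prev = prev_jt == "D"          # previous letter joins forward
        joins_next = jt == "D" and next_jt in ("D", "R")
        if joins_prev and joins_next:
            tags.append("medi")
        elif joins_prev:
            tags.append("fini")
        elif joins_next:
            tags.append("init")
        else:
            tags.append("isol")
    return tags

print(feature_tags("\u0643\u062A\u0627\u0628"))  # KAF TEH ALEF BEH
```

The output is a list of feature tags, not characters: the glyph substitution itself happens inside the font's lookups, so no presentation-form code points are involved.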
compatibility between unicode 2.0 and 3.0
We have a large amount of C++ code that currently has Unicode 2.0 support. Could you all help me figure out what types of operations will fail if we attempt to pass Unicode 3.0 text through this code? I can start the list off with:

- sorting
- searching for text
- text comparison
- other character classification (isSpace, isDigit, etc...)

I understand that these operations probably won't work in ALL cases. But how about basic plumbing code, such as creating and copying strings?

As I mentioned in my last post, I've enjoyed listening in on this forum; I've learned a whole lot.

Thanks,
--Erik Ostermueller
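The classification item on that list can be illustrated by asking two versions of the character database about the same character. This sketch uses Python rather than the poster's C++, and it compares Unicode 3.2 with whatever version the running interpreter ships, not 2.0 with 3.0, because those are the databases the standard `unicodedata` module exposes. The helper name `categories` is made up.

```python
import unicodedata

def categories(ch):
    """Return (category per Unicode 3.2, category per the current database)."""
    old = unicodedata.ucd_3_2_0.category(ch)
    new = unicodedata.category(ch)
    return old, new

# LATIN CAPITAL LETTER A is classified the same way in both versions.
print(categories("A"))
# LATIN CAPITAL LETTER SHARP S was added after 3.2, so the older
# database reports it as unassigned (Cn).
print(categories("\u1E9E"))
```

Code built on the older tables treats the newer character as unassigned, which is the kind of gap that affects classification, sorting, and comparison. Code that only stores and copies code units never consults these tables.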
Re: urban legends just won't go away!
Barry Caplan wrote:

> Who knew in this day and age flipping bits to change case is still publishable (this is from today!)

What I find a lot more objectionable is that what this code claims to do is not defined (in particular, the domain over which it applies). Without such qualification, we cannot say whether the code is correct or not, no matter how fishy it looks. In fact, this example is a perfectly valid implementation if the system claims to handle only an appropriate subset of the Unicode character set.

For more information, see http://www.cs.utexas.edu/users/EWD/.

Eric.
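Eric's point about stating the domain can be shown with a short sketch. The code under discussion is not quoted in this thread, so the following is only an assumed reconstruction of the bit-flipping trick in Python, with the domain (ASCII letters) made explicit; `flip_case_ascii` is a hypothetical name.

```python
def flip_case_ascii(ch):
    """Toggle the case of one ASCII letter by flipping bit 0x20.

    Domain: 'A'..'Z' and 'a'..'z' only; anything else is rejected.
    """
    if not ("A" <= ch <= "Z" or "a" <= ch <= "z"):
        raise ValueError(f"outside the supported domain: {ch!r}")
    return chr(ord(ch) ^ 0x20)

print(flip_case_ascii("a"), flip_case_ascii("Z"))

# Outside that domain the same bit flip gives a different character:
# U+00FF (y with diaeresis) becomes U+00DF (sharp s), while its real
# uppercase is U+0178.
print(hex(ord("\u00FF") ^ 0x20), hex(ord("\u00FF".upper())))
```

With the domain check in place the function is correct for what it claims to handle; without it, nobody can say whether the result for U+00FF is a bug or out of scope.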