Re: Arabic Presentation Forms

2003-01-31 Thread John Hudson
At 09:56 PM 1/30/2003, Mete Kural wrote:


I need to figure out a method to convert Arabic
Unicode text encoded in its normal form to Arabic
Unicode text encoded in Arabic presentation forms.


May I ask why you want to do this?

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

A book is a visitor whose visits may be rare,
or frequent, or so continual that it haunts you
like your shadow and becomes a part of you.
   - al-Jahiz, The Book of Animals





Re: Arabic Presentation Forms

2003-01-31 Thread Bob_Hallissy

On 31/01/2003 05:56:55 Mete Kural wrote:

I need to figure out a method to convert Arabic
Unicode text encoded in its normal form to Arabic
Unicode text encoded in Arabic presentation forms.

Are you aware that the presentation forms are incomplete? That is, there
are Arabic letters in the U+06xx block for which there isn't a complete set
of presentation forms defined in the Arabic presentation forms areas.
You'll need to decide what to do with such...
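
One rough way to see which letters are affected is to read the compatibility decompositions of the presentation-form characters and group them by the nominal letter they map back to. The sketch below assumes Python's standard unicodedata module; the helper names are made up for illustration, and the result depends on the Unicode version bundled with the interpreter. Note that right-joining letters such as alef only ever need isolated and final forms, so a two-form entry is not necessarily a gap.

```python
import unicodedata
from collections import defaultdict

FORM_TAGS = ("<isolated>", "<initial>", "<medial>", "<final>")

def presentation_form_coverage():
    """Map each nominal character to the positional forms encoded for it."""
    coverage = defaultdict(set)
    # Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF).
    for cp in list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00)):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter forms only; ligatures decompose to 2+ characters.
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            coverage[chr(int(parts[1], 16))].add(parts[0])
    return coverage

def letters_without_forms(coverage):
    """Arabic-block letters (category Lo) with no presentation form at all."""
    return [chr(cp) for cp in range(0x0600, 0x0700)
            if unicodedata.category(chr(cp)) == "Lo" and chr(cp) not in coverage]

coverage = presentation_form_coverage()
for ch in letters_without_forms(coverage):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Any converter to presentation forms would need a policy for the letters this prints, for example leaving them in nominal form.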

Bob






Re: Arabic Presentation Forms

2003-01-31 Thread Shlomi Tal
Do you have any suggestions on how I could convert a piece
of Unicode text in this manner? Are there any programs
that could do this?


Roman Czyborra's arabjoin (a Perl script):

http://czyborra.com/arabjoin/

It converts to Arabic Presentation Forms. It also converts logically ordered
Arabic to visual order, which may not be what you need; that step is meant
for display on systems that support neither BiDi nor Arabic shaping.

ST





RE: Suggestions in Unicode Indic FAQ

2003-01-31 Thread Kent Karlsson

Keyur Shroff wrote:
...
 
 No fallback rendering is coming into picture with your explanation. 

Yes, there is.  A character sequence FULL STOP, VOWEL SIGN E (say)
is very unlikely to have a ligature, specially adapted (and fitting)
adjustment points, or similar.  The rendering would in that sense
need to use a fallback mechanism that renders an approximation
for this rare combination.

...
 Here is the para you are talking about.
 
 [Quote]
[...]
 should be rendered as if they had a space as a base character.
 [/Quote]
 
 In the text there is no mention of explicitly inputting space character
 before any combining mark that is defective combining character.

The text says "as if". Which I also emphasised before.

 Also, the wording "should be rendered" implies that it is a recommendation. 

Yes.  A rather good one.  

  By removing that particular fallback mechanism from implementations
[inserting dotted circle glyphs for allegedly invalid combinations]
  as well as the TUS text!  (I'm serious!) This particular fallback
  mechanism is NOT recommended as it stands.  
 
 Note that the text has been written in the section Implementation
 Guidelines. Can't it be considered as recommendation?

That particular one, no.  Just an example [that isn't very good,
outside of a general "show invisibles" mode].

  But since its mention is erroneously taken as a recommendation, I'd 
  suggest removing also its mention.
 
 This is disastrous! What will happen to the systems which already
 implemented this recommendation!?

It's not a recommendation.

 Will they be considered invalid
 implementations afterwards? What about stability?

They are ugly implementations as they are.  And will stay ugly
implementations.  Stability is good ;-).

/Kent K





How is glyph shaping done?

2003-01-31 Thread Mete Kural
Hello,

After one of the replies that I received for my
previous question, I thought of a more general
question about how glyph shaping is done. I'm just
wondering, when a Unicode rendering program is doing
glyph shaping for Arabic (or any other language with
similar properties), would the program first convert
all Unicode Arabic characters in the 06XX domain into
Arabic presentation forms in the FXXX domain, and then
render each one of these presentation forms one by one
and join them together? Or are there other possible
ways to do glyph shaping in Unicode?

So does this mean that every character rendered on the
screen in a Unicode-enabled program such as Internet
Explorer or some editor, has a corresponding
Unicode presentation form associated with it?

Thanks,
Mete




Re: How is glyph shaping done?

2003-01-31 Thread Rick McGowan
Mete Kural asked:

 when a Unicode rendering program is doing
 glyph shaping for Arabic (or any other language with
 similar properties), would the program first convert
 all Unicode Arabic characters in the 06XX domain into
 Arabic presentation forms in the FXXX domain, and then
 render each one of these presentation forms one by one
 and join them together?

Probably not. These days, Arabic shaping is typically done by the  
low-level rendering system built into your OS, in conjunction with data  
tables available in smart fonts. If you haven't already looked into some of  
the literature about how Arabic is encoded and how it is supported on  
various platforms, you should probably do so. Please see the Unicode  
standard online,
http://www.unicode.org/uni2book/u2.html chapter 8, and the technical  
report on Bidi http://www.unicode.org/reports/tr9/ . There are probably also
some relevant questions in the FAQ http://www.unicode.org/faq . Please also
see one or more of the presentations by Thomas Milo on Arabic here:
http://www.tradigital.de/specials/casestudies.htm

That would save you a lot of work. Most platforms these days already  
support Arabic rendering, so you don't need to worry about this level of  
detail, unless you are planning to implement a new system from scratch. I  
would expect the Microsoft developer web site to also have some info on  
their Arabic implementation...

 So does this mean that every character rendered on the
 screen in a Unicode-enabled program such as Internet
 Explorer or some editor, has a corresponding
 Unicode presentation form associated with it?

No. It means that the fonts have appropriate tables and the rendering  
engine, Uniscribe or whatever, knows how to handle the font to do correct  
shaping when the text is rendered. You should not be using any presentation  
form characters in your text, just nominal forms from the 0600 block.
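
If you already have legacy text that contains presentation forms, one possible way back to nominal forms is compatibility normalization. This is only a sketch using Python's standard unicodedata module; the function name is made up, and NFKC also folds other compatibility characters (ligatures, fullwidth forms and so on), so it may change more than you want.

```python
import unicodedata

def to_nominal(text: str) -> str:
    """Fold presentation forms back to nominal letters via NFKC.

    Caveat: NFKC rewrites every compatibility character in the text,
    not only the Arabic presentation forms.
    """
    return unicodedata.normalize("NFKC", text)

# KAF initial, TEH medial, ALEF final, BEH isolated (Presentation Forms-B).
shaped = "\uFEDB\uFE98\uFE8E\uFE8F"
nominal = to_nominal(shaped)
print([f"U+{ord(c):04X}" for c in nominal])
```

The result is the plain sequence KAF, TEH, ALEF, BEH from the 0600 block, which the rendering engine can then shape on its own.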

Rick




Re: How is glyph shaping done?

2003-01-31 Thread John Hudson
At 09:28 AM 1/31/2003, Mete Kural wrote:


So does this mean that every character rendered on the
screen in a Unicode-enabled program such as Internet
Explorer or some editor, has a corresponding
Unicode presentation form associated with it?


No. Most complex script shaping is now handled by a combination of shaping 
engine and font lookups. The shaping engine analyses the text strings, 
performs any character level pre-processing (e.g. re-ordering for Indic 
scripts), and then implements specific lookups in the font for glyph 
substitution and positioning. This means that there is no need for the 
various contextual forms of Arabic letters to be encoded in a font's 
character-to-glyph mapping data at all.

On Windows, the shaping engines for complex scripts are part of Uniscribe 
(usp10.dll) and make use of OpenType font technology. An Arabic OpenType 
font will contain layout features for Initial <init>, Medial <medi> and 
Final <fini> substitutions (and possibly Isolated <isol>, e.g. to handle 
contextual variation of the letter heh). Uniscribe analyses strings of 
Arabic text, keeps track of the position of letters and their neighbours, 
and implements the appropriate layout feature for each letter.
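
To make the "keeps track of neighbours" step concrete, here is a toy sketch of how a positional feature might be chosen for each letter. It is not Uniscribe's algorithm: the joining table is a tiny hand-written assumption (a real engine takes joining types from the Unicode ArabicShaping.txt data), the function names are invented, and combining marks, ZWJ and ZWNJ are ignored entirely.

```python
# Hypothetical, deliberately tiny joining table for illustration only.
# Right-joining letters connect only to the preceding letter.
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")  # alef dal thal reh zain waw
# Dual-joining letters connect to both neighbours.
DUAL_JOINING = set("\u0628\u062A\u0633\u0643\u0644\u0645\u0646\u064A")  # beh teh seen kaf lam meem noon yeh

def joins_to_next(ch: str) -> bool:
    return ch in DUAL_JOINING

def joins_to_prev(ch: str) -> bool:
    return ch in DUAL_JOINING or ch in RIGHT_JOINING

def select_features(text: str) -> list:
    """Return 'init', 'medi', 'fini' or 'isol' per letter (None if unknown)."""
    features = []
    for i, ch in enumerate(text):
        if not joins_to_prev(ch):
            features.append(None)  # not a joining letter in this toy table
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        joined_before = joins_to_next(prev_ch)
        joined_after = joins_to_next(ch) and joins_to_prev(next_ch)
        if joined_before and joined_after:
            features.append("medi")
        elif joined_before:
            features.append("fini")
        elif joined_after:
            features.append("init")
        else:
            features.append("isol")
    return features

print(select_features("\u0643\u062A\u0627\u0628"))  # kaf, teh, alef, beh
```

The engine would then apply the font's lookup for the chosen feature to each glyph, so the text itself never has to contain presentation-form characters.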

For more information, see 
http://www.microsoft.com/typography/developers/opentype/default.htm, and 
the MS Arabic font specification at 
http://www.microsoft.com/typography/specs/default.htm

John Hudson

Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]





compatibility between unicode 2.0 and 3.0

2003-01-31 Thread Erik.Ostermueller
We have a large amount of C++ that currently has Unicode 2.0 support.

Could you all help me figure out what types of operations will fail
if we attempt to pass Unicode 3.0 thru this code?

I can start the list off with 

-sorting 
-searching for text 
-text comparison
-other character classification (isSpace, isDigit, etc...).

I understand that these operations probably won't work in ALL cases.
But how about basic plumbing code -- creating and copying strings?

As I mentioned in my last post, I've enjoyed
listening in on this forum -- I've learned a whole lot.

Thanks,

--Erik Ostermueller




Re: urban legends just won't go away!

2003-01-31 Thread Eric Muller


Barry Caplan wrote:


Who knew in this day and age flipping bits to change case is still publishable (this is from today!)
 

What I find a lot more objectionable is that what this code claims to 
do is not defined (in particular, the domain over which it applies). 
Without such a qualification, we cannot say whether the code is correct, 
no matter how fishy it looks. In fact, this example is a perfectly valid 
implementation if the system claims to handle only an appropriate 
subset of the Unicode character set.
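
As a small illustration of the domain point (this is not the published code, which is not reproduced in this thread; the function names are made up), flipping bit 0x20 is correct on ASCII letters and meaningless elsewhere, so a version that states its domain explicitly could look like this:

```python
def flip_case_bit(ch: str) -> str:
    """Toggle bit 0x20 of the code point, with no domain check at all."""
    return chr(ord(ch) ^ 0x20)

def is_ascii_letter(ch: str) -> bool:
    """The domain on which the bit flip really is a case change."""
    return "A" <= ch <= "Z" or "a" <= ch <= "z"

def toggle_ascii_case(text: str) -> str:
    """Flip case of ASCII letters only; leave everything else untouched."""
    return "".join(flip_case_bit(c) if is_ascii_letter(c) else c for c in text)

print(toggle_ascii_case("Hello, World!"))
# Outside the stated domain the raw bit flip is simply wrong:
print(repr(flip_case_bit("@")))                    # not a letter at all
print(flip_case_bit("\u0100") == "\u0100".lower())  # compare with real lowercasing
```

Within A-Z and a-z the result matches ordinary case conversion; for "@" or U+0100 the raw flip produces an unrelated character, which is why the claimed domain has to be part of the specification.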

For more information, see http://www.cs.utexas.edu/users/EWD/.

Eric.