Hi all,

Have you ever heard that scientists discover things around 4 AM?
Well, I did too :-).

I think I have found a reasonable but radical solution to an old 
chicken-and-egg problem in Persian computing.  Yep, I mean the 
use of ZWNJ to separate Persian sub-words in an expression.

Some simple examples, which I will come back to later, are these 
words that all of you know:

{ketaab-haa}
{mi-ravam}
{ping-pong}
{zedde-aab}

The entity that has been replaced by a dash in the words above is 
what I wish to discuss.

The Unicode Standard contains a pair of formatting characters,
ZERO WIDTH NON-JOINER U+200C (ZWNJ) and ZERO WIDTH JOINER U+200D 
(ZWJ).  As one can read in section 13.2 of The Unicode Standard, 
they were inherited from the "pseudo space" and "pseudo 
connection" of the old ISIRI 3342.  Following ISIRI 3342, The 
Unicode Standard lists two uses for these entities, quoting:

        *  Cause nondefault joining appearance (for example, as 
           is sometimes required in writing Persian using the 
           Arabic script).

        *  Exhibit the joining-variant glyphs themselves in
           isolation.
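
To make the two characters concrete, here is a minimal Python 
sketch that looks them up in the Unicode character database and 
builds {mi-ravam} with ZWNJ standing in for the dash.  The 
variable names are only illustrative.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Print code point, official name and general category of each character.
for ch in (ZWNJ, ZWJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# {mi-ravam}: meem, Farsi yeh, ZWNJ, reh, waw, meem.
mi_ravam = "\u0645\u06cc" + ZWNJ + "\u0631\u0648\u0645"
print(len(mi_ravam))  # 6 code points, one of them invisible
```

Both characters report the general category Cf (format), which is 
why software often treats them as ignorable.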

On the other hand, the latest publication of the Persian Academy 
of Language and Literature says that this entity is a kind of 
narrow space, which is hard to reproduce in handwriting but is 
used in typography.

Remembering the discussions saying that ZWNJ should be ignored 
when processing text, I knew there were rumors that we shouldn't 
use it to separate Persian sub-words (what are they called in 
Persian linguistics?).  I came back to the problem a few days 
ago, when trying to typeset some Persian material using the Omega 
engine.  I hacked Omega to insert a thin space wherever ZWNJ 
prevents two letters from joining.  The outcome was wonderful, 
but the problem remained in words like {zedde-aab}, or as in my 
original example {zedde-jaasoosi}, where the letter before the 
ZWNJ does not join forward anyway, so no space is added.
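
The behaviour of that hack can be sketched in Python.  This is 
not the Omega code, only a rough model of the rule "add a thin 
space where ZWNJ actually prevents joining".  The names 
add_thin_spaces, joins_forward and NON_FORWARD_JOINING are 
hypothetical, and the set of non-forward-joining letters is a 
hand-written subset covering the standard Persian letters, not 
the full Unicode joining data.

```python
import unicodedata

ZWNJ = "\u200c"
THIN_SPACE = "\u2009"

# Assumption: hamza, alef madda, alef, dal, zal, reh, zeh, zheh, waw.
# These letters never connect to the letter that follows them.
NON_FORWARD_JOINING = set("\u0621\u0622\u0627\u062f\u0630\u0631\u0632\u0698\u0648")

def is_letter(ch):
    """Rough test for an Arabic-script letter (simplified range check)."""
    return 0x0621 <= ord(ch) <= 0x06D3 and unicodedata.category(ch) == "Lo"

def joins_forward(ch):
    """True if the letter would normally connect to a following letter."""
    return is_letter(ch) and ch not in NON_FORWARD_JOINING

def add_thin_spaces(text):
    """Append a thin space after each ZWNJ that really prevents joining."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch != ZWNJ or i == 0 or i + 1 >= len(text):
            continue
        if joins_forward(text[i - 1]) and is_letter(text[i + 1]):
            out.append(THIN_SPACE)
    return "".join(out)

ketaab_haa = "\u06a9\u062a\u0627\u0628" + ZWNJ + "\u0647\u0627"
zedde_aab = "\u0636\u062f" + ZWNJ + "\u0622\u0628"

print(THIN_SPACE in add_thin_spaces(ketaab_haa))  # True
print(THIN_SPACE in add_thin_spaces(zedde_aab))   # False
```

In {ketaab-haa} the letter before the ZWNJ would otherwise join, 
so a space is added.  In {zedde-aab} it would not, so the rule 
never fires and the two parts stay visually glued together.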

On the other side of that (be it a coin or a moon), everybody 
knows that the existence of this dash-like entity can be crucial 
in Persian information processing, such as spell-checking.

So I decided that we shouldn't use ZWNJ for this purpose.  ZWNJ 
should remain just for exhibition uses.  I was thinking about 
proposing a new character to the Unicode Consortium, but before 
doing so I searched the existing Unicode characters and, 
unbelievably, found it!!!


The lost character that I'm going to introduce to you is U+202F.
In my opinion, U+202F earns its respect from the fact that it sits 
beside the other formatting characters used in Persian.  Oh, I 
nearly forgot to tell you its name:  NARROW NO-BREAK SPACE.
Isn't it what we want?  It's in the second formatting-characters 
region of the U+2000 page, along with LS, PS, LRE, RLE, PDF, LRO, 
and RLO, all used in Persian text.  Isn't it amazing?  The other 
amazing fact is that this character appears in Table 6-1, 
"Unicode Space Characters", of the Unicode book, but there is no 
description of it in the Space Characters subsection of section 
6.1, "General Punctuation", which means it is a character with 
almost no known usage or origin!
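
The neighbourhood described above can be checked with a short 
Python sketch that uses only the standard unicodedata module.  
The variable names are illustrative.

```python
import unicodedata

NNBSP = "\u202f"

# Names of U+2028 through U+202F: LS, PS, LRE, RLE, PDF, LRO, RLO, then NNBSP.
neighbours = [unicodedata.name(chr(cp)) for cp in range(0x2028, 0x2030)]
for cp, name in zip(range(0x2028, 0x2030), neighbours):
    print(f"U+{cp:04X}", name)

# General category: Zs is "space separator", unlike the Cf of ZWNJ.
print(unicodedata.category(NNBSP))       # Zs
print(unicodedata.category("\u200c"))    # Cf

# Compatibility decomposition: a no-break variant of the ordinary space.
print(unicodedata.decomposition(NNBSP))  # <noBreak> 0020
```

So the character database classifies U+202F as a real space, 
while ZWNJ is a format character.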

Let me summarize my supporting arguments:

        * It's no longer ignored in text processing.

        * You will get the right spacing in typography, without 
        any markup.  It greatly improves readability, and it is 
        more conformant, to the Persian Academy of course.

        * Unlike ZWNJ, it carries some semantics: it marks the 
        difference between {khaan-haaye} and {khaane-i}.  In my 
        own opinion the former should be written as 
        {khaan-haa-ye}, because {haaye} by itself can be read 
        as {haay-e} or {haa-ye}.  Don't you agree?

        * It has no bad side effects: it is still non-breaking,
        and it doesn't end a word...

        * Well, we are bringing a dead Unicode character back
        to life.  Isn't that enough?
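
The first point in the list can be illustrated with a small 
Python sketch.  It assumes a text processor that "ignores ZWNJ" 
simply by dropping every character of general category Cf.  That 
model and the function name drop_format_chars are hypothetical, 
not a description of any particular tool.

```python
import unicodedata

ZWNJ = "\u200c"
NNBSP = "\u202f"
ZEDD = "\u0636\u062f"  # first part of {zedde-aab}
AAB = "\u0622\u0628"   # second part of {zedde-aab}

def drop_format_chars(text):
    """Model of a processor that ignores format (Cf) characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

with_zwnj = drop_format_chars(ZEDD + ZWNJ + AAB)
with_nnbsp = drop_format_chars(ZEDD + NNBSP + AAB)

# With ZWNJ the sub-word boundary disappears after processing.
print(with_zwnj == ZEDD + AAB)                 # True

# With NNBSP the boundary survives and the parts can be recovered.
print(with_nnbsp.split(NNBSP) == [ZEDD, AAB])  # True
```

Under that assumption, a spell-checker working on the processed 
text can still see the two parts when NNBSP is used.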



Well, this is my latest discovery.  I know I will have a busy 
INBOX today, but I like it.  That was the idea.  I know that it 
may be necessary to use both of them, ZWNJ and NNBS, on different 
occasions, for example ZWNJ in {mi-ravam} and {ketaab-haa}, and 
NNBS in {zedde-aab}, but I believe I am going to use it in 
regular text one way or another.

Yours,
-- 
Behdad Esfahbod         18 Aban 1381, 2002 Nov 9 
http://behdad.org/      [Finger for Geek Code]

#define is_persian_leap(y) ((((y)-474)%2820+2820)%2820*31%128<31)



_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
