Hi all,
Have you ever heard that scientists discover things around 4 AM?
Well, I did too :-).
I think I have found a reasonable but radical solution to an old
chicken-and-egg problem in Persian computing. Yep, I mean the
use of ZWNJ to separate Persian sub-words in an expression.
Here are some simple examples, which I will come back to later;
all of you know these words:
{ketaab-haa}
{mi-ravam}
{ping-pong}
{zedde-aab}
The entity replaced by a dash in the words above is what I wish
to discuss.
The Unicode Standard contains a pair of formatting characters,
ZERO WIDTH NON-JOINER U+200C (ZWNJ) and ZERO WIDTH JOINER U+200D (ZWJ).
As one can read in Section 13.2 of The Unicode Standard, they were
inherited from the old ISIRI 3342's "pseudo space" and "pseudo
connection". Following ISIRI 3342, The Unicode Standard also lists
two uses for these characters; quoting:
* Cause nondefault joining appearance (for example, as
is sometimes required in writing Persian using the
Arabic script).
* Exhibit the joining-variant glyphs themselves in
isolation.
On the other hand, the latest publication of The Persian Academy of
Language and Literature describes this entity as a kind of narrow
space, which is hard to reproduce in handwriting but is used in
typography.
I remembered the discussions about ignoring ZWNJ when processing
text, and I knew there were rumors that we shouldn't use it to
separate Persian sub-words (what are they called in Persian
linguistics?). I came back to the problem a few days ago, when
trying to typeset some Persian material using the Omega engine.
I hacked Omega to insert a thin space wherever a ZWNJ prevents two
letters from joining. The outcome was wonderful, but the problem
remained in words like {zedde-aab}, or, as in my original example,
{zedde-jaasoosi}: the last letter of {zedd} (dal) never joins the
next letter anyway, so there is no joining for the ZWNJ to prevent,
and no space appears.
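
A rough, character-level sketch of that hack may help. It is not
the actual Omega change (which worked on typeset output and isn't
shown here). It assumes text held as an array of Unicode code
points, approximates "letter" with a hand-picked range, hard-codes
a small set of letters that never join the following letter, and
uses THIN SPACE U+2009 as a stand-in for the typeset gap; all the
names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ZWNJ        0x200Cu  /* ZERO WIDTH NON-JOINER */
#define THIN_SPACE  0x2009u  /* stand-in for the gap a typesetter would add */

/* Rough test for an Arabic-script letter: the basic Arabic range
   U+0620..U+064A plus pe, tcheh, jeh, keheh, gaf and Farsi yeh.
   A real implementation would consult the Unicode Character Database. */
static int is_letter(uint32_t c) {
    return (c >= 0x0620 && c <= 0x064A) || c == 0x067E || c == 0x0686 ||
           c == 0x0698 || c == 0x06A9 || c == 0x06AF || c == 0x06CC;
}

/* Letters that never join the FOLLOWING letter (hamza, alef forms,
   teh marbuta, dal, thal, reh, zain, waw, jeh), so a ZWNJ after them
   has nothing to prevent.  Hand-picked subset, not complete. */
static int does_not_join_next(uint32_t c) {
    switch (c) {
    case 0x0621: case 0x0622: case 0x0623: case 0x0624: case 0x0625:
    case 0x0627: case 0x0629: case 0x062F: case 0x0630: case 0x0631:
    case 0x0632: case 0x0648: case 0x0698:
        return 1;
    default:
        return 0;
    }
}

/* In place, replace every ZWNJ that really breaks a join between two
   letters with THIN_SPACE.  Returns the number of replacements. */
static size_t zwnj_to_thin_space(uint32_t *text, size_t len) {
    size_t replaced = 0;
    for (size_t i = 1; i + 1 < len; i++) {
        if (text[i] == ZWNJ &&
            is_letter(text[i - 1]) && !does_not_join_next(text[i - 1]) &&
            is_letter(text[i + 1])) {
            text[i] = THIN_SPACE;
            replaced++;
        }
    }
    return replaced;
}
```

On {ketaab-haa} (beh, ZWNJ, heh) the ZWNJ is replaced, but on
{zedd-aab} (dal, ZWNJ, alef with madda) it is left alone, because
dal never joins the next letter; this mirrors the trouble with
{zedde-aab}.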
On the other side of the coin (or the moon), everybody knows that
the presence of this dash-like entity can be crucial in Persian
information processing, such as spell-checking.
So I decided that we shouldn't use ZWNJ for this purpose; ZWNJ
should be kept only for display purposes. I was thinking about
proposing a new character to the Unicode Consortium, but before
doing that, I searched the Unicode characters and, unbelievably,
found it!!!
The lost character I'm goin' to introduce to you is U+202F. In my
opinion, U+202F earns its respect from the fact that it sits next
to the other formatting characters used in Persian. Oh, I really
forgot to tell you its name: NARROW NO-BREAK SPACE.
Isn't that what we want? It's in the second formatting-characters
region of the U+2000 page, along with LS, PS, LRE, RLE, PDF, LRO,
and RLO, all of which are used in Persian text. Isn't that amazing?
Another amazing fact is that this character appears in Table 6-1,
"Unicode Space Characters", of the Unicode book, but there's no
description of it in the Space Characters subsection of Section
6.1, "General Punctuation"; that is, it's a character with almost
no known usage or origin!
Let me summarize my arguments:
* It is not ignored in text processing anymore.
* You get the right spacing in typography without any markup.
It greatly increases readability and is more conformant, to the
Persian Academy's recommendation, of course.
* Unlike ZWNJ, it carries some semantics; that is, it makes the
difference between {khaan-haaye} and {khaane-i}. In my own
opinion the former should be written as {khaan-haa-ye}, because
{haaye} by itself can be read as {haay-e} or {haa-ye}. Don't you
agree?
* It has no harmful side effects: it is still non-breaking, and
it doesn't end a word...
* Well, we are bringing a dead Unicode character back to life;
isn't that enough?
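
To make the first and fourth points concrete, here is a small C
sketch under two assumptions: text is held as an array of code
points, and "ignoring ZWNJ" is modelled by a hypothetical
normalizer that simply deletes ZWNJ and ZWJ (real search or
spell-check tools differ). After that step a ZWNJ-separated
{zedd-aab} has lost its sub-word boundary, while an NNBSP-separated
one keeps it, and splitting on ordinary spaces alone still sees a
single word.

```c
#include <stddef.h>
#include <stdint.h>

#define ZWNJ   0x200Cu  /* ZERO WIDTH NON-JOINER */
#define ZWJ    0x200Du  /* ZERO WIDTH JOINER */
#define NNBSP  0x202Fu  /* NARROW NO-BREAK SPACE */
#define SPACE  0x0020u  /* ordinary SPACE */

/* Hypothetical normalization step: drop the zero-width joiners and
   keep everything else.  Compacts text in place, returns new length. */
static size_t strip_joiners(uint32_t *text, size_t len) {
    size_t out = 0;
    for (size_t i = 0; i < len; i++)
        if (text[i] != ZWNJ && text[i] != ZWJ)
            text[out++] = text[i];
    return out;
}

/* Count maximal runs of code points for which is_sep returns 0. */
static size_t count_runs(const uint32_t *text, size_t len,
                         int (*is_sep)(uint32_t)) {
    size_t runs = 0;
    int in_run = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_sep(text[i])) {
            in_run = 0;
        } else if (!in_run) {
            in_run = 1;
            runs++;
        }
    }
    return runs;
}

/* Word boundaries: only the ordinary space; NNBSP keeps a word whole. */
static int is_word_sep(uint32_t c) { return c == SPACE; }

/* Sub-word boundaries: the ordinary space and NNBSP. */
static int is_subword_sep(uint32_t c) { return c == SPACE || c == NNBSP; }
```

With zad, dal, NNBSP, alef with madda, beh, the sketch counts one
word and two sub-words; with ZWNJ in place of NNBSP, stripping the
joiners leaves a single undivided run.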
Well, this is my latest discovery. I know I will have a busy INBOX
today, but I like that; that was the idea. I know it may be
necessary to use both of them, ZWNJ and NNBS, on different
occasions, for example ZWNJ in {mi-ravam} and {ketaab-haa}, and
NNBS in {zedde-aab}, but I believe I'm going to use it in regular
texts one way or another.
Yours,
--
Behdad Esfahbod 18 Aban 1381, 2002 Nov 9
http://behdad.org/ [Finger for Geek Code]
#define is_persian_leap(y) ((((y)-474)%2820+2820)%2820*31%128<31)
_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing