On Sun, 2003-11-16 at 02:03, C Bobroff wrote:
> I believe in principle, the search engines are to consider Persian and
> Arabic Yeh to be the same.

They should consider it to be "weakly equivalent". That's the term. Same
as they do for, say, capital "A" and small "a", or a-umlaut ("ä") and
normal "a".
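
A minimal sketch of what such weak equivalence could look like, in Python. The folding table and the name search_key are assumptions made up for illustration; no real search engine's tables are being quoted here.

```python
import unicodedata

# Hypothetical folding table: Arabic code points mapped to their Persian
# counterparts, so that both spellings end up with the same search key.
YEH_KAF_FOLD = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def search_key(text: str) -> str:
    """Return a folded key; strings with equal keys count as a match."""
    # Decompose, then drop combining marks (the umlaut in "ä", and also
    # Arabic-script diacritics such as tashdid).
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case ("A" vs "a"), then fold Yeh and Kaf.
    return no_marks.casefold().translate(YEH_KAF_FOLD)


arabic_spelling = "\u0633\u0627\u0642\u064A"   # "saaqi" with Arabic Yeh
persian_spelling = "\u0633\u0627\u0642\u06CC"  # "saaqi" with Persian Yeh
print(search_key(arabic_spelling) == search_key(persian_spelling))
```

The index and the query would both go through the same function, so the original text is left untouched and only the comparison is loosened.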

> Yet they do not.  Why?  Are the tables they are consulting faulty? 
> Are they consulting the wrong tables?  Does no one even know this is a
> problem?  Who handles this so they can fix the problem?

Google does, for example. MSN also does, if they have a Persian search.
As for contacting them, feel free to do so. I did the same once,
with all the authority I could muster from the Unicode Consortium, to no
avail (in other words, I got no reply or action).

> And what if we WANT the search engine
> to distinguish between the Persian and Arabic?

You provide an "option" to the engine, telling it not to use
its equivalence tables. But can you do that with "A" and "a" in Google?
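
Such an option could be as small as one flag. The sketch below is an assumption about how a search routine might expose it; the names contains and exact are made up.

```python
# Hypothetical equivalence table: Arabic Yeh is folded to Persian Yeh.
FOLD = str.maketrans({"\u064A": "\u06CC"})


def contains(haystack: str, needle: str, exact: bool = False) -> bool:
    """Report whether needle occurs in haystack.

    With exact=False (the default) case and Yeh differences are ignored;
    with exact=True the equivalence table is bypassed entirely.
    """
    if not exact:
        haystack = haystack.casefold().translate(FOLD)
        needle = needle.casefold().translate(FOLD)
    return needle in haystack


page = "\u0633\u0627\u0642\u064A"   # written with Arabic Yeh
query = "\u0633\u0627\u0642\u06CC"  # typed with Persian Yeh
print(contains(page, query), contains(page, query, exact=True))
```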


> something like the first line in the Divan of Hafez:
> alaa yaa ayyuhaa saaqi ader ka'san wa  naawelhaa
> Is that lang="fa" or lang="ar"?

I agree that it's a hard question. It really depends on how you are going
to write the "saaqi" part. Since it's pronounced /i:/, it should be
written as dotted Yeh in the Arabic language. If you're writing it with
a dotless Yeh, it is a Persian transliteration of Arabic text. Now,
how do you mark an English transliteration of Arabic text? With "en" or
"ar"? Of course you'll use "en". So in that case, you should use "fa".

> I wonder why you say "#1740;" instead of "U+06CC"?? :) :)

He's a real person, not a computer programmer! Real people prefer
decimal to hexadecimal, I guess. But AmirBehzad, it's inconvenient
to refer to Unicode characters by their decimal codes. If you want to use
HTML escapes, please use the "&#x6CC;" format instead of "&#1740;".
Both are unreadable to a casual reader, but the first is readable by a
specialist without using a scientific calculator.
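
For anyone who wants to check the arithmetic, here is a small Python sketch that prints both escape forms and the U+ notation for a character. The function names are made up for this example.

```python
import html


def html_escapes(ch: str):
    """Return the (hexadecimal, decimal) HTML numeric escapes for one character."""
    code = ord(ch)
    return f"&#x{code:X};", f"&#{code};"


def u_notation(ch: str) -> str:
    """Return the conventional U+XXXX name of the code point."""
    return f"U+{ord(ch):04X}"


persian_yeh = "\u06CC"
hex_form, dec_form = html_escapes(persian_yeh)
print(u_notation(persian_yeh), hex_form, dec_form)

# Both escapes decode to the same character; only the hex one can be
# matched against the code charts at a glance.
print(html.unescape(hex_form) == html.unescape(dec_form) == persian_yeh)
```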

> I am slowly starting to think your idea is indeed the solution to the Yeh
> and Kaf problem.  I hope the more technically astute people will
> also wake up and give you some feedback.

The solution? The solution is of course fixing every piece of software
immediately. But I agree that AmirBehzad's is actually a nice idea:
detect what the browser supports properly (possibly using some
JavaScript, browser sniffing, and other tricks) and then serve the
browser what it can display.
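
To make the idea concrete, here is a rough server-side sketch in Python. It is not AmirBehzad's script. The detection rule (IE older than 6 on Windows 95/98), the function names, and the simplified joining rule are all assumptions for illustration, and real User-Agent strings vary more than this.

```python
import re

# Assumed rule: IE older than version 6 on Windows 95/98 has the Yeh bug.
_MSIE = re.compile(r"MSIE (\d+)")
_WIN9X = re.compile(r"Windows 9[58]|Win9[58]")

# Simplified joining model: a Yeh is in initial or medial form when the
# next letter (skipping diacritics) joins to it. Hamza (U+0621) does not
# join and ZWNJ (U+200C) breaks joining, so neither is in the set.
_JOINS_TO_PREVIOUS = "\u0622-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC"
_MARKS = "\u064B-\u0652"
_NON_FINAL_YEH = re.compile(f"\u06CC(?=[{_MARKS}]*[{_JOINS_TO_PREVIOUS}])")


def has_persian_yeh_bug(user_agent: str) -> bool:
    """Guess from the User-Agent header whether Persian Yeh displays wrongly."""
    match = _MSIE.search(user_agent)
    return bool(match) and int(match.group(1)) < 6 and bool(_WIN9X.search(user_agent))


def text_for_browser(text: str, user_agent: str) -> str:
    """Swap Persian Yeh for Arabic Yeh in initial and medial positions only."""
    if has_persian_yeh_bug(user_agent):
        return _NON_FINAL_YEH.sub("\u064A", text)
    return text


ie5_win98 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
word = "\u0628\u06CC\u0646\u06CC"  # beh, yeh, noon, yeh
print([f"U+{ord(c):04X}" for c in text_for_browser(word, ie5_win98)])
```

Final and isolated Yehs are left alone because that is where the Arabic Yeh would show its two dots.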

It works fine for display purposes, but there are scenarios where it is not
sufficient. Let's say a user is using IE5 on Win98, and she has the
Persian Yeh bug. AmirBehzad's script serves her Arabic Yeh in medial and
initial forms. She sees everything fine. But then, she wants to search
the (already-retrieved) web page using the "Find" menu on her browser
(which has not implemented any such Yeh equivalence). The result: she
can't find the Arabic Yeh (or the Persian one).

Another scenario: let's say the writer of the page wants to
say: "Don't use Arabic Yehs like 'ي', use Persian Yehs like 'ی'."
You'll agree that he will be scared when the software does weird
things to his text.

The best solution is updating the software and the fonts. And nagging
the developers of the software if that doesn't fix the problem. And
writing your own software if that doesn't work either. Or learning to
write software if you don't know how. Or forgetting it all if it's not
worth the effort.

> (RoozBEH, are you almost done cleaning out your Inbox??)

I'm doing it now. Next shot in 90 days.

> Perhaps the script could also check if the win9x user has IE6 in
> addition and if so, let them see Persian.

I agree.

> I would like to request that you make a simple webpage and post it
> somewhere for newbies to copy and paste. 
> It would be nice if you put a
> little alternating Persian and English content so people see how to switch
> between the two. An exterior .CSS file that is 100% compliant with
> directions for copying for one's own use would be so nice.  For test
> purposes, the Persian content should include some tricky things like
> parentheses, diacritics (tashdid, sokun, zir, zabar, pish, etc),
> zero-width-joiner, zero-width-non-joiner, heh+hamza, and something
> requiring mouseovers (or some such feature requiring the browser to
> calculate where the word is on the page.) After making everything as
> standard and compliant as possible, also put in your script, and most
> important, directions for how to copy and explanation for why it is there,
> I think this would be the best.

Very good recommendations.

roozbeh


_______________________________________________
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb
