> Here is a solution (in fact a hack) that, if implemented correctly, can
> resolve some of the issues until people and Google start using correct
> software:
>
> With a little tweaking, web servers can translate the correct
> Unicode into the incorrect Unicode that Win9X users need.
> That is, the web server looks at the browser request and, if it
> detects Win9X, translates every U+06CC in the document to U+064A (and
> performs any other required translations). The same technique could be
> used to fool Google into generating correct search results: the web
> server generates a Win9X-friendly version of the document and appends
> it to the original document. You could also define tags with which the
> user of the web server can enable or disable some of these features.
> This might even give a host an advantage over other web hosting
> companies.

That solves half of the problem.  On Win9x, the d key on the keyboard
inserts ARABIC YEH (U+064A), and on Win2K+ it inserts FARSI YEH (U+06CC).
So, if you use this method, a user who types a word containing yeh into
Google's search box on Win9x won't find your site.
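
For reference, the server-side translation proposed in the quoted message
could look roughly like the sketch below.  This is only an illustration:
the function names, the User-Agent markers, and the KEHEH-to-KAF entry
(one of the "other required translations") are assumptions, not something
any particular web server provides.

```python
# Assumed mapping from the Persian code points to the Arabic ones that
# Win9x can display and type.
PERSIAN_TO_WIN9X = str.maketrans({
    "\u06CC": "\u064A",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
    "\u06A9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
})

# Substrings assumed to identify Win9x in a User-Agent header.
WIN9X_MARKERS = ("Windows 95", "Windows 98", "Win95", "Win98", "Win 9x")


def is_win9x(user_agent: str) -> bool:
    """Guess from the User-Agent header whether the client runs Win9x."""
    return any(marker in user_agent for marker in WIN9X_MARKERS)


def render_for(user_agent: str, html: str) -> str:
    """Return the page as-is, or translated when the client looks like Win9x."""
    if is_win9x(user_agent):
        return html.translate(PERSIAN_TO_WIN9X)
    return html
```

Note that this only changes what the Win9x user sees; it does nothing
about what that user types into a search box.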

The best hack (or solution, as one might call it) I've found for this is
feeding Google a version of the page which contains both forms of each
word (using YEH and FARSI YEH), so that the chance of Google finding your
page for a given keyword is maximized.  Of course, certain measures must
be taken to prevent bad results; for example, the proximity of the words
must not be disturbed.  Nevertheless, this will cause other problems, such as
distorted keyword density, which cannot be solved reliably.  The problem
really must be fixed in the search engine code, and such hacks have their
own downsides.  The search engine project I've been working on
<www.ariasearch.com> handles this (and the ARABIC KAF versus KEHEH
problem), among other problems with searching Persian text.
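
A minimal sketch of the "both forms" idea, assuming the crawler is simply
served the original text followed by a copy in which FARSI YEH is replaced
by ARABIC YEH.  The function name is hypothetical, and how the crawler is
detected is left out.

```python
# Only the yeh pair is handled here; other pairs could be added to the map.
FARSI_TO_ARABIC_YEH = str.maketrans({"\u06CC": "\u064A"})


def text_for_crawler(text: str) -> str:
    """Return the text followed by its ARABIC YEH variant, if they differ.

    Each copy keeps its own word order, so the proximity of the words
    inside a copy is untouched.
    """
    arabic_form = text.translate(FARSI_TO_ARABIC_YEH)
    if arabic_form == text:
        return text  # no FARSI YEH present, nothing to add
    return text + "\n" + arabic_form
```

Every word ends up on the page twice, once per copy, which is exactly the
keyword density problem mentioned above.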

> Of course, the solution above is only a transient one, and it is up to
> people to upgrade their Win9X machines to something that is
> Unicode-compliant. It is also up to Google to program their systems
> to understand that U+06CC and U+064A have the same shape and hence
> should be treated as the same for searching unless the user
> requests otherwise. This is the same as case-insensitive search, which
> is usually implemented by mapping all upper- and lowercase characters
> -- in documents and queries alike -- to uppercase.

Yeah, that's right.  Of course, great care must be taken so that it
doesn't break Arabic search results.
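
A rough sketch of that kind of folding, in the spirit of case-insensitive
search.  Everything here is an assumption for illustration (the names, the
choice of canonical form, the substring matching); it is not how Google or
any other engine actually does it.

```python
# Fold the Arabic code points onto the Persian ones for matching only.
FOLD = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def fold(text: str) -> str:
    """Map both spellings of yeh and kaf to one canonical form."""
    return text.translate(FOLD)


def search(query: str, documents: list[str], exact: bool = False) -> list[str]:
    """Return the documents containing the query.

    Query and documents are folded the same way unless exact=True,
    which covers the "unless user requests otherwise" case.
    """
    if exact:
        return [doc for doc in documents if query in doc]
    key = fold(query)
    return [doc for doc in documents if key in fold(doc)]
```

The folding is applied to the comparison keys only; the documents
themselves are returned unchanged, so Arabic text is still shown exactly
as it was written.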


-------------
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]

He who sees the abyss, but with eagle's eyes - he who with eagle's talons
grasps the abyss: he has courage.
-Thus Spoke Zarathustra, F. W. Nietzsche



_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
