The online document below has the information you need.

http://us4.php.net/manual/en/function.utf8-encode.php

The only thing that the code segment you sent seems to do is use
the (int) cast to convert a string to an integer.  It first tries
to match the strings '&#1776;' through '&#1785;' against the
input string.  If one matches, it casts the string to an integer.
PHP uses the standard C library function strtod() to do the cast.
I suspect that PHP converts the HTML encoding back to UTF-8
(probably with the utf8_encode function) before calling strtod(); basically
'&#1776;' becomes 0x6F0, etc.  strtod() then tries to convert the string,
paying attention to the locale, and interprets the Unicode values
0x6F0-0x6F9 as the digits 0-9.
I don't know why it wouldn't work on all browsers.
 
If you want details, continue reading.

The script is very obfuscated.  Starting with this array:

> <?php
> $farsi_table=array("4758678",
>                    "38354955555459", #Zero
>                    "38354955555559", #one
>                    "38354955555659", #two
>                    "38354955555759", #three
>                    "38354955564859", #four
>                    "38354955564959", #five
>                    "38354955565059", #six
>                    "38354955565159", #seven
>                    "38354955565259", #eight
>                    "38354955565359"  #nine
>                   );

If you take the string "38354955555459" (the "#Zero" element), split it
into two-character groups "38 35 49 55 55 54 59", and then look up those
ordinal positions in an ASCII table (character 38 is '&', 35 is '#',
49 is '1', 55 is '7', etc.), you'll see that these strings just represent
the decimal ASCII values for '& # 1 7 7 6 ;'!  Why not just leave those as
"&#1776;", etc.?
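
The decoding described above can be sketched in a few lines.  This is only
an illustration: decode_table_entry() is a name I made up, it is not part
of the original script, and it assumes every entry is an even-length run
of two-digit decimal ASCII codes.

```php
<?php
// Hypothetical helper (not in the original script): turn a run of
// two-digit decimal ASCII codes back into the characters they stand for.
function decode_table_entry($digits)
{
    $out = '';
    // str_split() cuts the string into 2-character pieces: "38", "35", ...
    foreach (str_split($digits, 2) as $code) {
        $out .= chr((int) $code);   // 38 -> '&', 35 -> '#', 49 -> '1', ...
    }
    return $out;
}

echo decode_table_entry("38354955555459"), "\n";
```

Running it on the other ten-digit entries of the table should give the
remaining references in the same way.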

The string '&#1776;', as you all know, is the HTML notation (a numeric
character reference) for specifying a character that can't be typed for
whatever reason.  In this case it represents the Unicode value 0x6F0
(1776 in decimal), which is "extended arabic-indic digit zero".  The others
follow the same logic, all the way to "&#1785;".  The function utf8_to_int()
is in fact not a UTF8-to-int conversion, but an HTML-encoding-of-UTF8-to-int
conversion.
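
The correspondence between the reference and the code point can be checked
with a small sketch like the one below.  It assumes html_entity_decode() is
told to produce UTF-8; the variable names are only for illustration.

```php
<?php
// 1776 in decimal is the code point written as 0x6F0 in the text.
$hex = dechex(1776);

// Decode the numeric reference into raw UTF-8 bytes and dump them as hex.
$bytes = bin2hex(html_entity_decode('&#1776;', ENT_QUOTES, 'UTF-8'));

echo "code point: 0x", $hex, ", UTF-8 bytes: ", $bytes, "\n";
```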

I think the following will do what you want, and it might help avoid
problems with strtod() versions that might not be Unicode-safe.

function utf8_to_int($str)
{
        $transtbl = array (
                "&#1776;" => '0',
                "&#1777;" => '1',
                "&#1778;" => '2',
                "&#1779;" => '3',
                "&#1780;" => '4',
                "&#1781;" => '5',
                "&#1782;" => '6',
                "&#1783;" => '7',
                "&#1784;" => '8',
                "&#1785;" => '9'
        );

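        # If the input might contain the raw UTF-8 digits instead of
        # the HTML notation, it would have to be HTML encoded first.
        # htmlentities() only produces named entities, so it would not
        # generate "&#1776;"; something like the following (needs the
        # mbstring extension) might do it instead:
        # $str = mb_encode_numericentity($str, array(0x6F0, 0x6F9, 0, 0xFFFF), "UTF-8");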

        $str = strtr($str, $transtbl);
        return (int) $str;
}
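
A variant that accepts either form of input might look like the sketch
below.  The name farsi_digits_to_int() is hypothetical, and the sketch
assumes that raw digits arrive as UTF-8, where U+06F0 through U+06F9 are
the two-byte sequences 0xDB 0xB0 through 0xDB 0xB9.

```php
<?php
// Hypothetical variant: map both the HTML references "&#1776;".."&#1785;"
// and the raw UTF-8 digits to ASCII '0'..'9', then cast the result.
function farsi_digits_to_int($str)
{
    $table = array();
    for ($d = 0; $d <= 9; $d++) {
        $table['&#' . (1776 + $d) . ';'] = (string) $d;  // HTML notation
        $table["\xDB" . chr(0xB0 + $d)]  = (string) $d;  // raw UTF-8 bytes
    }
    // strtr() with an array replaces every key it finds in one pass.
    return (int) strtr($str, $table);
}

var_dump(farsi_digits_to_int('&#1777;&#1778;&#1779;'));
```

Because only ASCII digits remain by the time of the (int) cast, this does
not depend on how the cast treats non-ASCII characters.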

Disclaimer: I have not tested the code above and in fact I've never
written a PHP script before tonight.  I spent a few hours reading the
language manual and looking at the sources tonight.  I can't guarantee
the results :)

-Fariborz

_______________________________________________
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb
