The online document below has the information you need.
http://us4.php.net/manual/en/function.utf8-encode.php
All the code segment you sent seems to do is use the (int) cast to
convert a string to an integer. It first tries to match the strings
'&#1776;' through '&#1785;' against the input string, and if one
matches, it casts the string to an integer.
PHP uses the standard C library function strtod() to do the cast.
I suspect that PHP converts the HTML encoding back to UTF-8 (probably
with the utf8_encode function) before calling strtod(). So '&#1776;'
becomes U+06F0, and so on. strtod() then converts the string, paying
attention to the locale, and interprets the code points U+06F0 through
U+06F9 as the digits 0 through 9.
I don't know why it wouldn't work in all browsers.
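The digit mapping described here is just a fixed offset: code points U+06F0 (0x6F0) through U+06F9 correspond to the values 0 through 9. The helper below is a hypothetical sketch of that arithmetic only (extended_arabic_indic_digit_value is not a PHP built-in), not a description of what strtod() or the (int) cast does internally.

```php
<?php
// Hypothetical helper: map a code point in the Extended Arabic-Indic
// digit range U+06F0..U+06F9 to its value 0..9 by subtracting U+06F0.
// Returns -1 for any code point outside that range.
function extended_arabic_indic_digit_value(int $codepoint): int
{
    if ($codepoint >= 0x06F0 && $codepoint <= 0x06F9) {
        return $codepoint - 0x06F0;
    }
    return -1;
}

echo extended_arabic_indic_digit_value(0x06F0), "\n"; // 0
echo extended_arabic_indic_digit_value(1785), "\n";   // 9 (1785 == 0x6F9)
```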
If you want details, continue reading.
The script is heavily obfuscated. Start with this array:
<?php
$farsi_table=array(4758678, 38354955555459, #Zero
38354955555559, #one
38354955555659, #two
38354955555759, #three
38354955564859, #four
38354955564959, #five
38354955565059, #six
38354955565159, #seven
38354955565259, #eight
38354955565359 #nine
);
If you take the string 38354955555459 (the #Zero element), split it
into two-character sequences 38 35 49 55 55 54 59, and then look up
those values in an ASCII table (character 38 is '&', 35 is '#', 49 is
'1', 55 is '7', etc.), you'll see that these strings just represent the
ASCII codes for '& # 1 7 7 6 ;' in decimal! Why not just leave them as
&#1776;, etc.?
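The decoding described above can be checked with a short sketch. decode_ascii_pairs is a hypothetical name, and it assumes every character code is exactly two decimal digits (true for '&', '#', the digits and ';'). The number is passed as a string so that long values are not mangled on 32-bit builds.

```php
<?php
// Hypothetical helper: split a decimal string into two-digit groups
// and turn each group into the ASCII character with that code.
function decode_ascii_pairs(string $digits): string
{
    $out = '';
    foreach (str_split($digits, 2) as $pair) {
        $out .= chr((int) $pair);
    }
    return $out;
}

// 38 35 49 55 55 54 59 -> '&' '#' '1' '7' '7' '6' ';'
echo decode_ascii_pairs('38354955555459'), "\n"; // &#1776;
echo decode_ascii_pairs('38354955565359'), "\n"; // &#1785;
```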
The string '&#1776;', as you all know, is the HTML notation (a numeric
character reference) for a character that can't be typed directly for
whatever reason. In this case it represents the Unicode code point
U+06F0 (decimal 1776), EXTENDED ARABIC-INDIC DIGIT ZERO. The others
follow the same logic, up to &#1785; (U+06F9, digit nine). So the
function utf8_to_int() is not really a UTF-8-to-int conversion, but a
conversion from HTML-encoded digits to an int.
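To see the link between the reference and the character, here is a minimal sketch. numeric_entity_codepoint is a hypothetical helper that only accepts the decimal form '&#NNNN;'; html_entity_decode() is the standard PHP function, which turns the same reference into its UTF-8 bytes (DB B0 for U+06F0).

```php
<?php
// Hypothetical helper: parse a decimal numeric character reference such
// as '&#1776;' and return its code point, or null if it is not one.
function numeric_entity_codepoint(string $entity): ?int
{
    if (preg_match('/^&#([0-9]+);$/', $entity, $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}

printf("U+%04X\n", numeric_entity_codepoint('&#1776;')); // U+06F0

// html_entity_decode() yields the UTF-8 encoding of the same character.
echo bin2hex(html_entity_decode('&#1776;', ENT_QUOTES, 'UTF-8')), "\n"; // dbb0
```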
I think the following will do what you want, and it might help avoid
problems with strtod() implementations that are not Unicode-safe.
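function utf8_to_int($str)
{
    # Map the HTML numeric character references for the Extended
    # Arabic-Indic digits (U+06F0..U+06F9) to ASCII digits.
    $transtbl = array(
        '&#1776;' => '0',
        '&#1777;' => '1',
        '&#1778;' => '2',
        '&#1779;' => '3',
        '&#1780;' => '4',
        '&#1781;' => '5',
        '&#1782;' => '6',
        '&#1783;' => '7',
        '&#1784;' => '8',
        '&#1785;' => '9'
    );
    # If the input may contain raw UTF-8 digits instead of references,
    # encode them as decimal references first (needs mbstring).
    # htmlentities() will not do this: these digits have no named entity.
    # $str = mb_encode_numericentity($str, array(0x80, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
    $str = strtr($str, $transtbl);
    return (int) $str;
}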
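If the input arrives as raw UTF-8 characters rather than HTML references, the same strtr() idea can map the UTF-8 byte pairs directly. This is a minimal sketch assuming UTF-8 input and covering only U+06F0 to U+06F9; persian_digits_to_int is a hypothetical name, not part of the script you sent.

```php
<?php
// Hypothetical alternative to utf8_to_int() for raw UTF-8 input:
// translate Extended Arabic-Indic digits U+06F0..U+06F9 straight to
// ASCII '0'..'9', with no HTML references involved.
function persian_digits_to_int(string $str): int
{
    $map = array();
    for ($d = 0; $d <= 9; $d++) {
        // U+06F0 + d is encoded in UTF-8 as the bytes 0xDB, 0xB0 + d.
        $map["\xDB" . chr(0xB0 + $d)] = (string) $d;
    }
    // strtr() replaces each two-byte sequence; ASCII digits are untouched.
    return (int) strtr($str, $map);
}

echo persian_digits_to_int("\xDB\xB1\xDB\xB3\xDB\xB8\xDB\xB4"), "\n"; // 1384
```

Because ASCII digits pass through strtr() unchanged, input that mixes both kinds of digits also converts.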
Disclaimer: I have not tested the code above and in fact I've never
written a PHP script before tonight. I spent a few hours reading the
language manual and looking at the sources tonight. I can't guarantee
the results :)
-Fariborz
___
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb