Re: Help me...

2003-09-07 Thread Skip Tavakkolian
Hi,

As I mentioned in my original posting, I am not a PHP programmer, but it
seems that for keyboard input you WILL have to convert UTF-8 to HTML
encoding if you want to use your function, for example by calling a
function like htmlentities() or utf8_decode() before calling
utf8_to_int(). The pseudocode looks like this:

if (input is in UTF-8) {  # for example, input typed on a Linux keyboard
    $str = htmlentities($str, ENT_COMPAT, 'UTF-8');
}
$intvalue = utf8_to_int($str);
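One caveat on the pseudocode above: htmlentities() only rewrites characters that have a named HTML entity, so it will most likely leave the Persian digits U+06F0..U+06F9 unchanged, and utf8_decode() would turn them into '?' because they are not in ISO-8859-1. Below is a minimal sketch, assuming the form input arrives as UTF-8, that produces the '&#1776;'..'&#1785;' form directly with strtr(). The name persian_utf8_to_entities() is hypothetical.

```php
<?php
// Hypothetical helper: rewrite raw UTF-8 Persian digits as numeric entities.
// U+06F0 + d is encoded in UTF-8 as the two bytes 0xDB, 0xB0 + d.
function persian_utf8_to_entities($str)
{
    $map = array();
    for ($d = 0; $d < 10; $d++) {
        $map["\xDB" . chr(0xB0 + $d)] = '&#' . (1776 + $d) . ';';
    }
    return strtr($str, $map);   // every other byte is left untouched
}

echo persian_utf8_to_entities("\xDB\xB1\xDB\xB2"), "\n"; // prints &#1777;&#1778;
```

The result can then be handed to utf8_to_int() as it stands.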

I've included the function again; I added extra code for the "is x in set y"
test. Also note the use of the built-in strtr() function. Built-ins like this
are usually written in C and are therefore faster. This code has not been tested.

function utf8_to_int($str)
{
    $transtbl = array(
        '&#1776;' => '0',
        '&#1777;' => '1',
        '&#1778;' => '2',
        '&#1779;' => '3',
        '&#1780;' => '4',
        '&#1781;' => '5',
        '&#1782;' => '6',
        '&#1783;' => '7',
        '&#1784;' => '8',
        '&#1785;' => '9'
    );

    foreach ($transtbl as $key => $value) {
        if (strpos($str, $key) !== false) {  # it contains an HTML-encoded digit
            $str = strtr($str, $transtbl);   # convert all of them to ASCII [0-9]
            return (int) $str;               # convert the whole thing to an integer
        }
    }
    return (int) $str;
}

One last thing: this function (and your version) has a fatal flaw. Do you
want to guess what it is?

Hint: What happens if the string is 'ABCDE'? What is the difference in the
return value from the function for the strings 'ABCD' and '&#1776;'?
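One way to remove the flaw the hint points at is to reject anything that is not entirely digits, instead of letting the (int) cast silently turn it into 0. This is a sketch under the assumption that null is an acceptable "not a number" result; the name utf8_to_int_strict() is hypothetical, and it accepts both the entity form and raw UTF-8 Persian digits.

```php
<?php
// Hypothetical stricter variant of utf8_to_int(): returns null instead of 0
// when the input contains anything besides Persian or ASCII digits.
function utf8_to_int_strict($str)
{
    $transtbl = array();
    for ($d = 0; $d < 10; $d++) {
        $transtbl['&#' . (1776 + $d) . ';'] = (string) $d;  // entity form
        $transtbl["\xDB" . chr(0xB0 + $d)] = (string) $d;   // raw UTF-8 form
    }
    $ascii = strtr($str, $transtbl);

    // After translation only ASCII digits may remain; 'ABCD' fails here.
    if (!preg_match('/\A[0-9]+\z/', $ascii)) {
        return null;
    }
    return (int) $ascii;
}

var_dump(utf8_to_int_strict('&#1776;'));  // int(0)
var_dump(utf8_to_int_strict('ABCD'));     // NULL
```

Callers then test the result with `=== null`, so a genuine zero is no longer confused with bad input.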

-Fariborz
---BeginMessage---
Hi all,
Mr. Tavakkolian helped me until my function was complete.
This function converts UTF-8 digits to an integer.
$farsi_table_linux = array('NONE',
    '&#1776;',
    '&#1777;',
    '&#1778;',
    '&#1779;',
    '&#1780;',
    '&#1781;',
    '&#1782;',
    '&#1783;',
    '&#1784;',
    '&#1785;');
function search_index_array($str)
{
    global $farsi_table_linux;

    for ($i = 0; $i < 11; $i++)
    {
        if ($farsi_table_linux[$i] == $str)
            return $i;
    }
    return FALSE;
}// end of search_index_array

function utf8_to_int($str)
{
    $out = '';
    $char = explode(';', $str);
    for ($i = 0; $i < count($char); $i++)
    {
        $char[$i] .= ';';                     // put back the ';' removed by explode()
        $index = search_index_array($char[$i]);
        if ($index !== false)
            $out .= $index - 1;               // index 1 holds digit zero
    }//end of for ($i)
    return $out;
}//end of utf8_to_int
When I call utf8_to_int('&#1776;'.'&#1785;'), it returns 09, but when I enter a
number (UTF-8) via the keyboard, this function returns 0.
Please tell me how to enter a number via a (Persian) keyboard and get the correct answer.
--regards

___
FarsiWeb mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/farsiweb
---End Message---
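The symptom in the question can be reproduced by comparing the bytes of the two inputs. This sketch assumes the browser submits the keyboard input as UTF-8: the test call passes seven-byte entity strings that contain ';', while typed Persian digits arrive as two-byte UTF-8 sequences with no ';' at all, so explode(';', ...) never yields a piece that matches the table.

```php
<?php
$entity = '&#1776;' . '&#1785;';   // what the test call passes
$typed  = "\xDB\xB0\xDB\xB9";      // assumed UTF-8 bytes for a typed zero and nine

echo bin2hex($entity), "\n";       // prints 2623313737363b2623313738353b
echo bin2hex($typed), "\n";        // prints dbb0dbb9
echo count(explode(';', $entity)), "\n"; // prints 3
echo count(explode(';', $typed)), "\n";  // prints 1
```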


Re: Help.....

2003-09-05 Thread Skip Tavakkolian
The online document below has the information you need.

http://us4.php.net/manual/en/function.utf8-encode.php

The only thing that the code segment you sent seems to do is use
the (int) cast to convert a string to an integer. It first tries
to match the strings '&#1776;' through '&#1785;' against the
input string. If it matches, it casts the string to an integer.
PHP uses the standard C library function strtod() to do the cast.
I suspect that PHP converts the HTML encoding back to UTF-8 encoding
(probably with the utf8_encode function) before calling strtod(); basically,
'&#1776;' becomes U+06F0, etc. strtod() then tries to convert the string,
paying attention to the locale, and interprets the Unicode values
U+06F0..U+06F9 as the digits 0-9.
I don't know why it wouldn't work on all browsers.
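Whatever PHP did internally back then, in current PHP versions the (int) cast reads only leading ASCII digits (after optional whitespace and a sign), so as far as I know it does not treat U+06F0..U+06F9 as digits by itself. A small sketch for checking this on a given installation; the sample strings are arbitrary.

```php
<?php
// Print what the (int) cast makes of a few strings (shown as hex bytes).
$samples = array('123', '12abc', '&#1776;', "\xDB\xB1", 'ABCD');
foreach ($samples as $s) {
    printf("%-14s => %d\n", bin2hex($s), (int) $s);
}
```

On PHP 7 or later the cast yields 123, 12, 0, 0 and 0 respectively, so the entity and raw UTF-8 inputs both need translating to ASCII digits before the cast.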
 
If you want details, continue reading.

The script is very obfuscated.  Starting with this array:

 <?php
 $farsi_table=array(4758678,
   38354955555459, #Zero
   38354955555559, #one
   38354955555659, #two
   38354955555759, #three
   38354955564859, #four
   38354955564959, #five
   38354955565059, #six
   38354955565159, #seven
   38354955565259, #eight
   38354955565359  #nine
   );

If you take the string 38354955555459 (the #Zero element), split it into
two-character sequences 38 35 49 55 55 54 59, and then look up those
values in an ASCII table (code 38 is '&', 35 is '#', 49 is '1', 55 is '7',
etc.), you'll see that these strings just represent the ASCII values
of '& # 1 7 7 6 ;' in decimal! Why not just leave those as &#1776;, etc.?
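The decoding described above is mechanical enough to script. A sketch (the helper name decode_ascii_pairs() is hypothetical) that splits an entry into two-digit groups and maps each group to its character with chr(); it assumes every code is exactly two digits, which holds for the characters used here. The entries are passed as strings because 14-digit literals overflow a 32-bit PHP integer.

```php
<?php
// Hypothetical helper: decode a string of two-digit decimal ASCII codes.
function decode_ascii_pairs($digits)
{
    $out = '';
    foreach (str_split($digits, 2) as $pair) {
        $out .= chr((int) $pair);   // e.g. '38' -> '&', '35' -> '#'
    }
    return $out;
}

echo decode_ascii_pairs('38354955555459'), "\n"; // prints &#1776;
echo decode_ascii_pairs('38354955564859'), "\n"; // prints &#1780;
```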

The string '&#1776;', as you all know, is the HTML notation for
specifying a character that can't be typed for whatever reason. In
this case it represents the Unicode code point U+06F0 (1776 in decimal),
EXTENDED ARABIC-INDIC DIGIT ZERO. The others follow the same
logic, all the way to &#1785; (U+06F9). The function utf8_to_int() is in fact
not a UTF-8-to-int conversion, but an HTML-encoding-of-UTF-8-to-int
conversion.
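The relationship between the entity and the code point can be checked directly: 1776 in decimal is 0x6F0, and html_entity_decode() with the UTF-8 charset turns '&#1776;' into the two UTF-8 bytes of U+06F0. A minimal sketch:

```php
<?php
// '&#1776;' is a decimal character reference; 1776 == 0x6F0.
printf("U+%04X\n", 1776);                                   // prints U+06F0
$raw = html_entity_decode('&#1776;', ENT_QUOTES, 'UTF-8');
echo bin2hex($raw), "\n";                                   // prints dbb0
```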

I think the following will do what you want, and it might help avoid
problems with strtod() versions that might not be Unicode-safe.

function utf8_to_int($str)
{
    $transtbl = array(
        '&#1776;' => '0',
        '&#1777;' => '1',
        '&#1778;' => '2',
        '&#1779;' => '3',
        '&#1780;' => '4',
        '&#1781;' => '5',
        '&#1782;' => '6',
        '&#1783;' => '7',
        '&#1784;' => '8',
        '&#1785;' => '9'
    );

    # Could add the following line to make sure that
    # the string is HTML encoded first
    # $str = htmlentities($str, ENT_COMPAT, 'UTF-8');

    $str = strtr($str, $transtbl);  # replace every entity with its ASCII digit
    return (int) $str;
}

Disclaimer: I have not tested the code above and in fact I've never
written a PHP script before tonight.  I spent a few hours reading the
language manual and looking at the sources tonight.  I can't guarantee
the results :)

-Fariborz



Re: [farsiweb]unicode fields in database

2002-11-10 Thread Skip Tavakkolian
On Sat, November 9, 2002, [EMAIL PROTECTED] wrote:
> Still, I'm curious. How come everyone on this discussion board is using the
> Latin alphabet? :-)


Well, most discussions here are also in English.  Should everyone in
Iran switch over to English?

Going back to the original point, would you explain how switching to
the Latin alphabet solves the Farsi text sorting problem in a database?
