[PHP] Usage of strlen(utf8_decode()) and /u regex modifier

2009-09-21 Thread GoForThisWorld

Hello,

  As shown below, strlen(utf8_decode()) and the /u regex
  modifier do not behave the way I expected.

  1) What is my misunderstanding?  

  <?php
  
  $the_string = '&#1052;&#1072;&#1088;&#1080;&#1085;&#1072; &#1054;&#1088;&#1083;&#1086;&#1074;&#1072;';
  echo "<p>author (85 bytes): $the_string, " . strlen( $the_string ) . ','
    . strlen( utf8_decode( $the_string ) ) . ','
    . strlen( utf8_decode( utf8_encode( $the_string ) ) ) . ',' . "</p>";
  // all the numbers echoed are 85; I expected at least one to be 13

  
  $max_length = 20;
  $is_short = preg_match( "/^.{1,$max_length}\$/u", utf8_encode( $the_string ) );
  // expect the above to return 1
  
  $max_length = 10;
  $is_short = preg_match( "/^.{1,$max_length}\$/u", utf8_encode( $the_string ) );
  // expect the above to return 0
  
  ?>
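
  The figure 85 is itself a clue. Each &#NNNN; entity is 7 ASCII bytes, so 12 entities plus one space come to exactly 85 bytes. If the literal really holds the entity text rather than the Cyrillic letters, then strlen(), utf8_decode() and a /u pattern all see 85 plain ASCII characters. (utf8_encode() and utf8_decode() only convert between ISO-8859-1 and UTF-8, so applying utf8_encode() to text that is already UTF-8 would double-encode it; both are deprecated as of PHP 8.2.) The sketch below assumes the entities are meant to stand for the name "Марина Орлова" and decodes them to UTF-8 with html_entity_decode() before counting and matching. The pattern is built in double quotes so that $max_length is interpolated; inside single quotes PHP leaves "$max_length" as literal text. Variable names other than $the_string and $max_length are illustrative.

```php
<?php
// Assumption: the literal holds HTML numeric entities, not the Cyrillic letters.
$the_string = '&#1052;&#1072;&#1088;&#1080;&#1085;&#1072; &#1054;&#1088;&#1083;&#1086;&#1074;&#1072;';

// Each entity is 7 ASCII bytes: 12 * 7 + 1 space = 85 bytes.
$raw_bytes = strlen($the_string);

// Turn the numeric entities into real UTF-8 characters.
$decoded = html_entity_decode($the_string, ENT_QUOTES, 'UTF-8');

// Each Cyrillic letter takes 2 bytes in UTF-8: 12 * 2 + 1 = 25 bytes.
$utf8_bytes = strlen($decoded);

// With /u, "." matches one code point; preg_match_all() returns the match count.
$chars = preg_match_all('/./us', $decoded);

// Double quotes so $max_length is interpolated; \$ keeps the literal anchor.
$max_length = 20;
$fits_20 = preg_match("/^.{1,$max_length}\$/us", $decoded);
$max_length = 10;
$fits_10 = preg_match("/^.{1,$max_length}\$/us", $decoded);

echo "$raw_bytes, $utf8_bytes, $chars, $fits_20, $fits_10\n";
```

  Under that assumption the last line prints 85, 25, 13, 1, 0: the 13 the original post expected appears only once the entities have been decoded.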

  More generally, given a string $the_string:

  2) How do I determine which encoding is being used?

  3) How do I determine the number of visible characters?

  4) If it has more than N visible characters, how do I truncate it to the first N?
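
  For questions 2 to 4: a PHP string is just a sequence of bytes with no encoding attached, so the encoding cannot be read off the string. It can only be validated or guessed (mbstring's mb_detect_encoding() is one heuristic). The minimal sketch below assumes the text is supposed to be UTF-8 and that a "visible character" means an extended grapheme cluster (a base character plus any combining marks). It uses only PCRE's /u mode: an empty //u pattern fails on malformed UTF-8, \X matches one grapheme cluster, and a bounded \X{0,N} keeps the first N of them. The helper names is_valid_utf8(), visible_length() and truncate_visible() are hypothetical.

```php
<?php
// Hypothetical helpers; they assume the input is meant to be UTF-8
// and that this source file itself is saved as UTF-8.

// 2) PHP strings carry no encoding label, so only validity can be checked.
//    With /u, preg_match() returns false (not 0) on malformed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// 3) \X matches one extended grapheme cluster (a base character plus any
//    combining marks), which is closer to a "visible character" than a
//    byte or a code point. Invalid UTF-8 yields false, cast to 0.
function visible_length(string $s): int
{
    return (int) preg_match_all('/\X/u', $s);
}

// 4) Keep at most $n grapheme clusters from the start of the string.
//    Assumes $n stays within PCRE's repeat limit (65535).
function truncate_visible(string $s, int $n): string
{
    if ($n <= 0 || preg_match('/^\X{0,' . $n . '}/u', $s, $m) !== 1) {
        return '';
    }
    return $m[0];
}

$name = "Марина Орлова";
echo visible_length($name), "\n";
echo truncate_visible($name, 6), "\n";
var_dump(is_valid_utf8("\xC3\x28")); // 0xC3 without a valid continuation byte
```

  If the intl extension is available, grapheme_strlen() and grapheme_substr() do the same job. mb_strlen() and mb_substr() count code points instead, which gives the same answer here and differs only when combining marks are present.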

  Thanks!


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
