I'm working on an application where I need to replace some Unicode
characters using a PHP shell script.

The problem I'm having is matching multibyte characters.

One character in particular is U+2014, an em dash: '—'

To get the decimal code to identify the character I've tried two
methods, one was with a little script like this:

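<?php
// Print each byte of the character along with its decimal value.
$char = '—';
for ($x = 0; $x < strlen($char); $x++) {
        $letter = substr($char, $x, 1);
        echo "$letter\t" . ord($letter) . "\n";
}
?>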

I took the multiple decimal codes returned and tried a replacement like:

echo str_replace(chr(226).chr(128).chr(148),"-hyphen",$data);

But that didn't match, so I figured my method of finding the decimal
value wasn't correct.
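
For reference, here is a minimal sketch of how the bytes in the data could
be checked before replacing. It assumes the data is UTF-8; the helper names
replace_em_dash() and dump_bytes() are made up for illustration.

```php
<?php
// Hypothetical helper: replaces the UTF-8 em dash (U+2014) with "-hyphen".
// Assumes $data is UTF-8 encoded.
function replace_em_dash(string $data): string
{
    // "\xE2\x80\x94" is the same three bytes as chr(226).chr(128).chr(148).
    return str_replace("\xE2\x80\x94", "-hyphen", $data);
}

// Hypothetical helper: shows the raw bytes of a string as hex pairs.
function dump_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$sample = "a\xE2\x80\x94b";           // "a", em dash, "b" in UTF-8
echo dump_bytes($sample), "\n";       // 61 e2 80 94 62
echo replace_em_dash($sample), "\n";  // a-hyphenb
```

If the dump of the real data shows a single byte 97 (decimal 151) where the
dash is, the data is probably Windows-1252 rather than UTF-8, and the
three-byte sequence would never match.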

Next I thought I could UTF-8 encode the character, take its ord(), and
compare that decimal against a UTF-8 encoded version of my string.
But when I tried getting the ord() of all the characters I wanted to
watch for, it always gave me the same decimal, 195. So that idea wasn't
going to work either.
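
One possible explanation for the constant 195, sketched under assumptions:
ord() only looks at the first byte of a string, and if the characters were
already UTF-8 when they were encoded again as though they were ISO-8859-1,
every lead byte in the range 192-255 turns into a pair starting with 195.
The function latin1_bytes_to_utf8() below is a hypothetical stand-in for
that encoding step, not necessarily what was used.

```php
<?php
// Hypothetical helper: encodes each input byte as if it were a single
// ISO-8859-1 character. This is an assumption about the encoding step.
function latin1_bytes_to_utf8(string $s): string
{
    $out = '';
    for ($i = 0; $i < strlen($s); $i++) {
        $b = ord($s[$i]);
        // Bytes below 0x80 stay as-is; others become two UTF-8 bytes.
        $out .= $b < 0x80
            ? chr($b)
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

$emDash = "\xE2\x80\x94";                           // em dash in UTF-8
echo ord($emDash), "\n";                            // 226 (first byte only)
echo ord(latin1_bytes_to_utf8($emDash)), "\n";      // 195
echo bin2hex(latin1_bytes_to_utf8($emDash)), "\n";  // c3a2c280c294
```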

I'm out of ideas; any input would be appreciated. Thanks.


-- 
Jeff Bearer, RHCE
Webmaster, PittsburghLIVE.com


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
