[EMAIL PROTECTED] said:
> -i need to convert the strings to 2 Unicode formats,
> one like this \ua9e0 for each character
> one like this &#a9e0 for each character

I think in the latter case, you might really want "&#43488;" (decimal 
number, terminated with a semicolon), if your intention is to produce HTML 
numeric entities for Unicode characters.

One basic approach (assuming $_ contains a decoded character string, 
not raw UTF-8 bytes) is:

  # convert non-ascii to "\uHHHH":
  s/([^[:ascii:]])/sprintf("\\u%04x",ord($1))/eg;

  # convert non-ascii to "&#nnnn;":
  s/([^[:ascii:]])/sprintf("&#%d;",ord($1))/eg;

and similarly for other variants.  See the discussion of "POSIX character 
class syntax" in the perlre man page (perlrecharclass in newer Perls) 
regarding the "[:ascii:]" expression.
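
For what it's worth, here is a minimal, self-contained sketch of the two 
substitutions wrapped up as a script.  The sub names (to_u_escapes, 
to_html_entities) and the sample input are my own inventions for 
illustration, and I'm assuming the input arrives as UTF-8 bytes that need 
decoding first; adjust to taste.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace each non-ASCII character with "\uHHHH".
sub to_u_escapes {
    my ($s) = @_;
    $s =~ s/([^[:ascii:]])/sprintf("\\u%04x", ord($1))/eg;
    return $s;
}

# Hypothetical helper: replace each non-ASCII character with a decimal
# HTML numeric entity, "&#nnnn;".
sub to_html_entities {
    my ($s) = @_;
    $s =~ s/([^[:ascii:]])/sprintf("&#%d;", ord($1))/eg;
    return $s;
}

# Sample input (an assumption): the UTF-8 bytes for 'a', U+A9E0, 'b'.
# Decoding turns the bytes into a character string, so ord() returns
# code points rather than individual byte values.
my $bytes = "a\xea\xa7\xa0b";
my $text  = decode('UTF-8', $bytes);

print to_u_escapes($text), "\n";      # a\ua9e0b
print to_html_entities($text), "\n";  # a&#43488;b
```

Note that "%04x" only sets a minimum width, so a character above U+FFFF 
would come out with five or six hex digits; if the consumer of the "\u" 
form expects exactly four, that case would need separate handling.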

        David Graff

