Hello, 

  

I am trying to convert HTML character entities to the characters they represent. 

For example, converting    ”      to     "   . 

  

I have the string: 

$str = "“ test ” ניסיון "; 

The string also contains Hebrew letters. 

  

1. First I did: 

$str =  decode_entities($str); 

It converts the special characters correctly. 

The problem is that the Hebrew does not come out right. 

When I print the value of $str, the Hebrew appears as  יסיון 

  

2. Then I decided to write a regular expression that changes only the
HTML special characters. 

I wrote: 

$str = "“ test ” ניסיון "; 

$str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/ge; 

Even though it should act only on the matched substrings, it seems to
affect the Hebrew letters as well. 

The Hebrew letters again come out as  יסיון

Parts 1 and 2 give the same output. 
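
One possible explanation for the identical output in parts 1 and 2, offered as a sketch rather than a confirmed diagnosis: if $str holds the Hebrew as raw UTF-8 bytes, then as soon as a character above 255 (such as a decoded curly quote) is placed in the string, Perl treats every existing byte as a Latin-1 character, and the Hebrew gets encoded a second time on output. The code below assumes the entities were &#8220; and &#8221; and that the input bytes are UTF-8; those are assumptions, as are the variable names.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use Encode qw(encode);

# Assumption: the Hebrew word arrives as raw UTF-8 bytes (no "use utf8",
# no explicit decoding). Written here as byte escapes so the example does
# not depend on the encoding of this file.
my $hebrew_bytes = "\xD7\xA0\xD7\x99\xD7\xA1\xD7\x99\xD7\x95\xD7\x9F";

# Assumption: the entities in the data are numeric, e.g. &#8220; and &#8221;.
my $str = "&#8220; test &#8221; $hebrew_bytes";

# Decoding inserts U+201C and U+201D, which are code points above 255.
my $decoded = decode_entities($str);

# Every element of $decoded is now a character, so each Hebrew byte is
# treated as one Latin-1 character and is encoded again on output.
my $output = encode('UTF-8', $decoded);

printf "Hebrew bytes in input: %s\n", unpack('H*', $hebrew_bytes);
printf "Output still contains those bytes: %s\n",
    index($output, $hebrew_bytes) >= 0 ? 'yes' : 'no';
```

The same thing would happen with s///ge, because the replacement returned by decode_entities($1) also contains characters above 255, which changes how the whole string is stored, not only the matched part.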

  

3. I decided to check the regular expression, so I removed the 'e'
modifier at the end of the substitution to see what is being matched. 

I wrote: 

$str = "“ test ” ניסיון "; 

$str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/g; 

The output was: 

decode_entities(“) test decode_entities(”) ניסיון 

The Hebrew came out okay, of course. 

  

4. I can do: 

$str =~ s/“|”/"/g; 

This doesn't affect the Hebrew, and it converts the HTML characters. 

The problem is that there are other HTML special characters in
the data. 

I would like something more generic that will also work in the
future.
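
A generic approach that may work, sketched under the assumption that the data arrives as UTF-8 encoded bytes: decode the bytes into Perl characters first, then call decode_entities on the whole string, and encode again only when writing the result out. The entity numbers and variable names below are illustrative, not taken from the real data.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use Encode qw(decode encode);

# Assumption: the input is UTF-8 encoded bytes; change the encoding name
# if the data uses something else (for example 'iso-8859-8').
my $raw = "&#8220; test &#8221; \xD7\xA0\xD7\x99\xD7\xA1\xD7\x99\xD7\x95\xD7\x9F";

# Step 1: bytes -> Perl characters, so the Hebrew letters become single characters.
my $chars = decode('UTF-8', $raw);

# Step 2: replace every entity in the string, not only a fixed list of them.
my $text = decode_entities($chars);

# Step 3: characters -> bytes, once, at output time.
my $out = encode('UTF-8', $text);

print $out, "\n";
```

With this order the Hebrew is never mixed as raw bytes with decoded characters, so no special regular expression is needed for each entity.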

Any ideas are welcome!! 

Shlomit. 
_______________________________________________
Perl mailing list
Perl@perl.org.il
http://mail.perl.org.il/mailman/listinfo/perl