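Good morning Shlomit,

The string you see in your first attempts (with the 'x's and so on) is the kind of gibberish you get when text encoded as UTF-8 is viewed as if it were encoded in ISO-8859-1 or something similar. decode_entities() turns an entity like &#8221; into a character above 255, which makes $str a character string; the Hebrew, however, is still raw UTF-8 bytes, so each byte is treated as a separate Latin-1 character and gets encoded a second time on output. That is why attempts 3 and 4, which never insert such characters, leave the Hebrew intact. You may want to apply a decoder (for example, Encode's decode('UTF-8', $str)) to convert the string into Unicode before decoding the entities, and print through an encoding layer such as binmode STDOUT, ':encoding(UTF-8)'.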
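For illustration, here is a minimal sketch of the ordering, assuming the input arrives as raw UTF-8 bytes and contains only numeric entities such as &#8221;. So that it runs on core Perl alone, it uses a hypothetical helper, decode_numeric_entities(), in place of HTML::Entities::decode_entities(); in your code you would keep calling decode_entities(), which also handles named entities such as &rdquo;.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for HTML::Entities::decode_entities(), limited to
# numeric entities (&#8221; and &#x201D;) so the sketch needs only core Perl.
sub decode_numeric_entities {
    my ($s) = @_;
    $s =~ s/&#([0-9]+);/chr($1)/ge;
    $s =~ s/&#[xX]([0-9a-fA-F]+);/chr(hex $1)/ge;
    return $s;
}

# Assumed input: raw UTF-8 bytes, as read from a UTF-8 file without a
# decoding layer (or written in a UTF-8 script without "use utf8").
# The last six byte pairs are the Hebrew word ניסיון (U+05E0 ... U+05DF).
my $bytes = "&#8220; test &#8221; "
          . "\xD7\xA0\xD7\x99\xD7\xA1\xD7\x99\xD7\x95\xD7\x9F";

# Wrong order: the entities become characters above 255 while the Hebrew
# is still bytes, so Perl treats each byte as a separate Latin-1 character.
my $wrong = decode_numeric_entities($bytes);

# Right order: bytes -> characters first, then entities.
my $right = decode_numeric_entities(decode('UTF-8', $bytes));

# Encode characters back to UTF-8 on output.
binmode STDOUT, ':encoding(UTF-8)';
print "wrong: $wrong\n";    # Hebrew appears as '×'-style gibberish
print "right: $right\n";    # Hebrew survives intact
```

The "wrong" line should show the Hebrew as '×'-style gibberish, while the "right" line keeps ניסיון intact. Once $str holds decoded characters, the s///e approach from attempt 2 is safe as well, since decode_entities() no longer mixes wide characters into a byte string.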
--- Omer

On Thu, 2011-02-03 at 08:36 +0200, Shlomit Afgin wrote:
> Hello,
>
> I tried to convert HTML special characters to their real characters,
> for example, converting &#8221; to ".
>
> I had the string
>     $str = "&#8220; test &#8221; ניסיון ";
> The string also contains Hebrew letters.
>
> 1. First I did:
>        $str = decode_entities($str);
>    It converted the special characters correctly, but the Hebrew did
>    not come out right: when I print $str, I get gibberish full of '×'
>    characters instead of ניסיון.
>
> 2. Then I decided to write a regular expression that changes only the
>    HTML special characters. I wrote:
>        $str = "&#8220; test &#8221; ניסיון ";
>        $str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/ge;
>    Even though it should act only on the matched substrings, it seems
>    to affect the Hebrew letters too; they came out garbled again.
>
> Parts 1 and 2 give the same output.
>
> 3. To check the regular expression, I removed the 'e' at the end so I
>    could see the substitution. I wrote:
>        $str = "&#8220; test &#8221; ניסיון ";
>        $str =~ s/(&#(?=[0-9])*.{2,5};)/decode_entities($1)/g;
>    The output was:
>        decode_entities(&#8220;) test decode_entities(&#8221;) ניסיון
>    The Hebrew came out fine, of course.
>
> 4. I can do:
>        $str =~ s/&#8220;|&#8221;/"/g;
>    which doesn't affect the Hebrew and converts the HTML characters.
>    The problem is that the data contains other HTML special characters
>    too. I would like something more generic that will also work in the
>    future.
>
> Any ideas are welcome!!

-- 
May the holy trinity of $_, @_ and %_ be hallowed.
My own blog is at http://www.zak.co.il/tddpirate/

My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
_______________________________________________
Perl mailing list
Perl@perl.org.il
http://mail.perl.org.il/mailman/listinfo/perl