Paul Bijnens <[EMAIL PROTECTED]> writes:
>Can anyone explain what I'm doing wrong?
As I recall, HTML::Entities has a build-time option as to whether it handles
Unicode. Do you know if yours has that turned on?
What locale are you in, i.e. is it one that has € as a native
8-bit character (Windows 1251 or iso-8859-15, say)?
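
If it helps answer the locale question, here is a rough sketch that prints the
character set of the current locale. It uses only the core POSIX and
I18N::Langinfo modules; the variable names are mine, and it says nothing
about how HTML::Entities itself was built.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Adopt whatever locale the environment (LANG, LC_ALL, LC_CTYPE) asks for.
# setlocale() returns undef if that locale is not installed.
my $locale = setlocale(LC_CTYPE, "");

# CODESET names the character encoding of that locale,
# e.g. "ISO-8859-15" or "UTF-8".
my $codeset = langinfo(CODESET);

print "LANG:     ", (defined $ENV{LANG} ? $ENV{LANG} : "(unset)"), "\n";
print "LC_CTYPE: ", (defined $locale    ? $locale    : "(setlocale failed)"), "\n";
print "codeset:  ", $codeset, "\n";
```

If the codeset comes back as an 8-bit encoding that contains the Euro sign,
that would be worth mentioning to the module author.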
>I have this recurring problem of strings not being flagged
>as utf8, when -- I believe -- they should be.
>
>One of those cases is in decode_entities() from the module
>HTML::Entities, but I have other occurrences too (e.g. in Plucene).
>
>When I run this program:
>
>########### cut here
>#!/usr/bin/perl
>use HTML::Entities;
>use Encode;
>print "This is perl ", $], "\n";
>
>$s = "&euro;";
>$t = decode_entities($s);
>$u = decode("utf8", $t, Encode::FB_CROAK | Encode::LEAVE_SRC);
>
>print "t: ", Encode::is_utf8($t) ? "is" : "not", " utf8", "\n";
>print "u: ", Encode::is_utf8($u) ? "is" : "not", " utf8", "\n";
>print "t: ", ($t eq "\x{20ac}") ? "is" : "not", " Eurosign\n";
>print "u: ", ($u eq "\x{20ac}") ? "is" : "not", " Eurosign\n";
>########### cut here
>
>I get this output:
>
>This is perl 5.008005
>t: not utf8
>u: is utf8
>t: not Eurosign
>u: is Eurosign
>
>I would expect that $t does have the utf8 flag set,
>as indicated in the manpage of HTML::Entities :
>
> decode_entities( $string )
> This routine replaces HTML entities found in the
> $string with the corresponding ISO-8859-1 character,
> and if possible (under perl 5.8 or later) will replace
> to Unicode characters. Unrecognized entities are left
> alone.
>
>Why do I have to force the utf8 flag using decode("utf8",..) ?
Well, I agree that the manpage does suggest what you expect.
>
>One of my guesses is that the problem lies in XS-processing of strings
>where the utf8 flag is not set correctly. True?
Certainly possible. I suggest you contact the author of HTML::Entities.
It is also possible that the result is left encoded deliberately.
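
To illustrate what "left encoded" would mean, here is a minimal sketch using
only Encode. It does not call HTML::Entities at all: it assumes that
decode_entities() handed back the three UTF-8 bytes of U+20AC instead of the
character itself, and the variable names are mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Assumption: this is what $t holds in your test, i.e. the UTF-8
# *bytes* for the Euro sign rather than the single character U+20AC.
my $bytes = "\xE2\x82\xAC";

# With a CHECK argument, decode() may modify its source in place;
# LEAVE_SRC keeps $bytes intact so it can still be inspected afterwards.
my $chars = decode("utf8", $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

for my $pair (["bytes", $bytes], ["chars", $chars]) {
    my ($name, $str) = @$pair;
    printf "%s: length %d, %s utf8, %s Eurosign\n",
        $name, length($str),
        (is_utf8($str)        ? "is" : "not"),
        ($str eq "\x{20ac}"   ? "is" : "not");
}
```

The byte string should report length 3, not utf8 and not Eurosign, while the
decoded string should report length 1, utf8 and Eurosign, which matches the
$t and $u results in your output.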
>Why does nobody else complain then?
>
>Is my setup wrong? (Tried this on different installations including
>a brand new Fedora Core 3...)
>
>
>--
>Paul Bijnens, Xplanation Tel +32 16 397.511
>Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
>http://www.xplanation.com/ email: [EMAIL PROTECTED]