Hello (loved your PostgreSQL presentation at the most recent OSCON, BTW)
Which editor do you use? When I load the script in Komodo IDE 5.2, the string
looks broken. When I run the script (ActivePerl 5.10.1 on Windows), only the
second line is correct; the first (no surprise) and third are broken.
David E. Wheeler wrote on 15.06.2010 at 22:55 (-0700):
>
> But the curious thing is, when I pull the offending string out of
> the RSS and just stick it in a script, Encode knows how to decode it
> properly, while XML::LibXML (and my Unicode-aware editors) cannot.
Try passing the parser options
I remember XML::LibXML doing funky things with the utf8 flag -- but in
your case, is it possible to try using a proper XML declaration?
i.e.:
Tomas
This seems to produce the correct output for me (perl 5.12.1, LibXML 1.70)
--d
2010/6/16 David E. Wheeler :
> Fellow Perlers,
>
> I'm par
On Jun 16, 2010, at 12:04 AM, Henning Michael Møller Just wrote:
> Hello (loved your PostgreSQL presentation at the most recent OSCON, BTW)
Thanks. Come see my tutorial at OSCON this year, if you can: Test-Driven
Database Development. :-) Not sure I can make a tutorial as entertaining, alas.
Pe
On Jun 16, 2010, at 2:34 AM, Michael Ludwig wrote:
> David E. Wheeler wrote on 15.06.2010 at 22:55 (-0700):
>>
>> But the curious thing is, when I pull the offending string out of
>> the RSS and just stick it in a script, Encode knows how to decode it
>> properly, while XML::LibXML (and my Unic
On Jun 15, 2010, at 11:24 PM, Daisuke Maki wrote:
> I remember XML::LibXML doing funky things with the utf8 flag -- but in
> your case, is it possible to try using a proper XML declaration?
>
> i.e.:
>
>Tomas
No, I'm pulling the example I posted out of the CDATA of an RSS description
On Jun 16, 2010, at 9:05 AM, David E. Wheeler wrote:
> On Jun 16, 2010, at 2:34 AM, Michael Ludwig wrote:
>
>> Try passing the parser options as a hash reference:
>>
>> my $doc = $parser->parse_html_string($str, {encoding => 'utf-8'});
>
> WTF! That fixes it! I don't understand why it seems to
At 22:55 -0700 15/6/10, David E. Wheeler wrote:
...So my question is, what gives? Is this truly a broken
representation of the character and Encode just figures that out and
fixes it? Or is there something off with my editor and with
XML::LibXML.
...Attachment converted: macmini:try.pl (TEXT
On Jun 16, 2010, at 3:07 PM, John Delacour wrote:
> When I open your attachment 'try.pl' in BBEdit it has Mac encoding and Mac
> linefeeds and five invisible characters that I haven't analysed wherever you
> have double line-spacing. And if I tell BBEdit to re-open the file as utf-8
> I get th
On Wed, Jun 16, 2010 at 01:59:33PM -0700, David E. Wheeler wrote:
> I think what I need is some code to strip non-utf8 characters from a string
> -- even if that string has the utf8 bit switched on. I thought that Encode
> would do that for me, but in this case apparently not. Anyone got an
> examp
On Jun 16, 2010, at 4:47 PM, Marvin Humphrey wrote:
> On Wed, Jun 16, 2010 at 01:59:33PM -0700, David E. Wheeler wrote:
>> I think what I need is some code to strip non-utf8 characters from a string
>> -- even if that string has the utf8 bit switched on. I thought that Encode
>> would do that for
On Wed, Jun 16, 2010 at 05:34:44PM -0700, David E. Wheeler wrote:
> So the UTF8 flag is enabled, and yet it has "\303\204\302\215" in it. What is
> that crap?
That's octal notation, which I think Dump() uses for any byte greater than 127
and for control characters, so that it can output pure ASC
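
For readers following the octal dump above, here is a minimal sketch of what those four bytes amount to. It uses only the core Encode module. The variable names are illustrative, and the idea that the text was UTF-8 encoded twice is an assumption about the feed, not something the thread confirms. Decoding the bytes once gives the two characters that a string with the UTF8 flag on and "\303\204\302\215" in its buffer would hold. Mapping those characters back to single bytes and decoding again gives one character.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The four bytes from the Dump() output, written in octal as Dump() shows them.
my $bytes = "\303\204\302\215";

# Decoding once yields two characters: U+00C4 and U+008D.
my $once = decode('UTF-8', $bytes);

# Assumption: the text was UTF-8 encoded twice. Mapping each character back
# to a single byte (Latin-1) and decoding again undoes the extra layer.
my $twice = decode('UTF-8', encode('ISO-8859-1', $once));

# Print the code points of each stage in hex.
printf "once:  %s\n", join ' ', map { sprintf 'U+%04X', ord($_) } split //, $once;
printf "twice: %s\n", join ' ', map { sprintf 'U+%04X', ord($_) } split //, $twice;
```

If the double-encoding assumption holds, the second line prints a single code point, U+010D (c with caron), which would suggest the data was already mangled before it reached the parser.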