Re: making utf8-clean CPAN distributions

2004-12-12 Thread David E. Wheeler
On Dec 12, 2004, at 10:06 PM, Darren Duncan wrote: What I would like to do is create my CPAN module distributions such that all of the files in each distro, code and documentation and tests and logs alike, are properly UTF-8 encoded, and do this in such a way that no modern Perl distributions

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-16 Thread David E. Wheeler
On Jun 16, 2010, at 4:47 PM, Marvin Humphrey wrote: On Wed, Jun 16, 2010 at 01:59:33PM -0700, David E. Wheeler wrote: I think what I need is some code to strip non-utf8 characters from a string -- even if that string has the utf8 bit switched on. I thought that Encode would do that for me

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-17 Thread David E. Wheeler
On Jun 17, 2010, at 12:30 PM, Henning Michael Møller Just wrote: So it may be valid UTF-8, but why does it come out looking like crap? That is, LaurinaviÃ≥Ÿius? I suppose there's an argument that LaurinaviÄŸius is correct and valid, if ugly. Maybe? I am unsure if this is the explanation

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-18 Thread David E. Wheeler
On Jun 18, 2010, at 12:05 AM, John Delacour wrote: In this case all talk of iso-8859-1 and cp1252 is a red herring. I read several Italian websites where this same problem is manifest in external material such as ads. The news page proper is encoded properly and declared as utf-8 but I