David E. Wheeler wrote on 16.06.2010 at 13:59 (-0700):
On Jun 16, 2010, at 9:05 AM, David E. Wheeler wrote:
On Jun 16, 2010, at 2:34 AM, Michael Ludwig wrote:
In order to print Unicode text strings (as opposed to octet
strings) correctly to a terminal (UTF-8 or not), add the following
At 00:27 +0100 18/6/10, I wrote:
If I save the file and undo the second decoding, I get the proper output
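A minimal sketch of undoing one extra layer of encoding, written in Python as a stand-in for the Perl Encode calls discussed in this thread. It assumes the spurious step was a Latin-1 reinterpretation of UTF-8 bytes, and it borrows the byte values \303\204\302\215 from the Dump() output quoted later in the thread. The function name `undo_double_encoding` is hypothetical.

```python
# Double-encoded form of U+010D, taken from the Dump() output in the thread.
double_encoded = b"\xc3\x84\xc2\x8d"

def undo_double_encoding(data: bytes) -> str:
    """Reverse one spurious Latin-1 -> UTF-8 round trip.

    Assumption: the extra layer came from treating UTF-8 bytes as
    Latin-1 characters and encoding those characters as UTF-8 again.
    """
    once = data.decode("utf-8")               # two characters: U+00C4, U+008D
    original_bytes = once.encode("latin-1")   # back to b"\xc4\x8d"
    return original_bytes.decode("utf-8")     # the single intended character

fixed = undo_double_encoding(double_encoded)
print(hex(ord(fixed)))  # 0x10d
```

If the text was double-encoded through some other 8-bit encoding, the middle step would need that encoding instead of Latin-1.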
In this case, all talk of iso-8859-1 and cp1252 is a red herring. I
read several Italian websites where this same problem is manifest in
external material such as ads. The news page
On Jun 18, 2010, at 12:05 AM, John Delacour wrote:
In this case, all talk of iso-8859-1 and cp1252 is a red herring. I read
several Italian websites where this same problem is manifest in external
material such as ads. The news page proper is encoded properly and declared
as utf-8 but I
On Jun 17, 2010, at 12:30 PM, Henning Michael Møller Just wrote:
So it may be valid UTF-8, but why does it come out looking like crap? That
is, LaurinaviÃ≥Ÿius? I suppose there's an argument that
LaurinaviÄŸius is correct and valid, if ugly. Maybe?
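One way to see how text can be valid UTF-8 and still look wrong: the bytes pass strict decoding, but they decode to two characters instead of one. The sketch below is Python, not Perl, and assumes the double-encoded bytes c3 84 c2 8d reported elsewhere in the thread. How a given viewer draws the second character (a C1 control) varies, so this is not meant to reproduce the exact garbled strings quoted above.

```python
import unicodedata

double_encoded = b"\xc3\x84\xc2\x8d"

# Strict decoding succeeds: the bytes are well-formed UTF-8.
text = double_encoded.decode("utf-8", errors="strict")

for ch in text:
    # U+00C4 is an uppercase letter (Lu); U+008D is a control character (Cc)
    # with no standard visible glyph.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))
```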
I am unsure if this is the explanation
At 13:24 -0700 17/6/10, David E. Wheeler wrote:
On Jun 17, 2010, at 12:30 PM, Henning Michael Møller Just wrote:
So the original character \x{010d} is represented by the bytes
\x{c4} and \x{8d}; an application thinks those are in fact
characters and encodes them again as \x{c3} + \x{84} and
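The byte arithmetic in this explanation can be checked with a short Python sketch. It assumes the application misread the UTF-8 bytes as Latin-1, where byte N maps to U+00NN; the thread does not say which 8-bit encoding was involved. Python's cp1252 codec, for comparison, would reject the byte 0x8D, which it leaves unassigned.

```python
original = "\u010d"  # U+010D LATIN SMALL LETTER C WITH CARON

utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))  # c4 8d

# The mistake: treat each byte as a character (Latin-1 maps byte N to U+00NN)...
misread = utf8_bytes.decode("latin-1")
# ...and then encode those characters as UTF-8 a second time.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex(" "))  # c3 84 c2 8d
```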
Hello (loved your PostgreSQL presentation at the most recent OSCON, BTW)
Which editor do you use? When I load the script in Komodo IDE 5.2, the string
looks broken. When I run the script (ActivePerl 5.10.1 on Windows), only the
second line is correct; the first (no surprise) and the third are broken.
On Wed, Jun 16, 2010 at 01:59:33PM -0700, David E. Wheeler wrote:
I think what I need is some code to strip non-utf8 characters from a string
-- even if that string has the utf8 bit switched on. I thought that Encode
would do that for me, but in this case apparently not. Anyone got an
example?
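This sketch shows why stripping invalid sequences does not fix this case. It uses Python's `errors="ignore"` decoding as a stand-in and does not reproduce Perl's Encode API. Stripping only removes ill-formed byte sequences. Double-encoded text is well-formed UTF-8, so nothing gets removed. The helper name `strip_invalid_utf8` is hypothetical.

```python
def strip_invalid_utf8(data: bytes) -> str:
    """Decode UTF-8, silently dropping byte sequences that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore")

broken = b"caf\xe9"                    # a lone Latin-1 byte, invalid as UTF-8
double_encoded = b"\xc3\x84\xc2\x8d"   # valid UTF-8, but the wrong characters

print(ascii(strip_invalid_utf8(broken)))          # 'caf'
print(ascii(strip_invalid_utf8(double_encoded)))  # '\xc4\x8d'
```

The first input loses its bad byte. The second comes through untouched, still as two wrong characters, so it needs the decoding to be undone rather than filtered.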
On Jun 16, 2010, at 4:47 PM, Marvin Humphrey wrote:
On Wed, Jun 16, 2010 at 01:59:33PM -0700, David E. Wheeler wrote:
I think what I need is some code to strip non-utf8 characters from a string
-- even if that string has the utf8 bit switched on. I thought that Encode
would do that for me,
On Wed, Jun 16, 2010 at 05:34:44PM -0700, David E. Wheeler wrote:
So the UTF8 flag is enabled, and yet it has \303\204\302\215 in it. What is
that crap?
That's octal notation, which I think Dump() uses for any byte greater than 127
and for control characters, so that it can output pure ASCII.
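A small Python sketch, offered only as an illustration and not as part of Perl's tooling, reads those escapes back as bytes by converting each three-digit octal escape to one byte. The result is the double-encoded sequence c3 84 c2 8d, and every byte in it is above 127.

```python
import re

# Escapes as they appear in the Dump() output quoted above.
dumped = r"\303\204\302\215"

# Each \ooo is one byte written in octal, e.g. \303 == 0o303 == 0xC3 == 195.
raw = bytes(int(digits, 8) for digits in re.findall(r"\\([0-7]{3})", dumped))
print(raw.hex(" "))  # c3 84 c2 8d

# Every byte is above 127, consistent with Dump() escaping all of them.
print(all(b > 127 for b in raw))  # True
```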