any news on this?
hrvoje, you say that wget will presume utf-8, but in that case wget should
have decoded %C3%AD to an accented i (í). today wget simply decodes the
escaped bytes one by one, creating a mess.
how can wget assume anything about the encoding, by the way? the filename
could be encoded as anything, right? for instance, the filename
"bl%E5%F8yd.zip" read as iso-8859-1 gives the filename
"blåøyd.zip" (blue-eyed in norwegian), whereas read as utf-8 the same
bytes are not even a valid sequence.
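
to make the ambiguity concrete, here is a minimal python sketch (python is just my choice for illustration, it is not how wget works internally). it decodes the escapes to raw bytes and then tries both encodings; the variable names are made up for the example.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding only yields bytes; it says nothing about their encoding.
raw = unquote_to_bytes("bl%E5%F8yd.zip")

# Read as ISO-8859-1, the two bytes are the letters å and ø.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, 0xE5 starts a multi-byte sequence that 0xF8 cannot
# continue, so the decode fails instead of producing a character.
try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

print(repr(as_latin1), repr(as_utf8))
```

so the same escaped name is fine under one assumption and undecodable under the other.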
wouldn't the correct thing be NOT to decode escaped characters (at least
those over 127), since they could mean anything depending on the encoding
the page author intended?
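
a rough sketch of what i mean, again in python and only as an illustration: decode escapes below 128 and leave the rest untouched. `unquote_ascii_only` is a hypothetical helper name, not anything that exists in wget.

```python
import re

# Matches one percent-escape such as %20 or %C3.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def unquote_ascii_only(name: str) -> str:
    """Decode %XX escapes below 0x80; keep higher ones percent-escaped."""
    def replace(match: re.Match) -> str:
        value = int(match.group(1), 16)
        # Bytes >= 0x80 depend on an unknown encoding, so leave them alone.
        return chr(value) if value < 0x80 else match.group(0)
    return _ESCAPE.sub(replace, name)

print(unquote_ascii_only("V%C3%ADctor%20Jara"))
```

a real implementation would presumably also keep escapes such as %2F (the path separator) and control characters encoded, which this sketch does not do.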
anyway, is there maybe a separate mail address for bugs that would be
more appropriate to use than this list?
Olav Mørkrid wrote:
wget saves the accented "i" in the filename as the 8-bit utf-8
bytes C3 and AD (unescaped), which results in garble since the windows
file system is not utf-8 based.
so either some form of character conversion needs to take place (from
utf-8 to filesystem), or wget should save the filename percent-escaped.
VÃ-ctor_Jara (today)
V%C3%ADctor_Jara (escaped)
Víctor_Jara (converted)
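
the three variants above can be reproduced with a short python sketch. it assumes the windows side interprets the bytes as code page 1252, which is my guess at the setup and may differ on other systems.

```python
from urllib.parse import unquote_to_bytes

escaped = "V%C3%ADctor_Jara"

# The raw bytes that end up in the file name once the escapes are decoded.
raw = unquote_to_bytes(escaped)

# "today": the UTF-8 bytes shown through a cp1252 lens (assumed code page).
# 0xC3 becomes "Ã" and 0xAD becomes a soft hyphen, hence the garble.
today = raw.decode("cp1252")

# "converted": the same bytes decoded as UTF-8 before naming the file.
converted = raw.decode("utf-8")

print(repr(today), repr(escaped), repr(converted))
```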
Hrvoje Niksic wrote:
Olav Mørkrid <[EMAIL PROTECTED]> writes:
problem: international characters cause problems
the image of victor jara in the article is lost
international characters in the filename saved on local disk are garbled
Wget saves exactly the characters it finds in the URL. If the URL
contains the sequence (presumably UTF-8) %C3%AD, that is what Wget
will write to the file name.
What characters did you expect to find in local file names?