Re: wget and international characters (ascii > 127)

Olav Mørkrid Wed, 19 Oct 2005 02:56:03 -0700

any news on this?

hrvoje, you say that wget will presume utf-8, but then wget should have decoded %C3%AD to an accented i (í). but today wget simply decodes the characters one by one, creating a mess.

how can wget assume anything about encoding by the way? the filename could be encoded as anything, right? for instance, the filename "bl%E5%F8yd.zip" encoded in iso-8859-1 would suggest the filename "blåøyd.zip" (blue-eyed in norwegian), whereas read as utf-8 the same bytes would not give those characters.
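To make the ambiguity concrete, here is a minimal Python sketch (not wget code; the variable names are only illustrative). Percent-decoding gives bytes and nothing more; which characters they stand for depends on the encoding one assumes. For this particular name, the bytes E5 F8 happen not to form a valid utf-8 sequence at all, so a strict utf-8 decode fails.

```python
from urllib.parse import unquote_to_bytes

escaped = "bl%E5%F8yd.zip"

# Percent-decoding only yields raw bytes; it says nothing about the charset.
raw = unquote_to_bytes(escaped)

# Read as iso-8859-1, the bytes give the intended Norwegian name.
as_latin1 = raw.decode("iso-8859-1")

# Read as utf-8, the same bytes are rejected: E5 starts a multi-byte
# sequence, but F8 is not a valid continuation byte.
try:
    as_utf8 = raw.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

print(raw, as_latin1, as_utf8)
```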

wouldn't the correct thing be NOT to decode escaped characters (at least over 127), because they could mean anything, depending on the encoding the page author intended?
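The idea of leaving escapes above 127 alone could look roughly like this in Python. This is only a sketch of the proposal, not how wget behaves; `unescape_ascii_only` is a hypothetical helper, and a real tool would also want to keep unsafe ASCII characters such as "/" or control characters escaped.

```python
import re

# Matches one percent-escape such as %20 or %C3.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def unescape_ascii_only(name: str) -> str:
    """Decode %XX only when the byte is below 0x80; keep the rest escaped."""
    def repl(match: re.Match) -> str:
        value = int(match.group(1), 16)
        # Bytes >= 0x80 depend on an unknown encoding, so leave them as-is.
        return chr(value) if value < 0x80 else match.group(0)
    return _ESCAPE.sub(repl, name)

print(unescape_ascii_only("bl%E5%F8yd%20v2.zip"))
print(unescape_ascii_only("V%C3%ADctor_Jara"))
```

With this approach the local file name stays pure ASCII, at the cost of being less readable than a correctly converted name.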


anyway, is there maybe a separate mailaccount for bugs that would be
more appropriate to use than this list?

Olav Mørkrid wrote:

wget saves the accented "i" in the filename as the 8-bit utf-8 characters C3 and AD (unescaped), which results in garble since the windows file system is not utf-8 based.
so either some form of character conversion needs to take place (from utf-8 to filesystem), or wget should save the filename percent-escaped.
VÃ-ctor_Jara (today)
V%C3%ADctor_Jara (escaped)
Víctor_Jara (converted)
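
The three variants above can be reproduced with a short Python sketch. It assumes the URL bytes are utf-8 and that the garbled form comes from reading those bytes as windows-1252 (cp1252); both are assumptions about this particular case, not something wget itself knows.

```python
from urllib.parse import unquote_to_bytes

escaped = "V%C3%ADctor_Jara"

# The raw bytes behind the escapes: C3 AD in place of the accented i.
raw = unquote_to_bytes(escaped)

# "today": the utf-8 bytes read one by one as cp1252 characters.
# C3 becomes "Ã" and AD becomes a soft hyphen (U+00AD).
today = raw.decode("cp1252")

# "converted": the bytes decoded as utf-8 before naming the file.
converted = raw.decode("utf-8")

print(repr(today), escaped, converted)
```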

Hrvoje Niksic wrote:
Olav Mørkrid <[EMAIL PROTECTED]> writes:
problem: international characters cause problems

  the image of victor jara in article is lost
  int. chars. in filename saved on local disk is garble
Wget saves exactly the characters it finds in the URL.  If the URL
contains the sequence (presumably UTF-8) %C3%AD, that is what Wget
will write to the file name.

What characters did you expect to find in local file names?
