On 1/31/07, Steven M. Schweda <[EMAIL PROTECTED]> wrote:
From: "Leo Jay"
> > > Since the responses of FTP servers could be in different charsets, and
> > > wget can't cope with charsets other than English, I'd like to know: is
> > > there any plan for supporting different charsets?
> >
> > Are you complaining about dates in different languages, or file names
> > in different character sets?
>
> I'm talking about dates in different languages.
>
> I haven't tried file names in different charsets,
> but I'm sure wget can't cope with dates in different languages.

   If you look in src/ftp-ls.c: ftp_parse_unix_ls(), you should find an
array of month names:

      static const char *months[] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
      };

   If by "dates in different languages" you mean that non-English month
names are the only problem, then it should be fairly easy to extend this
with month names in other languages, and then change the code below
("if (i != 12)", "month = i;") to something a little more complex, to
handle the new possibilities.

   If the order of the tokens also changes, then you may need to dive into
the hideously complex parsing code, and make it even more hideously
complex. (The fellow who "designed" the date format(s) for "ls" was
obviously targeting an intelligent human audience, not another computer
program. The order and simplicity of a VMS DIRECTORY listing shows some
evidence of actual design, and parsing such a listing is relatively
trivial, but that won't help you any.)

   I might offer a few more details, but your specification of the problem
is not complete enough to make that practical. If you can list a set of
date forms which must be interpreted, then it might be possible to say how
hard it would be to do the job. (I assume that there is no actual
ambiguity in the month name strings for the languages you would like to
support, but any such ambiguity could make the problem impossible to solve
for some languages.)
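A minimal sketch of the table extension described above, assuming the parser has already isolated the month token as a NUL-terminated string. The names `month_names` and `lookup_month` are hypothetical (they are not wget's), and the German row is purely illustrative; real servers may use other spellings or byte encodings. A token that matches different months in different languages is treated as ambiguous and rejected, which is the failure case raised in the last paragraph.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical multi-language month table: one row of 12 abbreviations
   per language.  Row 0 mirrors wget's English months[] array; row 1 is
   an illustrative ASCII-only German row, not taken from any server. */
static const char *month_names[][12] = {
  { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" },
  { "Jan", "Feb", "Mrz", "Apr", "Mai", "Jun",
    "Jul", "Aug", "Sep", "Okt", "Nov", "Dez" },
};

#define N_LANGS (sizeof month_names / sizeof month_names[0])

/* Return the month index 0..11 for TOKEN, or -1 if TOKEN is unknown or
   is ambiguous (the same string names different months in different
   rows).  The same string naming the same month in several rows, such
   as "Jan", is fine. */
static int
lookup_month (const char *token)
{
  int found = -1;
  size_t lang;
  int i;

  for (lang = 0; lang < N_LANGS; lang++)
    for (i = 0; i < 12; i++)
      if (strcmp (token, month_names[lang][i]) == 0)
        {
          if (found != -1 && found != i)
            return -1;          /* ambiguous across languages */
          found = i;
        }
  return found;
}
```

With this table, `lookup_month ("Mar")` returns 2, `lookup_month ("Okt")` returns 9, and an unknown token returns -1. In a patched parser, a test like `lookup_month (tok) >= 0` would stand in for the `i != 12` check and its result for `month = i;`, though where exactly that fits in ftp_parse_unix_ls() is left as an assumption here.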
I had already hacked src/ftp-ls.c to meet my needs before I posted this thread, but my approach is just hard-coding, which I don't think is a good way to solve this problem, and it lacks flexibility. So I wonder whether the wget developers have any plan to solve this problem; I think their solution would be much more elegant (at least more elegant than mine).

The attachment is my modification for the Big5 charset. Could you please have a look at it and check its correctness? Thanks.

-- 
Best Regards,
Leo Jay
big5.patch
Description: Binary data