On 1/31/07, Steven M. Schweda <[EMAIL PROTECTED]> wrote:
From: "Leo Jay"

> > > Since the responses of an FTP server could be in different charsets,
> > > and wget can't cope with charsets other than English, I'd like to know
> > > whether there is any plan to support different charsets.
> >
> >    Are you complaining about dates in different languages, or file names
> > in different character sets?
>
> I'm talking about dates in different languages.
>
> I haven't tried file names in different charsets,
> but I'm sure wget can't cope with dates in different languages.

   If you look in src/ftp-ls.c: ftp_parse_unix_ls(), you should find an
array of month names:

        static const char *months[] = {
          "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
        };

If by "dates in different languages" you mean that non-English month
names are the only problem, then it should be fairly easy to extend this
array with month names in other languages, and then change the code below
it ("if (i != 12)", "month = i;") to something a little more complex, to
handle the new possibilities.
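One way to generalize that lookup, sketched under assumptions: instead of a 12-entry array whose index is the month, keep a table of (name, month) pairs so that any number of spellings can map to the same month, and replace the "i != 12" test with a "found" check. The names `month_alias`, `month_aliases` and `lookup_month` are hypothetical, and the extra entries are illustrative German-style abbreviations chosen only because they are plain ASCII; this is not wget's actual code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one spelling and the month (0..11) it means. */
struct month_alias {
  const char *name;
  int month;
};

static const struct month_alias month_aliases[] = {
  /* English names, as in the original months[] array. */
  {"Jan", 0}, {"Feb", 1}, {"Mar", 2}, {"Apr", 3}, {"May", 4}, {"Jun", 5},
  {"Jul", 6}, {"Aug", 7}, {"Sep", 8}, {"Oct", 9}, {"Nov", 10}, {"Dec", 11},
  /* Illustrative extra spellings (assumed German-style abbreviations). */
  {"Mai", 4}, {"Okt", 9}, {"Dez", 11},
};

/* Return the month index 0..11 for TOK (exact, case-sensitive match),
   or -1 if TOK is not a known month name.  A return of -1 plays the
   role of "i == 12" in the original loop. */
static int
lookup_month (const char *tok)
{
  size_t i;
  for (i = 0; i < sizeof month_aliases / sizeof month_aliases[0]; i++)
    if (strcmp (tok, month_aliases[i].name) == 0)
      return month_aliases[i].month;
  return -1;
}
```

In the parser, something like `month = lookup_month (tok);` followed by `if (month >= 0)` would then take the place of the "if (i != 12)" / "month = i;" pair, and adding a language becomes a matter of adding table rows.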

   If the order of the tokens also changes, then you may need to dive
into the hideously complex parsing code, and make it even more hideously
complex.  (The fellow who "designed" the date format(s) for "ls" was
obviously targeting an intelligent human audience, not another computer
program.  The order and simplicity of a VMS DIRECTORY listing shows some
evidence of actual design, and parsing such a listing is relatively
trivial, but that won't help you any.)
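If the token order does change, one way to avoid piling more position-dependent branches onto the parser is to classify the date tokens by their content. A minimal sketch, assuming the date occupies three whitespace-separated tokens and that the only orders to handle are "Mon DD X" and "DD Mon X" (the day-first order is an assumed example, not a format reported in this thread); `struct ls_date` and `parse_ls_date` are hypothetical names, and only English month names are recognized here.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing the date part of an ls line. */
struct ls_date {
  int month;   /* 0..11 */
  int day;     /* 1..31 */
  int year;    /* -1 if the listing showed a time instead */
  int hour;    /* -1 if the listing showed a year instead */
  int minute;  /* -1 if the listing showed a year instead */
};

static const char *const months[] = {
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Month index 0..11, or -1 if TOK is not an English month name. */
static int
month_index (const char *tok)
{
  int i;
  for (i = 0; i < 12; i++)
    if (strcmp (tok, months[i]) == 0)
      return i;
  return -1;
}

/* Nonzero if TOK is a non-empty string of decimal digits. */
static int
all_digits (const char *tok)
{
  if (*tok == '\0')
    return 0;
  for (; *tok; tok++)
    if (!isdigit ((unsigned char) *tok))
      return 0;
  return 1;
}

/* Classify the three date tokens by content, not position: one must be
   a month name, one a day number, and the third either "HH:MM" or a
   four-digit year.  Accepts "Jan 31 12:05" and "31 Jan 12:05".
   Returns 0 on success, -1 otherwise. */
static int
parse_ls_date (const char *t1, const char *t2, const char *t3,
               struct ls_date *out)
{
  const char *day_tok;
  int hour, minute;
  char extra;

  if ((out->month = month_index (t1)) >= 0)
    day_tok = t2;                     /* month first */
  else if ((out->month = month_index (t2)) >= 0)
    day_tok = t1;                     /* day first */
  else
    return -1;

  if (!all_digits (day_tok) || strlen (day_tok) > 2)
    return -1;
  out->day = atoi (day_tok);
  if (out->day < 1 || out->day > 31)
    return -1;

  /* The length limit keeps %d from overflowing; %c catches trailing junk. */
  if (strlen (t3) <= 5
      && sscanf (t3, "%d:%d%c", &hour, &minute, &extra) == 2
      && hour >= 0 && hour <= 23 && minute >= 0 && minute <= 59)
    {
      out->hour = hour;
      out->minute = minute;
      out->year = -1;
    }
  else if (all_digits (t3) && strlen (t3) == 4)
    {
      out->year = atoi (t3);
      out->hour = out->minute = -1;
    }
  else
    return -1;
  return 0;
}
```

The design choice is that each new order becomes one more branch in the month/day classification, rather than another special case threaded through the column-counting logic.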

   I might offer a few more details, but your specification of the
problem is not complete enough to make that practical.  If you can list
a set of date forms which must be interpreted, then it might be possible
to say how hard it would be to do the job.  (I assume that the month
name strings for the languages you would like to support are
unambiguous; if they are not, the problem could be impossible to solve
for some languages.)
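Once the candidate names are collected, the ambiguity question can be checked mechanically: a name is ambiguous if it appears with two different month indices. A minimal sketch, assuming a hypothetical table of (name, month) pairs; `month_alias` and `find_ambiguous_name` are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one spelling and the month (0..11) it means. */
struct month_alias {
  const char *name;
  int month;
};

/* Return the first name that is mapped to two different months, or
   NULL if every name maps to exactly one month.  Quadratic, which is
   fine for tables of a few dozen entries. */
static const char *
find_ambiguous_name (const struct month_alias *tab, size_t n)
{
  size_t i, j;
  for (i = 0; i < n; i++)
    for (j = i + 1; j < n; j++)
      if (strcmp (tab[i].name, tab[j].name) == 0
          && tab[i].month != tab[j].month)
        return tab[i].name;
  return NULL;
}
```

Running such a check over the combined table for all the languages to be supported would show at once whether a name-only lookup can work, or whether some other context (such as the locale of the server) is needed.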


I had already hacked src/ftp-ls.c to meet my needs before I posted this
thread, but my approach is just hard-coding, which I think is inflexible
and not a good way to solve this problem. So I wonder whether the wget
developers have any plan to solve it; I think their solution would be
much more elegant (at least more elegant than mine).
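A less hard-coded alternative, sketched under assumptions: in some CJK locales `ls` prints the month as a number followed by a month character (for example `1月`), so instead of hard-coding one encoding, the parser could accept a 1- or 2-digit number followed by a caller-supplied suffix byte string. The name `parse_numeric_month` is hypothetical; the test uses the UTF-8 bytes of 月 (U+6708), and for Big5 the suffix bytes would have to be taken from a Big5 mapping table instead.

```c
#include <string.h>

/* Parse a month token of the form "<1..12><suffix>", e.g. "1" followed
   by the encoded bytes of the month character.  SUFFIX is supplied by
   the caller, so the same code can serve different encodings.
   Returns the month index 0..11, or -1 if TOK does not match. */
static int
parse_numeric_month (const char *tok, const char *suffix)
{
  int month = 0;
  int ndigits = 0;

  /* Read at most two leading decimal digits. */
  while (*tok >= '0' && *tok <= '9' && ndigits < 2)
    {
      month = month * 10 + (*tok - '0');
      tok++;
      ndigits++;
    }
  if (ndigits == 0 || month < 1 || month > 12)
    return -1;

  /* Whatever follows the digits must be exactly the suffix. */
  if (strcmp (tok, suffix) != 0)
    return -1;
  return month - 1;
}
```

With the suffix held in a variable (or read from a setting), supporting another encoding becomes a data change rather than a code change.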

The attachment is my modification for the Big5 charset.
Could you please have a look at it and check its correctness? Thanks.

--
Best Regards,
Leo Jay

Attachment: big5.patch
