Mauro Tortonesi <[EMAIL PROTECTED]> writes:

> On Sunday 20 February 2005 06:31 pm, Hrvoje Niksic wrote:
>> string_t.c uses the function iswblank, which doesn't seem to exist
>> on Solaris 8 I tried to compile it on.  (Compilation is likely
>> broken on other non-Linux platforms as well for the same reason.)
>> Since nothing seems to be using the routines from string_t, I
>> solved the problem by removing string_t.o from Makefile.
>
> i know, i still have to add autoconf tests for support of iconv(3)
> and wide chars.

Please don't do that.  The interface and behavior of iconv tend to
vary greatly between systems and their versions.  Also, "wide
character" support is still missing or (worse) broken on many systems
and compilers.

There must be a way to implement the desired functionality without
resorting to the use of "wide characters" for every single string in
Wget.

>> If portability is still desired for Wget, it would IMHO be a good
>> idea to completely remove the dependency on wide characters.
>
> backwards compatibility towards old and legacy systems is and will
> always be a primary concern for wget,

I prefer to think of it as portability, not as catering to legacy
systems.  If you wish to remain portable, you simply have to give up
on the use of some features present on one particular system,
especially when they provide basic functionality, such as string
handling.

Note that having a lot of ifdefs doesn't automatically make a program
portable.  Portability is also the art of writing code that works on
other systems even when you *don't* have them available for testing.
To do that, you have to use only the APIs you know are robustly
implemented on all the architectures you care about.  Wide chars
definitely don't fall into that category.

> however, i really think we need to support string escape when
> printing data coming from a possibly unsafe source (e.g. a server)
> to the console. please read this thread:

Why not simply escape such strings when printed?  What am I missing?

> when we are printing to a tty (try to) interpret all the strings
> coming from a possibly unsafe source according to the local charset
> (this involves a MBR to WIDE CHAR translation) escaping the
> unprintable chars, then store the escaped string using UTF8 encoding
> (which allows the escaped strings to be interpolated within the
> strings retrieved via gettext - which need to be UTF8 encoded as
> well).

I don't see why we need explicit translation to wide chars.
Interpolation of server-received string in Wget's messages has worked
since time immemorial, not only for Wget, but also for other programs
that happen to print strings they've read from the wire or from a file
(such as tar or ls or whatever).  The only problem I can think of is
the non-ASCII characters embedded in server messages, which can be
happily filtered out.
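
To make the filtering concrete, here is a minimal sketch.  The name
filter_to_ascii and the choice of '?' as the replacement character are
assumptions made for illustration, not existing Wget code.  It
compares byte values directly, so it needs neither <ctype.h>, nor the
locale, nor wide characters:

```c
#include <stddef.h>

/* Hypothetical helper, not part of Wget.  Copy SRC into DST, replacing
   every byte outside printable ASCII (0x20..0x7e) with '?'.  At most
   DSTSIZE - 1 bytes are written, followed by a terminating '\0'.
   Returns the number of bytes written, not counting the '\0'.  */
size_t
filter_to_ascii (const char *src, char *dst, size_t dstsize)
{
  size_t n = 0;

  if (dstsize == 0)
    return 0;

  for (; *src != '\0' && n + 1 < dstsize; src++)
    {
      /* Go through unsigned char so bytes >= 0x80 are not negative.  */
      unsigned char c = (unsigned char) *src;
      dst[n++] = (c >= 0x20 && c <= 0x7e) ? (char) c : '?';
    }
  dst[n] = '\0';
  return n;
}
```

The result can be handed to any printf-style function with a plain %s,
so no special output routines are required.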

> please notice that by adopting this policy we will not be able to
> rely on I/O functions from the standard C library anymore. instead,
> we will have to develop our own output functions.

logprintf is supposed to be an interface to printf that also provides
(configurable) logging.  Creating our own version of snprintf just to
escape a few strings seems like total overkill to me.

> else (if the current system does not support iconv(3) OR wide chars):
>
> when we are printing to a tty (try to) perform escaping of the
> strings coming from a possibly unsafe source according to the ASCII
> charset (that is, escape unprintable ASCII chars). no need to adopt
> UTF8 encoding or implement any special output functions.
>
> what do you think? any comments or questions?

I don't see why we couldn't always do this last thing you described.
Wget doesn't print many human-readable server messages the user cares
about anyway.  There are only three cases I can think of:

1) Redirection URL.  This is machine-readable anyway, and there are
   functions in Wget that will escape the non-printable chars with
   %xx sequences.

2) Server messages printed by Wget in normal operation, such as the
   "200 Ok" message.  That one is printed just for the "fun factor"
   anyway; we could just as well print only the response code.
   However, I don't see a problem with simply filtering the non-ASCII
   characters out of the response message.  People who put non-ASCII
   messages in server response lines won't be able to see them
   properly in Wget's output, but I honestly couldn't care less.

3) Server headers printed by `wget -S'.  These are provided to inspect
   technical properties of the response, not to gain access to
   human-readable messages by server operators.  Therefore escaping
   non-ASCII characters in those messages loses us nothing.
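
For cases 2 and 3, escaping instead of dropping bytes could look
roughly like the sketch below.  The name escape_unprintable and the
decision to escape '%' itself (so that the output stays unambiguous)
are assumptions made for illustration; this is not Wget's own
URL-escaping code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Wget.  Return a malloc'ed copy of S
   in which every byte outside printable ASCII (0x20..0x7e), and '%'
   itself, is replaced by a %XX sequence.  The caller frees the result.
   Returns NULL if memory cannot be allocated.  */
char *
escape_unprintable (const char *s)
{
  static const char hex[] = "0123456789ABCDEF";
  /* Worst case: every input byte expands to three output bytes.  */
  char *out = malloc (3 * strlen (s) + 1);
  char *p = out;

  if (out == NULL)
    return NULL;

  for (; *s != '\0'; s++)
    {
      unsigned char c = (unsigned char) *s;
      if (c >= 0x20 && c <= 0x7e && c != '%')
        *p++ = (char) c;
      else
        {
          *p++ = '%';
          *p++ = hex[c >> 4];
          *p++ = hex[c & 0x0f];
        }
    }
  *p = '\0';
  return out;
}
```

Because this works on plain bytes, a terminal escape sequence hidden
in a header comes out as harmless text (ESC becomes %1B), and the same
code runs on systems with no iconv or wide character support.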
