On Sunday 20 February 2005 06:31 pm, Hrvoje Niksic wrote:
> string_t.c uses the function iswblank, which doesn't seem to exist on
> Solaris 8 I tried to compile it on.  (Compilation is likely broken on
> other non-Linux platforms as well for the same reason.)  Since nothing
> seems to be using the routines from string_t, I solved the problem by
> removing string_t.o from Makefile.

i know, i still have to add autoconf tests for support of iconv(3) and wide 
chars.

> If portability is still desired for Wget, it would IMHO be a good idea
> to completely remove the dependency on wide characters.

backwards compatibility with old and legacy systems is and will always be a 
primary concern for wget, at least as long as i am the maintainer.

however, i really think we need to support string escaping when printing data 
coming from a possibly unsafe source (e.g. a server) to the console. please 
read this thread:

http://www.mail-archive.com/wget%40sunsite.dk/msg06953.html

simone piunno (included in cc) and i have been thinking of adopting the 
following behaviour:


if the current system supports iconv(3) AND wide chars:

when we are printing to a tty, (try to) interpret all the strings coming from 
a possibly unsafe source according to the local charset (this involves a 
multibyte to wide char translation), escape the unprintable chars, then store 
the escaped string using UTF8 encoding (which allows the escaped strings to be 
interpolated within the strings retrieved via gettext, which need to be UTF8 
encoded as well). the adoption of UTF8 as an internal encoding for wget 
strings forces us to decode from UTF8 every time we print the strings.
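
the escaping step described above could look roughly like the sketch below. 
it is only an illustration under stated assumptions: escape_mb and put_hex are 
hypothetical names (not existing wget functions), the \xHH escape syntax is an 
arbitrary choice, and the re-encoding to UTF8 via iconv(3) is left out, so 
printable characters are copied through in the local charset.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: write byte C as the 4-character escape \xHH.  */
static void
put_hex (char *dst, unsigned char c)
{
  snprintf (dst, 5, "\\x%02X", (unsigned) c);
}

/* Hypothetical sketch, not wget code: decode SRC according to the current
   locale with mbrtowc, copy printable characters unchanged and escape
   everything else (including invalid byte sequences) as \xHH.
   Returns the number of bytes written, excluding the terminating NUL.
   The output is truncated if DST is too small, but always NUL-terminated.  */
size_t
escape_mb (const char *src, char *dst, size_t dst_size)
{
  mbstate_t st;
  size_t left = strlen (src);
  size_t out = 0;
  size_t i;

  if (dst_size == 0)
    return 0;
  memset (&st, 0, sizeof st);

  while (left > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, src, left, &st);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: escape one byte and resync.  */
          memset (&st, 0, sizeof st);
          if (out + 4 >= dst_size)
            break;
          put_hex (dst + out, (unsigned char) *src);
          out += 4;
          src++;
          left--;
          continue;
        }
      if (n == 0)
        break;                  /* decoded a NUL: end of string */

      if (iswprint ((wint_t) wc))
        {
          if (out + n >= dst_size)
            break;
          memcpy (dst + out, src, n);
          out += n;
        }
      else
        {
          /* Unprintable character: escape each of its bytes.  */
          if (out + 4 * n >= dst_size)
            break;
          for (i = 0; i < n; i++, out += 4)
            put_hex (dst + out, (unsigned char) src[i]);
        }
      src += n;
      left -= n;
    }
  dst[out] = '\0';
  return out;
}
```

the caller is expected to have called setlocale (LC_CTYPE, "") beforehand; 
without it the program runs in the "C" locale and only ASCII is treated as 
printable.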

please notice that by adopting this policy we will not be able to rely on I/O 
functions from the standard C library anymore. instead, we will have to 
develop our own output functions. this is not as bad as it seems, since IIRC 
wget uses only the logprintf function to print output on the screen or in a 
log file. i've taken a close look at the logprintf calls wget makes:

http://www.mail-archive.com/wget%40sunsite.dk/msg06977.html

and as you can see the only formats wget uses are:

'%5ld', '%%', '%d', '%ld', '%2d', '%.2f', '%3d', '%c', '%s', '%*s' 

so, i was working on a simplified version of the dopr function contained in 
the snprintf module to be used by a new version of logprintf (this function 
would support only the above mentioned formats and escape a given string when 
using a special format - i used %es), but my laptop broke (i accidentally 
poured some water on it and the HDD is unrecoverable) before i could commit 
the code to CVS.
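
to make the idea of a restricted formatter with a special %es conversion more 
concrete, here is a minimal sketch. it is not the lost code and not the dopr 
function from the snprintf module: mini_format and emit are hypothetical 
names, only %%, %c, %s, %d, %ld and %es are handled (the width and precision 
variants are omitted for brevity), numbers are still delegated to snprintf, 
and %es escapes according to ASCII only.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append up to N bytes of S to DST, always leaving room for the NUL.  */
static void
emit (char *dst, size_t size, size_t *out, const char *s, size_t n)
{
  size_t i;
  for (i = 0; i < n && *out + 1 < size; i++)
    dst[(*out)++] = s[i];
}

/* Hypothetical mini formatter: like a tiny snprintf that also understands
   %es, which prints a string with non-printable ASCII bytes as \xHH.
   Formatting stops at the first unsupported conversion.
   Returns the number of bytes written, excluding the terminating NUL.  */
size_t
mini_format (char *dst, size_t size, const char *fmt, ...)
{
  va_list ap;
  size_t out = 0;
  char tmp[32];
  const char *s;

  if (size == 0)
    return 0;
  va_start (ap, fmt);
  while (*fmt != '\0')
    {
      if (*fmt != '%')
        {
          emit (dst, size, &out, fmt++, 1);
          continue;
        }
      fmt++;                    /* skip the '%' */
      if (fmt[0] == 'e' && fmt[1] == 's')
        {
          for (s = va_arg (ap, const char *); *s != '\0'; s++)
            {
              unsigned char c = (unsigned char) *s;
              if (c >= 0x20 && c <= 0x7e)
                emit (dst, size, &out, s, 1);
              else
                {
                  snprintf (tmp, sizeof tmp, "\\x%02X", (unsigned) c);
                  emit (dst, size, &out, tmp, 4);
                }
            }
          fmt += 2;
          continue;
        }
      switch (*fmt)
        {
        case '%':
          emit (dst, size, &out, "%", 1);
          break;
        case 'c':
          tmp[0] = (char) va_arg (ap, int);
          emit (dst, size, &out, tmp, 1);
          break;
        case 's':
          s = va_arg (ap, const char *);
          emit (dst, size, &out, s, strlen (s));
          break;
        case 'd':
          snprintf (tmp, sizeof tmp, "%d", va_arg (ap, int));
          emit (dst, size, &out, tmp, strlen (tmp));
          break;
        case 'l':
          if (fmt[1] != 'd')
            goto done;          /* unsupported conversion */
          snprintf (tmp, sizeof tmp, "%ld", va_arg (ap, long));
          emit (dst, size, &out, tmp, strlen (tmp));
          fmt++;
          break;
        default:
          goto done;            /* unsupported conversion or stray '%' */
        }
      fmt++;
    }
 done:
  va_end (ap);
  dst[out] = '\0';
  return out;
}
```

a call such as mini_format (buf, sizeof buf, "%s: %es\n", "Location", header) 
would then print the trusted label as is and the server-supplied value in 
escaped form (header being a hypothetical variable here).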


else (if the current system does not support iconv(3) OR wide chars):

when we are printing to a tty (try to) perform escaping of the strings coming 
from a possibly unsafe source according to the ASCII charset (that is, escape 
unprintable ASCII chars). no need to adopt UTF8 encoding or implement any 
special output functions.
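
the ASCII-only fallback is simple enough to sketch in a few lines. this is an 
illustration only: escape_ascii is a hypothetical name (not an existing wget 
function) and the \xHH escape syntax is an assumption. note that a literal 
backslash is passed through unchanged here, so the escaped form is safe to 
print but not unambiguously reversible.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of wget: copy SRC into DST, replacing every
   byte outside the printable ASCII range (0x20..0x7e) with a \xHH escape.
   Returns the number of bytes written, excluding the terminating NUL.
   The output is truncated if DST is too small, but always NUL-terminated.  */
size_t
escape_ascii (const char *src, char *dst, size_t dst_size)
{
  size_t out = 0;

  if (dst_size == 0)
    return 0;
  for (; *src != '\0'; src++)
    {
      unsigned char c = (unsigned char) *src;

      if (c >= 0x20 && c <= 0x7e)
        {
          if (out + 1 >= dst_size)
            break;              /* no room for the char plus the NUL */
          dst[out++] = (char) c;
        }
      else
        {
          if (out + 4 >= dst_size)
            break;              /* no room for \xHH plus the NUL */
          snprintf (dst + out, 5, "\\x%02X", (unsigned) c);
          out += 4;
        }
    }
  dst[out] = '\0';
  return out;
}
```

with this, a server-supplied string containing a terminal escape sequence 
such as ESC [ 2 J reaches the tty as the harmless text \x1B[2J.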


what do you think? any comments or questions?


P.S. i am very sorry if you haven't heard any news from me lately, but it 
seems that i've caught a very bad flu that keeps coming back every time it 
seems to have gone away. for almost 4 weeks i have felt way too sick to work 
seriously on wget. i am very sorry.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it
