On Sunday 20 February 2005 06:31 pm, Hrvoje Niksic wrote:
> string_t.c uses the function iswblank, which doesn't seem to exist on
> Solaris 8 I tried to compile it on. (Compilation is likely broken on
> other non-Linux platforms as well for the same reason.) Since nothing
> seems to be using the routines from string_t, I solved the problem by
> removing string_t.o from Makefile.

I know; I still have to add autoconf tests for iconv(3) and wide-character support.

> If portability is still desired for Wget, it would IMHO be a good idea
> to completely remove the dependency on wide characters.

Backwards compatibility with old and legacy systems is, and will always be, a primary concern for wget, at least as long as I am the maintainer. However, I really think we need to support string escaping when printing data that comes from a possibly unsafe source (e.g. a server) to the console. Please read this thread:

http://www.mail-archive.com/wget%40sunsite.dk/msg06953.html

Simone Piunno (included in Cc) and I have been thinking of adopting the following behaviour:

If the current system supports iconv(3) AND wide chars:

When we are printing to a tty, (try to) interpret all strings coming from a possibly unsafe source according to the local charset (this involves a multibyte-to-wide-character translation, as done by mbrtowc(3)), escaping the unprintable characters, and then store the escaped string in UTF-8. This allows the escaped strings to be interpolated within the strings retrieved via gettext, which need to be UTF-8 encoded as well. Adopting UTF-8 as the internal encoding for wget strings forces us to decode from UTF-8 every time we print the strings. Please note that with this policy we will no longer be able to rely on the I/O functions of the standard C library; instead, we will have to develop our own output functions. This is not as bad as it seems, since IIRC wget uses only the logprintf function to print output on the screen or to a log file.
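A minimal sketch of the escaping step for the iconv/wide-char branch, in C. It assumes that wchar_t values are Unicode code points (true where __STDC_ISO_10646__ is defined, e.g. on glibc), so the UTF-8 encoding can be done by hand without calling iconv(3). The names escape_to_utf8() and put_utf8() and the \xHH / \uHHHH / \UHHHHHHHH escape notation are hypothetical, not existing wget code; the reverse step (decoding UTF-8 for output) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Write the UTF-8 encoding of code point cp to out (which must have room
   for 4 bytes) and return the number of bytes written. */
static size_t put_utf8(char *out, unsigned long cp)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode s using the current LC_CTYPE locale, keep
   printable characters (re-encoded as UTF-8) and replace unprintable ones
   with \xHH, \uHHHH or \UHHHHHHHH escapes. Returns a malloc'd string, or
   NULL on allocation failure. */
char *escape_to_utf8(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    /* Each input byte yields at most 10 output bytes ("\UHHHHHHHH"). */
    if (len > (SIZE_MAX - 1) / 10)
        return NULL;
    char *out = malloc(10 * len + 1);
    if (out == NULL)
        return NULL;
    char *p = out;
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t i = 0;
    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s + i, len - i, &st);
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
            /* Invalid or truncated sequence: escape one raw byte and
               reset the conversion state. */
            p += sprintf(p, "\\x%02X", (unsigned)(unsigned char)s[i]);
            memset(&st, 0, sizeof st);
            i++;
            continue;
        }
        /* Mask to 32 bits so the \U escape never exceeds 8 hex digits. */
        unsigned long cp = (unsigned long)wc & 0xFFFFFFFFUL;
        if (iswprint((wint_t)wc))
            p += put_utf8(p, cp);
        else if (cp < 0x100)
            p += sprintf(p, "\\x%02lX", cp);
        else if (cp < 0x10000)
            p += sprintf(p, "\\u%04lX", cp);
        else
            p += sprintf(p, "\\U%08lX", cp);
        i += n;
    }
    *p = '\0';
    return out;
}
```

A caller would run setlocale(LC_CTYPE, "") once at startup so that mbrtowc(3) decodes in the user's charset. A server-supplied name such as "index\x1b[2J.html" then comes back as the harmless text index\x1B[2J.html instead of clearing the terminal, and bytes that are not valid in the local charset are escaped one by one rather than passed to the tty raw.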

I've taken a close look at the logprintf calls wget makes:

http://www.mail-archive.com/wget%40sunsite.dk/msg06977.html

and, as you can see, the only formats wget uses are:

'%5ld', '%%', '%d', '%ld', '%2d', '%.2f', '%3d', '%c', '%s', '%*s'

So I was working on a simplified version of the dopr function contained in the snprintf module, to be used by a new version of logprintf. This function would support only the formats listed above, plus a special format (I used %es) that escapes the given string. Unfortunately my laptop broke (I accidentally poured some water on it and the HDD is unrecoverable) before I could commit the code to CVS.

Else (if the current system does not support iconv(3) OR wide chars):

When we are printing to a tty, (try to) escape the strings coming from a possibly unsafe source according to the ASCII charset (that is, escape unprintable ASCII chars). There is no need to adopt UTF-8 encoding or to implement any special output functions.

What do you think? Any comments or questions?

P.S. I am very sorry if you haven't heard any news from me lately, but it seems that I've caught a very bad flu that keeps coming back after it goes away. For almost 4 weeks now I have felt way too sick to work seriously on wget. I am very sorry.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi
University of Ferrara - Dept. of Eng.       http://www.ing.unife.it
Institute of Human & Machine Cognition      http://www.ihmc.us
Deep Space 6 - IPv6 for Linux               http://www.deepspace6.net
Ferrara Linux User Group                    http://www.ferrara.linux.it
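The ASCII fallback and the proposed %es directive can be sketched together in C, assuming escaped bytes use the same hypothetical \xHH notation. ascii_escape() and mini_format() are hypothetical names and are not the lost dopr-based code: for brevity, mini_format() delegates each ordinary directive to snprintf(3) instead of reimplementing dopr, and it accepts only the directives in the logprintf list plus %es.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy s into out (outsz bytes including the NUL),
   escaping every byte outside printable ASCII (0x20..0x7E) as \xHH.
   The output is truncated if out is too small. */
static void ascii_escape(char *out, size_t outsz, const char *s)
{
    size_t o = 0;
    assert(outsz > 0);
    for (; *s != '\0' && o + 1 < outsz; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 0x20 && c <= 0x7E) {
            out[o++] = (char)c;
        } else if (outsz - o >= 5) {        /* "\xHH" plus the NUL */
            snprintf(out + o, outsz - o, "\\x%02X", (unsigned)c);
            o += 4;
        } else {
            break;
        }
    }
    out[o] = '\0';
}

/* Hypothetical mini-formatter: handles only %%, %c, %s, %*s, %d and %ld
   (optionally with a width), %.Nf, and the proposed %es. Each ordinary
   directive is delegated to snprintf; '*' is honoured only in %*s. */
static void mini_format(char *buf, size_t size, const char *fmt, ...)
{
    char tmp[512], spec[32];
    size_t o = 0;
    va_list ap;

    assert(size > 0);
    va_start(ap, fmt);
    while (*fmt != '\0' && o + 1 < size) {
        if (*fmt != '%') {
            buf[o++] = *fmt++;
            continue;
        }
        /* Collect '%', flags, width, precision and the 'l' modifier. */
        size_t k = 0;
        spec[k++] = *fmt++;
        while (*fmt != '\0' && strchr("-0123456789.*l", *fmt) && k < sizeof spec - 3)
            spec[k++] = *fmt++;
        spec[k] = '\0';
        int star = strchr(spec, '*') != NULL;
        int width = star ? va_arg(ap, int) : 0;
        char conv = (*fmt != '\0') ? *fmt++ : '\0';
        spec[k++] = conv;
        spec[k] = '\0';
        tmp[0] = '\0';
        switch (conv) {
        case '%':
            strcpy(tmp, "%");
            break;
        case 'c':
            snprintf(tmp, sizeof tmp, spec, va_arg(ap, int));
            break;
        case 'd':
            if (strchr(spec, 'l') != NULL)
                snprintf(tmp, sizeof tmp, spec, va_arg(ap, long));
            else
                snprintf(tmp, sizeof tmp, spec, va_arg(ap, int));
            break;
        case 'f':
            snprintf(tmp, sizeof tmp, spec, va_arg(ap, double));
            break;
        case 's':
            if (star)
                snprintf(tmp, sizeof tmp, spec, width, va_arg(ap, const char *));
            else
                snprintf(tmp, sizeof tmp, spec, va_arg(ap, const char *));
            break;
        case 'e':                           /* %es: escaped string */
            if (*fmt == 's') {
                fmt++;
                ascii_escape(tmp, sizeof tmp, va_arg(ap, const char *));
            }
            break;
        default:                            /* unsupported: emit nothing */
            break;
        }
        for (const char *t = tmp; *t != '\0' && o + 1 < size; t++)
            buf[o++] = *t;
    }
    buf[o] = '\0';
    va_end(ap);
}
```

For example, mini_format(buf, sizeof buf, "%s: %es (%d%%)", "file", "a\x1b[1mb", 50) yields file: a\x1B[1mb (50%). Reusing the letter e means the standard %e floating-point conversion is unavailable, which is harmless only because none of the logprintf formats listed above use it.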
