On Monday 21 February 2005 02:39 pm, you wrote:
> Mauro Tortonesi <[EMAIL PROTECTED]> writes:
> > The problem is not with HTTP response messages, but with HTTP
> > resources (which can be, for example, binary data or multibyte
> > text; in that case you really want to escape unprintable data while
> > printing all the valid multibyte chars you can in the current locale
> > if the user has, for instance, specified -O -) and possibly with FTP
> > response messages.
>
> It is incorrect to just assume that the data we receive is textual.
> It will cause corruption with something as simple as `wget -O - URL >
> foo.gif'.
Mmmh, I don't think yours is a good example:
[EMAIL PROTECTED] code]$ cat test.c
#include <stdio.h>
#include <unistd.h>
int main()
{
    puts((isatty(STDOUT_FILENO) ? "1" : "0"));
    return 0;
}
[EMAIL PROTECTED] code]$ gcc -o test test.c
[EMAIL PROTECTED] code]$ ./test | sort
0
[EMAIL PROTECTED] code]$ ./test > pippo.txt
[EMAIL PROTECTED] code]$ cat pippo.txt
0
[EMAIL PROTECTED] code]$ ./test
1
So by simply checking isatty() on standard output, and escaping only when it is a terminal, we won't experience this kind of corruption.
> I still think that moving in this direction is a seriously bad move.
OK, there's no problem in reverting the changes I've made in CVS. i18n is
simply a nightmare, and working on it is not a whole lot of fun.
But I suspect we will have to add foreign charset support to wget
one of these days. For example, suppose we are doing a recursive HTTP
retrieval and the HTML pages we retrieve are not encoded in ASCII but in
UTF-16 (an encoding in which it is perfectly fine to have null bytes in the
stream). What do we do in that situation?
--
Aequam memento rebus in arduis servare mentem...
Mauro Tortonesi
University of Ferrara - Dept. of Eng. http://www.ing.unife.it
Institute of Human & Machine Cognition http://www.ihmc.us
Deep Space 6 - IPv6 for Linux http://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it