Glenn Linderman <[email protected]> added the comment:
Pierre said:
The encoding used by the browser is defined in the Content-Type meta tag, or
the content-type header; if not, the default seems to vary for different
browsers. So it's definitely better to define it.
The argument stream_encoding used in FieldStorage *must* be this encoding.
I say:
I agree it is better to define it. I think you just said the same thing as
the page I linked to; I might not have conveyed that correctly in my
paraphrasing. I assume you are talking about the charset of the Content-Type
of the form page itself, as served to the browser, since the browser, sadly,
doesn't send that charset back with the form data.
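
A minimal sketch of why the script has to know the charset of the form page: the percent-encoded bytes of a submission carry no charset label, so the same bytes decode differently depending on the encoding assumed. This uses urllib.parse rather than FieldStorage, and the field name and value are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical field submitted from a form page served as UTF-8.
# "caf\u00e9" is "cafe" with an acute accent on the final letter.
body = urlencode({"name": "caf\u00e9"}, encoding="utf-8")

# Decoding with the charset of the form page recovers the original text.
as_utf8 = parse_qs(body, encoding="utf-8")["name"][0]

# Decoding with a different charset raises no error; it silently
# produces the wrong characters.
as_latin1 = parse_qs(body, encoding="latin-1")["name"][0]

# ascii() keeps the demonstration printable on any console encoding.
print(body)
print(ascii(as_utf8))
print(ascii(as_latin1))
```

Nothing in the submitted data itself says which of the two decodings is correct, which is why the encoding has to be supplied by the script.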
Pierre says:
But this raises another problem, when the CGI script has to print the data
received. The built-in print() function encodes the string with
sys.stdout.encoding, and this will fail if the string can't be encoded with it.
It is the case on my PC, where sys.stdout.encoding is cp1252: it can't handle
Arabic or Chinese characters.
I say:
I don't think there is any need to override print, especially not
builtins.print. It is still true that the HTTP data stream is and should be
treated as a binary stream. So the script author is responsible for creating
such a binary stream.
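
A minimal sketch of treating the response as a binary stream, assuming the script encodes the headers and the body itself. The function name write_page and the sample markup are hypothetical; a real script would pass sys.stdout.buffer where the demonstration uses an in-memory stream.

```python
import io


def write_page(out, text, encoding="utf-8"):
    """Write a minimal CGI response to the binary stream `out`."""
    # The charset announced in the header is the one used for the body.
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(encoding)
    out.write(header.encode("ascii"))
    out.write(text.encode(encoding))


# In-memory stand-in for sys.stdout.buffer.
demo = io.BytesIO()
# Two Chinese characters that cp1252 cannot represent.
write_page(demo, "<p>\u4e2d\u6587</p>")
```

Because the bytes are produced explicitly, the result does not depend on sys.stdout.encoding.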
The FieldStorage class does not call print, so it seems inappropriate to add
a parameter to its constructor to create a print function that it doesn't
use.
For the convenience of CGI script authors, it would be nice if CGI provided
access to the output stream in a useful way... and I agree that, because the
generation of an output page comes complete with its own encoding, the
output stream encoding parameter should be separate from the stream_encoding
parameter required for FieldStorage.
A separate, new function or class for doing that seems appropriate, possibly
included in cgi.py, but not in FieldStorage. Message 125100 in this issue
describes a class IOMix that I wrote and use for such; codifying it by
including it in cgi.py would be fine by me... I've been using it quite
successfully for some months now.
The last line of Message 125100 may be true; perhaps a few more methods should
be added. However, print is not one of them. I think you'll be pleasantly
surprised to discover (as I was, after writing that line) that
builtins.print converts its parameters to str and writes to stdout, assuming
that stdout will do the appropriate encoding. The class IOMix will, in fact,
do that appropriate encoding (given an appropriate parameter to its
initialization). Perhaps for CGI, a convenience function could be added to
IOMix to include the last two code lines after IOMix in the prior message:
    @staticmethod
    def setup( encoding="UTF-8"):
        sys.stdout = IOMix( sys.stdout, encoding )
        sys.stderr = IOMix( sys.stderr, encoding )
Note that IOMix allows the user's choice of output stream encoding, applies it
to both stdout and stderr, which both need it, and also allows the user to
generate binary directly (if sending back a file, for example), as both bytes
and str are accepted.
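
A rough sketch of the idea, for readers without Message 125100 at hand. This is an illustrative stand-in written for this example, not the IOMix class from that message; the name IOMixSketch and its details are assumptions. It wraps a text stream that exposes a binary .buffer (as sys.stdout does), encodes str with the chosen encoding, and passes bytes through unchanged.

```python
import io


class IOMixSketch:
    """Illustrative stand-in, NOT the IOMix class from Message 125100."""

    def __init__(self, stream, encoding="UTF-8"):
        # Assumption: `stream` is a text wrapper with a binary `.buffer`.
        stream.flush()
        self._raw = stream.buffer
        self._encoding = encoding

    def write(self, data):
        # str is encoded with the chosen encoding; bytes pass through as-is.
        if isinstance(data, str):
            data = data.encode(self._encoding)
        return self._raw.write(data)

    def flush(self):
        self._raw.flush()


# Stand-in for a console whose own encoding is cp1252.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="cp1252")
out = IOMixSketch(text, "UTF-8")

# The unmodified builtin print works: it converts to str and calls write().
print("\u0645\u0631\u062d\u0628\u0627", file=out)  # an Arabic word
out.write(b"\x00\x01")  # raw bytes, e.g. part of a file being sent back
out.flush()
```

The Arabic text is written as UTF-8 even though the wrapped stream's own encoding is cp1252, and print needs no override to make that happen.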
In 3.x, print accepts a file= parameter, which your implementation doesn't
permit, and which a CGI script could use to write to other files. So I
really, really don't think we want to override builtins.print with a version
that lacks the file= parameter and is tied specifically to stdout.
My message 126075 still needs to be included in your next patch.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue4953>
_______________________________________