Neil, thanks for revisiting Quixote's handling of Unicode. It's a
significant improvement that Quixote will now accept str objects as
is. However, I think there may still be some i's to be dotted! Consider
the following bit of code in your revised _encode_chunk:

        if isinstance(chunk, unicode):
            if self.charset is None:
                # iso-8859-1 is the default for the HTTP protocol if charset
                # parameter of content-type header is not provided
                chunk = chunk.encode('iso-8859-1')

Remember that the question of ISO-8859-1 being the default for HTTP
doesn't come into this: the HTTP standard says nothing about how a
sender should decide which encoding to use. When charset has not been
specified, Quixote in fact faces two different questions, one for str
objects and one for unicode objects. For character data in str
objects, Quixote is guessing which encoding has *already* been used
for the byte stream; ISO-8859-1 is a reasonable assumption, although a
case could be made for sys.getdefaultencoding(). For unicode objects,
Quixote is deciding which encoding it *will* use; ISO-8859-1 is a poor
choice because it cannot encode the great majority of Unicode code
points. UTF-8, which can encode all of them, is a much more natural
choice for unicode objects.
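
To make the difference concrete, here is a rough sketch. The
encode_chunk function is a hypothetical stand-in of my own, not
Quixote's actual _encode_chunk, and the UTF-8 fallback is the
behaviour I am suggesting rather than anything the current code does.
It is written so that it should run on recent Python versions, where
str objects correspond to bytes:

```python
def encode_chunk(chunk, charset=None):
    """Hypothetical stand-in for _encode_chunk (not Quixote's code)."""
    if isinstance(chunk, bytes):
        # Already a byte string: pass it through untouched.
        return chunk
    # Assumption: fall back to UTF-8 when no charset has been set.
    return chunk.encode(charset or 'utf-8')


# U+20AC EURO SIGN has no representation in ISO-8859-1.
text = u'price: \u20ac10'

try:
    text.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# With the UTF-8 fallback, the same text encodes without error.
encoded = encode_chunk(text)
```

An explicitly configured charset would still be honoured; only the
"no charset specified" case changes.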


Hamish