Walter Dörwald wrote:
> The register command in 2.4 (and current CVS) simply does a
>    value = str(value)
> in post_to_server() so the encoded bytes sent depend on the
> default encoding. Would it be sufficient to change this to
>    value = unicode(value).encode("utf-8")

Indeed. I think this can go into 2.4.2.
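
For illustration, the change discussed above might look roughly like
this. It is a minimal sketch, not the actual distutils code: the helper
name encode_field and the text_type alias are made up here, and the
alias is only there so the snippet also runs on interpreters that have
no unicode builtin.

```python
try:
    text_type = unicode  # Python 2, as in the message above
except NameError:
    text_type = str      # Python 3: str is already the Unicode type


def encode_field(value):
    """Return value as UTF-8 bytes, independent of the default encoding.

    Hypothetical helper; assumes value is text or a simple non-bytes
    object such as a number.
    """
    return text_type(value).encode("utf-8")
```

With this, the bytes on the wire are the same no matter what the
sender's default encoding is.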

> Another solution might be to include the encoding in the Content-type
> header of the request. IMHO the best solution would be to do both:
> Always use UTF-8 as the encoding and include this in the Content-type
> header in the request. PyPI should honor this encoding when it finds
> it and should fall back to whatever it used before if it doesn't.

Yeah, well :-) Content-type in form upload is a mess, as you certainly
know. It should be honored, but commonly isn't. This, in turn, causes
browsers to ignore it.
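
As a rough sketch of the "do both" idea (always encode as UTF-8 and say
so in the request's Content-type header), something like the following
could work. The function name build_form_request, the boundary string
and the exact header layout are assumptions made for this example, not
what the register command or PyPI actually does.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def build_form_request(fields, boundary="hypothetical-boundary-1234"):
    """Return (headers, body) for a multipart/form-data POST.

    Hypothetical helper: every field value is sent as UTF-8, and the
    charset is announced in the Content-Type header of the request.
    """
    parts = []
    for name, value in sorted(fields.items()):
        parts.append(("--" + boundary).encode("ascii"))
        parts.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8")
        )
        parts.append(b"")  # blank line between part headers and part body
        parts.append(text_type(value).encode("utf-8"))
    parts.append(("--" + boundary + "--").encode("ascii"))
    parts.append(b"")
    body = b"\r\n".join(parts)
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s; charset=utf-8"
                        % boundary,
        "Content-Length": str(len(body)),
    }
    return headers, body
```

A server that honors the charset parameter can decode the fields
directly; one that ignores it sees the same bytes as before.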

PyPI uses the cgi module. It currently decodes anything that doesn't
have a filename attribute as UTF-8, so it rejects any request that
doesn't send UTF-8. This could be fixed/extended, but I think that
would be best done in the cgi module, for consumption by any application
that uses form upload. For example, doing

cgi.FieldStorage(..., encoding="UTF-8")

should cause

a) decoding of every field that has a charset= in its content
   type
b) decoding of every field that is not a file as UTF-8. It is a
   file if it
   I) has a filename, or
   II) cannot be decoded using the target encoding
For backwards compatibility, a) can only be enabled if the CGI
application explicitly tells what encoding it expects.
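
The rules above could be sketched as a standalone function. This is
only an illustration: decode_field and its parameters are hypothetical
and are not part of the cgi module, and checking the declared charset
before the filename is an assumption, since the message does not say
which rule wins.

```python
def decode_field(raw, filename=None, charset=None, expected="utf-8"):
    """Apply rules a) and b) to one form field (hypothetical helper).

    raw      -- the field's bytes as received
    filename -- filename from the part's Content-Disposition, if any
    charset  -- charset declared in the part's content type, if any
    expected -- encoding the application says it expects
    Returns decoded text, or the raw bytes if the field counts as a file.
    """
    # Rule a): the field declares its own charset, so use it.
    if charset is not None:
        return raw.decode(charset)
    # Rule b) I): a field with a filename is a file; leave it alone.
    if filename is not None:
        return raw
    # Rule b): otherwise try the expected encoding ...
    try:
        return raw.decode(expected)
    except UnicodeDecodeError:
        # Rule b) II): ... and treat undecodable data as a file.
        return raw
```

An application that never passes an expected encoding would keep the old
behaviour, which is the backwards-compatibility point made above.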

I'd like to state "contributions are welcome", although others
may think differently.

Regards,
Martin