Re: [Web-SIG] Future of WSGI

2009-12-27 Thread Henry Precheur



On Tue, Nov 24, 2009 at 10:50:00PM +0100, Malthe Borch wrote:
 How people use or abuse software is not our concern; but the standard
 library should not itself abuse its own abstractions.

Your assumption is that `environ` == HTTP headers. That's simply NOT the
case. A request is:
  - A request line
  - Some headers
  - A body

(See http://tools.ietf.org/html/rfc2616#section-5)

The request body, the request method (GET, POST, ...), the request URL,
the HTTP version are all in `environ`.

If you really want to separate the headers from the rest you would put
another dictionary containing the headers inside `environ`. Instead WSGI
puts the headers prefixed with HTTP_ in `environ`, because that's what
CGI does. It might not be 100% clean, or logical, but it's SIMPLER:
there's no need to deal with nested dictionaries or other more complex
structures, and it's extensible.
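
As a rough illustration of that flat layout, the headers can be pulled
back out of `environ` in a few lines. This is only a sketch: the helper
name `headers_from_environ` and the sample values are made up here. The
HTTP_ prefix, and the unprefixed CONTENT_TYPE / CONTENT_LENGTH keys,
follow the CGI convention that PEP 333 adopts.

```python
def headers_from_environ(environ):
    """Collect the HTTP request headers stored in a WSGI environ dict."""
    headers = {}
    for key, value in environ.items():
        if key.startswith('HTTP_'):
            name = key[len('HTTP_'):]
        elif key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            # CGI keeps these two headers without the HTTP_ prefix.
            name = key
        else:
            # Request line data, wsgi.* keys, server variables, etc.
            continue
        headers[name.replace('_', '-').title()] = value
    return headers


# Hypothetical environ, reduced to a handful of keys.
sample_environ = {
    'REQUEST_METHOD': 'GET',
    'PATH_INFO': '/index',
    'SERVER_PROTOCOL': 'HTTP/1.1',
    'HTTP_HOST': 'example.org',
    'HTTP_ACCEPT_LANGUAGE': 'fr',
    'CONTENT_TYPE': 'text/plain',
}
print(headers_from_environ(sample_environ))
```

The request method, path and protocol version stay in `environ` next to
the headers, which is the point made above.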

   Request = namedtuple('Request', 'environ body')
   Response = namedtuple('Response', 'status headers iterable')
 
 Iterable might be body or chunks or some other term.

namedtuple is Python 2.6+: WSGI can't use it. WSGI must work w/ older
versions of Python.

-- 
  Henry Prêcheur


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] [RFC] urllib2 requests history + HEAD support

2009-12-21 Thread Henry Precheur
On Sun, Dec 20, 2009 at 11:38:19PM +0530, Senthil Kumaran wrote:
 I need your opinion on this request. 
 http://bugs.python.org/issue1673007
 
 Python Standard Library module urllib2 has support for GET and POST.
 There was a feature request to add support for HEAD requests.

It would be nice to have other methods too, like PUT and DELETE:

  http://tools.ietf.org/html/rfc2616#page-52

 While that is valid feature request, there was suggestion to include a
 history of the requests in the module.  I don't find any references in
 the RFCS for any such requirement to maintain a history of requests.
 
 Do you have any opinion on whether is it a good idea to have history
 of requests in the urllib2 module? I personally feel that history of
 requests can be easier tracked by the clients.

This should be done by the client.

-- 
  Henry Prêcheur


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Henry Precheur
On Thu, Dec 03, 2009 at 07:35:14PM +0100, And Clover wrote:
 I don't know what the HTTP/Cookie spec says about this.
 
 The traditional interpretation of RFC2616 is that headers are ISO-8859-1.
 
 You will notice that no browser correctly follows this.

RFC 2109 and RFC 2965 say that a cookie's value can be anything:

 The VALUE is opaque to the user agent and may be anything the origin
 server chooses to send, possibly in a server-selected printable ASCII
 encoding.

Theoretically you could put something like 'foo\n\0bar' in a cookie.

Also a cookie can include comments which have to be encoded using ...
UTF-8:

 Comment=value
   OPTIONAL.  Because cookies can be used to derive or store
   private information about a user, the value of the Comment
   attribute allows an origin server to document how it intends to
   use the cookie.  The user can inspect the information to decide
   whether to initiate or continue a session with this cookie.
   Characters in value MUST be in UTF-8 encoding.

-- 
  Henry Prêcheur


Re: [Web-SIG] HTTP headers encoding

2009-12-03 Thread Henry Precheur
On Thu, Dec 03, 2009 at 05:09:31PM +0100, Manlio Perillo wrote:
 This is really a mess.

RFC 2617 doesn't specify any encoding for its headers, so it should be
latin-1 everywhere. But on the web nobody respects standards.

 How is authorization username handled in common WSGI frameworks?

As far as I know, they don't handle this. They just return the string
without dealing with the encoding issues.

I think there is no correct way of handling this. Since 99% of
usernames and passwords contain only ASCII characters, a possible
'workaround' would be to limit yourself to the ASCII charset: if you get
a non-ASCII character, raise an exception.
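
A minimal sketch of that workaround, assuming Python 3 and a username
that arrives as bytes. The function name `decode_username` is
hypothetical and does not come from any framework.

```python
def decode_username(raw):
    """Decode a username received as bytes, refusing anything non-ASCII."""
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError:
        # Refuse to guess between latin-1, UTF-8 or anything else.
        raise ValueError('non-ASCII username rejected: %r' % (raw,))


print(decode_username(b'henry'))
```

Unlike decoding with 'replace', this fails loudly instead of silently
turning the offending bytes into replacement characters.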

-- 
  Henry Prêcheur


Re: [Web-SIG] HTTP headers encoding

2009-12-03 Thread Henry Precheur
On Thu, Dec 03, 2009 at 08:33:19PM +0100, Manlio Perillo wrote:
 Right now I'm doing a: username.decode('us-ascii', 'replace')

Or like most frameworks you could let the application author deal with
the problem, just pass the raw strings to the application.

-- 
  Henry Prêcheur


Re: [Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

2009-12-03 Thread Henry Precheur
On Thu, Dec 03, 2009 at 09:15:06PM +0100, Manlio Perillo wrote:
 There is something that I don't understand.
 
 Some HTTP headers, like Accept-Language, contains data described as
 `token`, where:
 
 token  = 1*any CHAR except CTLs or separators
 
 So a token, IMHO, is an opaque string, and it SHOULD not decoded.
 In Python 3.x it SHOULD be a byte string.

I think this is more an issue that frameworks should deal with. Decoding
every header value as latin-1 has several advantages:

* It keeps WSGI simple. Simple is good.

* WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
  says. WSGI is about HTTP, but that doesn't necessarily include all the
  other standards extending HTTP.

* It's possible to convert latin-1 strings to bytes without losing data.
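
The last point can be checked directly. The snippet below is a small
Python 3 sketch (variable names are arbitrary): every possible byte value
survives a decode/encode round trip through latin-1, so an opaque token
can always be turned back into the exact bytes that were received.

```python
# Every byte value from 0 to 255, decoded as latin-1 and encoded back.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin-1')

# The round trip is lossless...
assert text.encode('latin-1') == all_bytes
# ...because each byte maps to the code point with the same number.
assert [ord(char) for char in text] == list(all_bytes)

# An opaque header value goes through unchanged as well.
token = b'fr-CA;q=0.8'
assert token.decode('latin-1').encode('latin-1') == token
```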

-- 
  Henry Prêcheur


Re: [Web-SIG] Future of WSGI

2009-11-24 Thread Henry Precheur
On Tue, Nov 24, 2009 at 11:36:57PM +0100, Malthe Borch wrote:
 2009/11/24 Henry Precheur he...@precheur.org:
  (See http://tools.ietf.org/html/rfc2616#section-5)
 
  The request body, the request method (GET, POST, ...), the request URL,
  the HTTP version are all in `environ`.
 
 That reference does not mention the environment. It's not an official
 term.

Are you talking about PEP-333 or RFC 2616?

  namedtuple is Python 2.6+: WSGI can't use it. WSGI must work w/ older
  versions of Python.
 
 It was meant as illustration, but sure.

Then what? Your proposal doesn't work. So let's forget about it and
stick to dict?

-- 
  Henry Prêcheur


Re: [Web-SIG] Future of WSGI

2009-11-24 Thread Henry Precheur
On Tue, Nov 24, 2009 at 11:16:05PM +0100, Sylvain Hellegouarch wrote:
 Though it shouldn't be considered as a problem, the fact that probably 
 no existing framework actually use the raw dictionary (there is, in 
 almost all cases, a wrapping into a friendlier object), one might wonder 
 why keeping such a low level interface rather than directly provide a 
 higher level interface is a good idea. After all creating those 
 dictionaries for no good reason aside from sending them to the next 
 layer which will map them into a WebOb, a yaro, a cherrypy request, or 
 zope request, etc. seems slightly pointless

1. Would you say that POSIX is useless because there are lots of
   libraries and applications built on top of it? Why not implement
   those libraries and applications directly without using POSIX?

2. Guess what: WebOb, Werkzeug, Yaro, Django, CherryPy, and the others
   have different interfaces for their Request/Response objects,
   because for Request/Response there's hardly a one-size-fits-all.
   There's certainly some common ground, but every framework has
   different needs.

 (I'm not versed into Python internals, but doesn't it have also a cost
 of creating rather useless objects repeatedly like that?)

The dictionary is passed by reference, like every Python object. So it
doesn't cost anything to use it instead of an object.
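
A short sketch of what 'passed by reference' means here. The `Request`
class is a hypothetical stand-in for a framework wrapper, not the API of
any real library.

```python
class Request:
    """Hypothetical wrapper; real frameworks expose much richer objects."""

    def __init__(self, environ):
        # Only a reference is stored: the dictionary is not copied.
        self.environ = environ

    @property
    def method(self):
        return self.environ['REQUEST_METHOD']


environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
request = Request(environ)

# The wrapper and the caller share the very same dict object.
assert request.environ is environ
print(request.method)
```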

-- 
  Henry Prêcheur


Re: [Web-SIG] Web Framework

2009-10-05 Thread Henry Precheur
On Sun, May 31, 2009 at 09:30:26AM -0700, Omar Munk wrote:
- A good documentation.
- Not to overkill like Django
- Easy and simple
- Just something like PHP but without the dirty style.
- I like Karrigell but it looks like it's dead do you know a clone of it?
- Not need a VPS to host it, just a server that has Python.

I would still recommend Django. I think it's the best web-framework if
you are beginning. It's not like PHP, but I don't know of anything like
PHP in Python.

 And creating your own is that hard?

Yes, it's hard, especially if you are new to web development.

Cheers,

-- 
  Henry Prêcheur


Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Henry Precheur
On Tue, Sep 22, 2009 at 11:26:15PM -0400, P.J. Eby wrote:
 +1, if you mean the strings have the same content, 
 character-for-character on Python 2.3.  That is, a \x80 byte in a 
 Python 2 'str' is matched by an \x80 character in the Python 3 
 'str'.  (I presume that's what we mean by native, but I want to be sure.)

It is the case (Python 3 code):

  >>> ord(b'\x80'.decode('latin1')) == b'\x80'[0]
  True

Also I'd like to point out that the Cookie problem could be more
general than we think. HTTP_COOKIE is the only header we have identified
so far with a weird encoding scheme. But I am pretty sure some idiots
have created or will create other weird headers with strange encoding
schemes: let's mix UTF-8 and latin1 just for the fun of it.

Defaulting to latin-1 ensures that WSGI is solid enough to face these
weird situations.

I strongly back the use of a single encoding. The proposed
wsgi.uri_encoding method doesn't seem to add anything compared to
latin-1.


Ian's proposal seems to be fairly complete and addresses all the issues
we had, with the exception of the outstanding issues he pointed out at
the end of his mail.

-- 
  Henry Prêcheur


Re: [Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

2009-09-22 Thread Henry Precheur
On Tue, Sep 22, 2009 at 09:22:48PM -0500, Ian Bicking wrote:
 Well, the biggie: is it right to use native strings for the environ values,
 and response status/headers?  Specifically, tricks like the latin1
 transcoding won't work in Python 2, but will in Python 3.  Is this weird?
 Or just something you have to think about when using the two Python
 versions?

I don't have the whole discussion in mind. But except for 'using unicode
everywhere', I don't think there's a single proposal that would allow
people to keep the same 'logic' in both Python 2 and 3.

Using bytes in Python 3 requires two different 'logics' for Python 2
and 3, because bytes can't do everything str can do, and because of the
stdlib's problems with bytes.

Using str in Python 3 requires two different 'logics' too, because
Python 3's str is not Python 2's str.

(Just to make things clear, the term 'logic' refers to the transcoding
of strings into the correct encoding.)

 What happens if you give unicode text in the response headers that cannot be
 encoded as Latin1?

We can ignore the header. But if a response header contains non-Latin-1
characters, it's not WSGI compliant, so I would expect an error instead.

To cite The Zen of Python:

Errors should never pass silently.
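
A sketch of what such a check could look like on the gateway side,
assuming Python 3 and native strings. The function name
`check_response_headers` is invented for this example and is not part of
any WSGI draft.

```python
def check_response_headers(headers):
    """Raise ValueError if a header name or value is not Latin-1 text."""
    for name, value in headers:
        for part in (name, value):
            try:
                part.encode('latin-1')
            except UnicodeEncodeError:
                raise ValueError(
                    'header %r: %r cannot be encoded as Latin-1'
                    % (name, part))


# Passes silently: everything fits in Latin-1.
check_response_headers([('Content-Type', 'text/plain'),
                        ('X-Name', 'Pr\xeacheur')])
```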

 Should some things be unicode on Python 2?

No. I think it's more important to keep WSGI simple. Let's use str
everywhere. Frameworks can always transcode what should be Unicode,
that's their job.
 
 Is there a common case here that would be inefficient?

Transcoding every string from Latin-1 to Unicode could be time
consuming. The only way I see to make things faster is to use bytes
everywhere, but that's not possible given the previous discussions.

-- 
  Henry Prêcheur


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Henry Precheur
On Mon, Sep 21, 2009 at 11:09:24AM -0500, Ian Bicking wrote:
 I think surrogateescape can resolve the small handful of problems.

+1

surrogateescape would be a great alternative to the 'try utf-8, then
latin-1' approach. It would simplify the gateway and the application. No
need to check some 'encoding' variable and transcode later. We just
decode everything as UTF-8, no special case.

surrogateescape isn't implemented (yet?) for Python 2. That's not an
issue if the 'new' WSGI sticks to native strings.

-- 
  Henry Prêcheur


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Henry Precheur
On Mon, Sep 21, 2009 at 09:14:13PM +0200, Armin Ronacher wrote:
 So the same standard should have different behavior on different Python
 versions?  That would make framework code a lot more complicated.

I don't understand why it would be 'a lot more' complicated.

(The following code snippets are Python 3 only, and assume we're using
'native strings' everywhere.)

In the gateway, environ would be populated this way:

  environ['some_key'] = some_value.decode('utf8', 'surrogateescape')

Compare that to the utf-8-then-latin-1 alternative:

  try:
      environ['some_key'] = some_value.decode('utf-8')
      environ['some_key.encoding'] = 'utf-8'
  except UnicodeError:
      environ['some_key'] = some_value.decode('latin-1')
      environ['some_key.encoding'] = 'latin-1'


What you would have in the application to get the original value:

  environ['some_key'].encode('utf8', 'surrogateescape')

With utf8-then-latin1:

  environ['some_key'].encode(environ['some_key.encoding'])


The 'surrogateescape' way is clearly simpler. The 'equivalent' Python 2
code is even simpler:

  environ['some_key'] = some_value

And:

  environ['some_key']
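
For anyone who wants to try both approaches side by side, here is a
self-contained Python 3 sketch. The two gateway functions and the sample
byte strings are invented for the comparison; only the decode/encode
calls come from the snippets above.

```python
def gateway_surrogateescape(raw):
    """Populate environ the utf8+surrogateescape way."""
    return {'some_key': raw.decode('utf-8', 'surrogateescape')}


def gateway_utf8_then_latin1(raw):
    """Populate environ the utf-8-then-latin-1 way."""
    environ = {}
    try:
        environ['some_key'] = raw.decode('utf-8')
        environ['some_key.encoding'] = 'utf-8'
    except UnicodeError:
        environ['some_key'] = raw.decode('latin-1')
        environ['some_key.encoding'] = 'latin-1'
    return environ


# One valid UTF-8 value and one latin-1 value that is not valid UTF-8.
for raw in (b'/caf\xc3\xa9', b'/caf\xe9'):
    first = gateway_surrogateescape(raw)
    second = gateway_utf8_then_latin1(raw)
    # Both schemes let the application recover the original bytes;
    # the second one needs the extra '.encoding' key to do it.
    assert first['some_key'].encode('utf-8', 'surrogateescape') == raw
    assert second['some_key'].encode(second['some_key.encoding']) == raw
```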


-- 
  Henry Prêcheur


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Henry Precheur
On Mon, Sep 21, 2009 at 03:26:35PM -0700, Robert Brewer wrote:
 It looks simpler until you have a site that is not primarily utf-8. In
 that case, you multiply your (1 line * number of middlewares in the WSGI
 stack * each request).
 With wsgi.uri_encoding you get either (1 line * 1
 middleware designed to transcode * each request), or even 0 if your
 whole site uses just one charset.

I am not sure I understand your point.

The 0 lines figure holds true if the whole site is using latin-1 or utf-8
and you write your applications/middlewares only for this site. But if
it's using any other encoding you still have to transcode.

def middleware(environ, start_response):
    value = environ['some_key'].\
        encode('utf8', 'surrogateescape').\
        decode(SITE_ENCODING)
    ...

With wsgi.uri_encoding you would still have to do the following:

def middleware(environ, start_response):
    value = environ['some_key'].\
        encode(environ['some_key.encoding']).\
        decode(SITE_ENCODING)
    ...

Of course you can directly use `environ['some_key']` if you know you'll
get the 'right' encoding all the time. But when the encoding changes,
you'll have to fix all your middlewares.


Am I missing something?

-- 
  Henry Prêcheur


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-21 Thread Henry Precheur
On Mon, Sep 21, 2009 at 07:40:54PM -0700, Robert Brewer wrote:
 The decoding doesn't change spontaneously.
 You either get the correct one or you get an incorrect one. If it's
 incorrect, you fix it, one time, via a WSGI component which you've
 configured to determine the correct decoding. Then every other WSGI
 component below that one can go back to trusting the decoding was
 correct. In fact, if you do that transcoding right away, no other WSGI
 components need to be rewritten to take advantage of unicode. You just
 have to deploy a single transcoder, that's 6 lines of code max.

And you can do that with utf8+surrogateescape too. Except that you don't
have to determine what encoding the gateway sent you, it's always
utf8+surrogateescape.

 With utf8+surrogateescape, you don't transcode once, you transcode in
 every WSGI component in your stack that needs to correct the
 decoding. You have to do it more than once because, each time you
 encode/re-decode, you use the result and then throw it away. Any
 subsequent WSGI components have to encode/re-decode--you cannot store
 the redecoded URI in SCRIPT_NAME/PATH_INFO, because the
 utf8+surrogateescape scheme says...well, it's always utf8-decoded.

You are missing something REALLY important about surrogateescape: you
can ALWAYS get the original bytes back.

  >>> b = b'fran\xe7cois'
  >>> s = b.decode('utf8', 'surrogateescape')
  >>> s
  'fran\udce7cois'
  >>> s.encode('utf8', 'surrogateescape')
  b'fran\xe7cois'

See? I got my latin-1 character '\xe7' back! '\udce7' is not a normal
character: it is a lone surrogate code point (range U+DC80 to U+DCFF),
which valid UTF-8 never produces, so it can safely stand for the byte
that could not be decoded.

The only thing you have to do is to pass 'surrogateescape' each time you
call encode/decode.

 In addition, *every* component that needs to compare URI's then has to
 be configured with the same logic, however convoluted, to perform the
 correct decoding again. It's not just routing middleware: caches
 need to reliably compare decoded URI's; so do sessions; so does auth
 (especially!); so do static files. And Heaven forfend you actually
 decode differently in two different components!

I don't understand why I would need to throw away the decoded string.

This works perfectly well as far as I know:

environ['PATH_INFO'] = environ['PATH_INFO'].\
  encode('utf8', 'surrogateescape').\
  decode(SITE_ENCODING)

utf8+surrogateescape provides the same possibilities as
wsgi.uri_encoding. You can transcode without losing information when you
know what the correct encoding is. But utf8+surrogateescape is simpler
because there's no need to pass around the name of the encoding in an
additional variable.
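
To make this concrete, here is a runnable Python 3 sketch of a 'single
transcoder' component built on utf8+surrogateescape. Everything in it is
an assumption made for the example: the names `transcode_path_info` and
`demo_app`, the choice of KOI8-R as SITE_ENCODING, and the hand-built
environ.

```python
SITE_ENCODING = 'koi8_r'  # assumption: whatever the site really uses


def transcode_path_info(app):
    """Wrap a WSGI app and re-decode PATH_INFO once, near the gateway."""
    def wrapper(environ, start_response):
        environ['PATH_INFO'] = (
            environ['PATH_INFO']
            .encode('utf-8', 'surrogateescape')  # back to the raw bytes
            .decode(SITE_ENCODING)               # decode them correctly
        )
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    """Toy application: echoes the path it was given."""
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [environ['PATH_INFO'].encode('utf-8')]


# What a gateway would store for a KOI8-R encoded path.
raw_path = '/привет'.encode(SITE_ENCODING)
environ = {'PATH_INFO': raw_path.decode('utf-8', 'surrogateescape')}

body = transcode_path_info(demo_app)(environ, lambda status, headers: None)
assert b''.join(body).decode('utf-8') == '/привет'
```

Components below the wrapper see the re-decoded PATH_INFO and do not
need to know which encoding the gateway used.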

-- 
  Henry Prêcheur


Re: [Web-SIG] WSGI 2: Decoding the Request-URI

2009-08-20 Thread Henry Precheur
On Sun, Aug 16, 2009 at 08:06:03PM -0700, Robert Brewer wrote:
 However, we quite often use only a portion of the URI when attempting
 to locate an appropriate handler; sometimes just the leading /
 character! The remaining characters are often passed as function
 arguments to the handler, or stuck in some parameter list/dict. In
 many cases, the charset used to decode these values either: is
 unimportant; follows complex rules from one resource to another; or is
 merely reencoded, since the application really does care about bytes
 and not characters. Falling back to ISO-8859-1 (and minting a new WSGI
 environ entry to declare the charset which was used to decode) can
 handle all of these cases. Server configuration options cannot, at
 least not without their specification becoming unwieldy.

(Just to make things clear, I am not just talking about REQUEST_URI
here, but all request headers)


Decoding everything as ISO-8859-1 has the nice property of keeping the
information intact. It would be a good heuristic if everything, with a
few exceptions, was encoded using ISO-8859-1. Just transcode the few
problematic cases at the application level and everybody is happy. A
string decoded from ISO-8859-1 is like a bytes object with a string
'interface' on top of it.
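
Transcoding one of those problematic cases would look roughly like this
in Python 3. This is a sketch with made-up variable names, assuming the
client actually sent UTF-8 while the gateway decoded it as ISO-8859-1.

```python
# Bytes the client really sent (UTF-8), as seen on the wire.
raw = 'français'.encode('utf-8')

# What the application receives if the gateway decodes as ISO-8859-1:
# readable as a str, but the text is wrong ('franÃ§ais').
as_latin1 = raw.decode('latin-1')

# The application undoes the gateway's decoding and applies the right one.
fixed = as_latin1.encode('latin-1').decode('utf-8')
assert fixed == 'français'
```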


But it sweeps the encoding problem under the carpet. The problem with
Python 2 was that str and unicode were almost the same, so much so that
it was possible to mix them without too many problems:

  >>> 'foo' == u'foo'
  True

Python 3 made bytes and string 'incompatible' to force programmers to
handle the encoding problem as soon as possible:

  >>> b'foo' == 'foo'
  False

By passing `str()` to the application, the application author could
believe that the encoding problem has been handled. But in most cases it
hasn't been handled at all. The application author should still
transcode all the strings incorrectly encoded. We are back to Python 2's
bad old days, where we can't be sure that what we got is properly
encoded:

  Was that string encoded using latin-1? Maybe a middleware transcoded
  it to UTF-8 before the application was called. Maybe the application
  itself transcoded it at some point, but then we need to keep track of
  what was transcoded. Maybe the application should transcode everything
  when it is called.

Also EVERY application author will have to read the PEP, especially the
paragraph saying:

   Everything we give you are strings, but you still have to deal
   with the encoding mess.

Otherwise he will have the same weird problems he had with Python 2,
because the interface is not clear. Strings are supposed to be text and
only text. Decoding everything as ISO-8859-1 means strings are not text
anymore, they are 'encoded data' [1].


bytes are supposed to be 'encoded data' and binary blobs. By giving
applications bytes, the author knows right away he should decode them.
No need to read the PEP.


`bytes` can do everything `str` can do with the notable exception of
'format'.

  >>> b'foo bar'.title()
  b'Foo Bar'

  >>> b'/foo/bar/fran\xc3ois'.split(b'/')
  [b'', b'foo', b'bar', b'fran\xc3ois']

  >>> import re
  >>> re.match(br'/bar/(\w+)/(\d+)', b'/bar/foo/1234').groups()
  (b'foo', b'1234')

I understand that `bytes()` is an unfamiliar beast. But I believe the
encoding problem is the realm of the application, not the realm of the
gateway. Let the application handle the encoding problem and don't give
it a half baked solution.
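
As a sketch of what 'letting the application handle it' could look like,
the following Python 3 code routes on the raw bytes and decodes only the
pieces it needs, with an encoding the application picks itself. The
route pattern, the function name `route` and the default encoding are
all assumptions made for this example.

```python
import re

# Hypothetical route: /user/<name>/<number>, matched on bytes.
USER_ROUTE = re.compile(br'/user/([^/]+)/(\d+)')


def route(path, encoding='utf-8'):
    """Match a bytes path; decode the captured parts explicitly."""
    match = USER_ROUTE.fullmatch(path)
    if match is None:
        return None
    name, number = match.groups()
    # The application decides how to decode, and fails loudly if wrong.
    return name.decode(encoding), int(number.decode('ascii'))


print(route(b'/user/fran\xc3\xa7ois/42'))
```

A path whose bytes are not valid in the chosen encoding raises
UnicodeDecodeError at this point, instead of producing wrong text later.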


Using bytes also has its own set of problems. The standard library
doesn't support bytes very well. For example urllib.parse.unquote()
doesn't work with bytes, and the rest of urllib.parse has issues too.

[1] 
http://docs.python.org/3.1/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

-- 
  Henry Prêcheur


Re: [Web-SIG] WSGI 2

2009-08-13 Thread Henry Precheur
On Wed, Aug 12, 2009 at 12:05:40AM -0500, Ian Bicking wrote:
 Correct -- you can write any set of % encodings, and I don't think it even
 has to be able to validly url-decode (e.g., /foo%zzz will work).  It
 definitely doesn't have to be a valid encoding.  However, if you actually
 include unicode characters, they will always be encoded as UTF-8 (as goes
 with the IRI standard).  This is in a case like <a href="/some page">, the
 browser will request /some%20page, because it escapes unsafe characters.
  Similarly if you request <a href="/français"> it will encode that ç in
 UTF-8, then url-encode it, even if the page itself is ISO-8859-1.  Well, at
 least on Firefox.  I used this to test:
 http://svn.colorstudy.com/home/ianb/wsgi-unicode-test.py

I have run some tests regarding the encoding issue:

curl doesn't 'url-encode' its URLs:

  curl 'http://hostname/français'
                            ^
                            e7 latin-1 character

The latin-1 character is sent to the server. Lighttpd accepts the URL
and even returns a file if it exists. Of course if I try with the same
characters in UTF-8 it doesn't work.

AFAIK RFC 2396 forbids non-ASCII characters in URLs. The problem is that
libcurl is quite popular (it used to be the transport library of
Webkit/GTK+ for example). It's hard to dismiss it as an utterly broken,
obscure tool. Many 'simplistic' HTTP clients may have the same problem.
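
For comparison, this is what a client that does percent-encode would put
on the wire, depending on the encoding it picks. The sketch uses
`urllib.parse.quote` and `urllib.parse.unquote_to_bytes` from the Python
3 standard library; the path is the one from the curl test above.

```python
from urllib.parse import quote, unquote_to_bytes

path = '/français'

# UTF-8 then percent-encoding: what Firefox does in the test quoted above.
utf8_form = quote(path)
# Same path, percent-encoded from latin-1 instead.
latin1_form = quote(path, encoding='latin-1')

print(utf8_form)    # /fran%C3%A7ais
print(latin1_form)  # /fran%E7ais

# The server can always recover the exact bytes, whatever they mean.
assert unquote_to_bytes(utf8_form) == b'/fran\xc3\xa7ais'
assert unquote_to_bytes(latin1_form) == b'/fran\xe7ais'
```

Both forms are plain ASCII; the server still has to guess which encoding
the escaped bytes were in.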


Now let's talk a little bit about cookies...

Cookies can contain whatever 'binary junk' the server sends. RFC 2965
says (http://tools.ietf.org/html/rfc2965#page-5):

 The VALUE is opaque to the user agent and may be anything the origin
 server chooses to send, possibly in a server-selected printable ASCII
 encoding.

Also, cookies can contain 'comments', which contain UTF-8 strings
(http://tools.ietf.org/html/rfc2965#page-6):

 Characters in value MUST be in UTF-8 encoding.

Firefox has no problem with cookies containing non-ASCII characters. It
looks like it assumes cookies are encoded using latin-1, since latin-1
characters are displayed correctly in Firebug, but not UTF-8 ones.


Cheers,

-- 
  Henry Prêcheur