[Repoze-dev] [issue100] Configurable character set support for repoze.who and repoze.what
Yuen Ho Wong wyue...@gmail.com added the comment: I'm not entirely sure why this is not actionable on any realistic level, or what you mean by lower layers. Perhaps you can explain further, or point me to a previous discussion if this has been talked about before? -- status: resolved - chatting __ Repoze Bugs b...@bugs.repoze.org http://bugs.repoze.org/issue100 __ ___ Repoze-dev mailing list Repoze-dev@lists.repoze.org http://lists.repoze.org/listinfo/repoze-dev
[Repoze-dev] [issue100] Configurable character set support for repoze.who and repoze.what
Yuen Ho Wong wyue...@gmail.com added the comment: Well, I think the WSGI 1.x spec made a mistake by mandating that all strings in the environ be byte strings while not defining a global environment variable to hint to middlewares how to decode them. This is a recognized problem that is addressed in WSGI 2 by mandating that strings be unicode. The problem of not knowing how to decode byte strings lies not in the handlers but in the middlewares, which are supposed to be application agnostic. Limiting the problem to just repoze.who, say I have an IIdentifier that wants to remember credentials according to different charsets on a per-request basis. In a perfect world, the server would set a well-known variable in the request at the first opportunity, and the plugin would just look for it and encode accordingly. But in WSGI there's no request object, only an environ, so we are stuck with that. In this less perfect world, there would at least be a well-known charset variable in the environ to give hints to middlewares and applications. But there isn't, so we application developers have to invent one. Right now every framework deals with it differently, but at the end of the day there is a threadlocal charset variable that __handlers__ can use. There is no equivalent in repoze.who and repoze.what. As I have already said, until Py3k takes off and we are all using WSGI 2, this will be a problem we are stuck with, and middlewares will need to deal with it. I have already proposed 3 solutions in comment #1, and I'm in favor of solution number 2. To answer your questions: yes, just having a charset for repoze.who will not solve all the problems of decoding in WSGI apps, but at least it's half of the solution. I believe the repoze.who middleware should take a charset parameter in its constructor, so that it can set it in the environ as early as possible.
To plugins, this serves as a hint (meaning only a default) for which charset to use when decoding bytestrings. Plugins aren't forced to comply; it just serves as a very helpful clue. Holistic support simply means 1) a globally visible charset variable for all of repoze.who at any scope, and 2) all plugins making a best effort to decode according to that global charset. I hope this clears up the issue.
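A minimal sketch of the constructor-parameter idea, assuming the environ key is `repoze.who.charset` (solution number 2 from comment #1). `CharsetHintMiddleware`, `decode_with_hint` and `demo_app` are hypothetical names for illustration only; nothing here is part of repoze.who's actual API. The middleware only sets a default, and the helper treats the hint as a best-effort clue, falling back to Latin-1 instead of failing.

```python
# Hypothetical sketch: neither the class nor the helper exists in repoze.who.
CHARSET_KEY = "repoze.who.charset"  # "solution number 2" from comment #1


class CharsetHintMiddleware:
    """Puts a default charset hint into the environ before any plugin runs."""

    def __init__(self, app, charset="utf-8"):
        self.app = app
        self.charset = charset

    def __call__(self, environ, start_response):
        # Only a default: keep a hint an outer server or middleware already set.
        environ.setdefault(CHARSET_KEY, self.charset)
        return self.app(environ, start_response)


def decode_with_hint(environ, raw, fallback="latin-1"):
    """Best-effort decode of a byte string using the environ's charset hint."""
    charset = environ.get(CHARSET_KEY, fallback)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec or undecodable bytes: fall back rather than fail the request.
        return raw.decode(fallback)


# Usage: a toy app standing in for whatever sits behind repoze.who.
def demo_app(environ, start_response):
    login = decode_with_hint(environ, "中文".encode("big5"))
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [login.encode("utf-8")]


wrapped = CharsetHintMiddleware(demo_app, charset="big5")
```

Using `setdefault` keeps the hint non-binding in both directions: a server that already knows the charset wins, and a plugin that ignores the key behaves exactly as before.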
[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types
Yuen Ho Wong wyue...@gmail.com added the comment: OK, I agree it's not important to try to emulate mod_auth_tkt, which I think is a totally dead project anyway. But I have to take issue with the way unicode decoding is dealt with in this plugin. First of all, you are not even supposed to smuggle any unicode into the environ; it's just not allowed by the WSGI spec. Breaking assumptions is almost always bad, since it makes debugging much harder unless you read the source code. I think the larger issue here is how repoze.who and repoze.what deal with unicode as a whole. __ Repoze Bugs b...@bugs.repoze.org http://bugs.repoze.org/issue101 __ ___ Repoze-dev mailing list Repoze-dev@lists.repoze.org http://lists.repoze.org/listinfo/repoze-dev
[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types
Yuen Ho Wong wyue...@gmail.com added the comment: Yes, the charset is irrelevant here; decoding shouldn't be done here anyway. I have to reiterate that the problem is accepting unicode strings at all, because that breaks conformance with the WSGI spec. There never should have been unicode strings in the environ in the first place, let alone handling them transparently without raising any errors. As for using eval(), I can see why one might still be paranoid. Perhaps simply prepending the type, followed by a whitespace delimiter such as \n, would take care of the problem? When you parse on ingress, just split on \n.
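A rough sketch of the "type + \n" proposal, assuming the tag is prepended to the userid itself and that only a small whitelist of types (here `str` and `int`, an assumption) needs to round-trip. `encode_userid`, `decode_userid` and `_DECODERS` are hypothetical names, not repoze.who code. Because decoding goes through a lookup table, an unknown type name is rejected instead of being evaluated.

```python
# Hypothetical sketch of the "type + \n delimiter" idea; not repoze.who code.
# Only whitelisted types can be rebuilt, so nothing is ever passed to eval().
_DECODERS = {
    "str": str,
    "int": int,
}
DELIMITER = "\n"


def encode_userid(userid):
    """Egress: prefix the userid with its type name and a newline."""
    type_name = type(userid).__name__
    if type_name not in _DECODERS:
        raise TypeError("unsupported userid type: %s" % type_name)
    value = str(userid)
    if DELIMITER in value:
        raise ValueError("userid must not contain the delimiter")
    return type_name + DELIMITER + value


def decode_userid(payload):
    """Ingress: split on the first newline and rebuild via the whitelist."""
    type_name, sep, value = payload.partition(DELIMITER)
    if not sep:
        raise ValueError("missing type delimiter")
    decoder = _DECODERS.get(type_name)
    if decoder is None:
        raise ValueError("unknown userid type: %r" % type_name)
    return decoder(value)
```

How the resulting string is then quoted or encoded for the cookie is left out of this sketch.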
[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types
Yuen Ho Wong wyue...@gmail.com added the comment: P.S. I think this solution also removes the uncertainty of possibly clashing with mod_auth_tkt's use of the userdata field, however small this (non-)issue may be.
[Repoze-dev] [issue100] Configurable character set support for repoze.who and repoze.what
New submission from Yuen Ho Wong wyue...@gmail.com: It seems that repoze.who and repoze.what, including their plugins, have completely failed, from design to implementation, to holistically support any charset other than Latin-1. As of now, there is no option global to repoze.who (and consequently to repoze.what) that tells the plugins which charset to decode the HTTP params with. This can cause serious problems for websites written in exotic charsets like Big-5. Even for websites written in UTF-8, there is still no option that guarantees the plugins use the correct charset codec. I was looking at the source code for repoze.who, AuthTktCookie, *Forms, and repoze.who.plugins.sa, and it seems that putting a `charset` variable into the `environ` is the best way to go for reasons of global visibility (which makes me want to lobby the WSGI people to put this in WSGI 1.1 and 2). The problem is that I'm not sure whether any frameworks/servers out there already do this. Another way to do it is to name the variable `repoze.who.charset`. Yet another way is to have plugins look for a `charset` key in the identity, which localizes this variable's visibility to just repoze.who and repoze.what plugins. -- assignedto: chrism messages: 273 nosy: chrism, wyuenho priority: bug status: unread title: Configurable character set support for repoze.who and repoze.what
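The three options can coexist if a plugin checks them in some order. The sketch below assumes "most local scope wins" (identity, then `repoze.who.charset`, then a bare `charset` environ key) and a Latin-1 default; both the order and the default are assumptions, and `resolve_charset` is a hypothetical helper, not an existing repoze.who function.

```python
# Hypothetical helper: the three key names come from the proposals above,
# but the precedence order (most local first) is an assumption.
DEFAULT_CHARSET = "latin-1"  # assumed default, mirroring today's Latin-1-only behaviour


def resolve_charset(environ, identity=None, default=DEFAULT_CHARSET):
    """Return the first charset hint found, from most to least local scope."""
    candidates = (
        (identity or {}).get("charset"),    # identity key: visible only to plugins
        environ.get("repoze.who.charset"),  # namespaced environ key
        environ.get("charset"),             # bare, global environ key
    )
    for charset in candidates:
        if charset:
            return charset
    return default
```

A plugin would then decode a submitted login with something like `raw_login.decode(resolve_charset(environ, identity))`.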
[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types
New submission from Yuen Ho Wong wyue...@gmail.com: This is a minor bug remotely related to issue 100. According to the WSGI spec, all the strings in the environ should be byte strings. However, if some IIdentifier plugin returns a unicode login name in the identity dict, AuthTktCookie will automatically encode the value into UTF-8 and rewrite the `user_data` field of the auth_tkt to embed the type info. Besides deviating from the spec, this is doubly bad: according to the mod_auth_tkt README, the user_data field has special meaning to the Apache module and should not be rewritten. IMHO the proper fix is to have the plugin look for a charset value that's in scope whenever something needs to be decoded/encoded. http://www.python.org/dev/peps/pep-0333/#unicode-issues -- messages: 274 nosy: wyuenho priority: bug status: unread title: AuthTktCookie should not try to decode userid based on value types topic: repoze.who
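To illustrate the proposed fix, here is a sketch in which the plugin encodes and decodes the userid with whatever charset is in scope and passes `user_data` through untouched. It assumes the charset lives under `repoze.who.charset` and defaults to UTF-8 (what the report says the plugin hard-codes today); `userid_to_bytes`, `userid_from_bytes` and `ticket_fields` are hypothetical names, not the real AuthTktCookie code.

```python
# Hypothetical sketch, not the real AuthTktCookie plugin code.
CHARSET_KEY = "repoze.who.charset"  # assumed environ key, as proposed in issue100
DEFAULT_CHARSET = "utf-8"           # what the plugin hard-codes today, per this report


def userid_to_bytes(environ, userid):
    """Egress: encode a text userid with the in-scope charset."""
    if isinstance(userid, bytes):
        return userid  # already a byte string, as WSGI 1.x expects
    return userid.encode(environ.get(CHARSET_KEY, DEFAULT_CHARSET))


def userid_from_bytes(environ, raw):
    """Ingress: decode the ticket's userid with the same in-scope charset."""
    return raw.decode(environ.get(CHARSET_KEY, DEFAULT_CHARSET))


def ticket_fields(environ, userid, user_data=""):
    """Return (userid bytes, user_data); user_data is never rewritten."""
    return userid_to_bytes(environ, userid), user_data
```

No type tag is needed in this design, because the charset, not the value's type, decides how the bytes are interpreted.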