[Repoze-dev] [issue100] Configurable character set support for repoze.who and repoze.what

2009-10-24 Thread Yuen Ho Wong

Yuen Ho Wong wyue...@gmail.com added the comment:

I'm not entirely sure why this is not actionable on any realistic level, and
what you mean by lower layers. Perhaps you can explain further or point me to
a previous discussion if this has been talked about before?

--
status: resolved -> chatting

__
Repoze Bugs b...@bugs.repoze.org
http://bugs.repoze.org/issue100
__
___
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev


[Repoze-dev] [issue100] Configurable character set support for repoze.who and repoze.what

2009-10-23 Thread Yuen Ho Wong

Yuen Ho Wong wyue...@gmail.com added the comment:

Well, I think the WSGI 1.x spec made a mistake in mandating that all strings
in the environ be byte strings while not defining a global environ variable
to give middleware a hint of how to decode them. This is a recognized problem
that is addressed in WSGI 2 by mandating that strings be unicode.

The problem is not that handlers don't know how to decode byte strings, but
that middleware, which is supposed to be application-agnostic, doesn't.

To limit this problem to just repoze.who, say I have an IIdentifier that
wants to remember credentials according to different charsets on a
per-request basis. In a perfect world, the server would set a well-known
variable in the request at the first opportunity, and the plugin would just
look for it and encode accordingly. But in WSGI there's no request object,
only an environ, so we are stuck with that. In this less perfect world, there
would be a well-known charset variable in the environ to give hints to
middleware and applications. But there isn't, so we application developers
have to invent one. Right now every framework deals with it differently, but
at the end of the day there is a threadlocal charset variable that
__handlers__ can use. There is no equivalent in repoze.who and repoze.what.

As I have already said, until Py3k takes off and we are all using WSGI 2,
this will be a problem we are stuck with, and middleware will need to deal
with it. I have already proposed three solutions in comment #1. I'm in favor
of solution number 2.

To answer your questions: yes, just having a charset for repoze.who will not
solve all the problems of decoding in WSGI apps, but at least it's half of
the solution. I believe the repoze.who middleware should take a parameter in
its constructor so that it can set the charset in the environ as early as
possible. To plugins, this serves as a hint (meaning only a default) of the
charset to use when decoding byte strings. No compliance is forced on
plugins; it only serves as a very helpful clue. Holistic support simply means
1) having a globally visible charset variable for all of repoze.who at any
scope, and 2) all the plugins making a best effort to decode according to the
global charset.
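
To make the proposal concrete, here is a minimal sketch of a middleware that
takes a charset in its constructor and publishes it in the environ as a hint.
This is not repoze.who's actual API: the class `CharsetHintMiddleware` and the
helper `decode_hinted` are hypothetical names, the key `repoze.who.charset` is
only the name suggested in this issue, and the Latin-1 fallback is an
assumption. It is written for Python 3, where byte strings are `bytes`.

```python
# Hypothetical sketch; none of these names exist in repoze.who.
CHARSET_KEY = "repoze.who.charset"  # key name suggested in this issue


class CharsetHintMiddleware:
    """Wrap a WSGI app and publish a default charset in the environ."""

    def __init__(self, app, charset="utf-8"):
        self.app = app
        self.charset = charset

    def __call__(self, environ, start_response):
        # setdefault keeps any hint already set by a server or outer layer.
        environ.setdefault(CHARSET_KEY, self.charset)
        return self.app(environ, start_response)


def decode_hinted(environ, raw, fallback="latin-1"):
    """Best-effort decode of a byte string using the hint, if present."""
    if not isinstance(raw, bytes):
        return raw  # already text, nothing to decode
    charset = environ.get(CHARSET_KEY, fallback)
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # The hint is only a default, so fall back instead of failing.
        return raw.decode(fallback, errors="replace")
```

A plugin would call `decode_hinted(environ, value)` wherever it currently
assumes a fixed encoding; plugins that know better remain free to ignore the
hint.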

I hope this clears up the issue.



[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types

2009-10-23 Thread Yuen Ho Wong

Yuen Ho Wong wyue...@gmail.com added the comment:

OK, I agree it's not important to try to emulate mod_auth_tkt, which I think
is a totally dead project anyway. But I have to take issue with the way
unicode decoding is dealt with in this plugin.

First of all, you are not even supposed to smuggle any unicode into the
environ; it's just not allowed by the WSGI spec. Breaking assumptions is
almost always bad, since it makes debugging much harder unless you read the
source code.

I think the larger issue here is how repoze.who and repoze.what deal with
unicode as a whole.

__
Repoze Bugs b...@bugs.repoze.org
http://bugs.repoze.org/issue101
__


[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types

2009-10-23 Thread Yuen Ho Wong

Yuen Ho Wong wyue...@gmail.com added the comment:

Yes, the charset is irrelevant here. Decoding shouldn't be done here anyway.
I think I have to restate the problem: accepting unicode strings breaks
conformance with the WSGI spec. There never should have been unicode strings
in the environ in the first place, so never mind transparently handling them
without throwing any errors.

As to the problem of using eval(), I can see one can still be paranoid.
Perhaps simply prepending the type and a whitespace character like \n as a
delimiter will take care of the problem? When you parse on ingress, just
split by \n.
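
A rough sketch of that idea, assuming only `int` and `str` userids need to
round-trip. The helper names `pack_userid` and `unpack_userid` and the
whitelist of types are hypothetical, and the sketch ignores how the packed
value would be escaped inside the cookie itself. It is written for Python 3.

```python
# Hypothetical helpers; the names and the supported types are assumptions.
_PARSERS = {"int": int, "str": str}


def pack_userid(userid):
    """Prefix the userid with its type name and a newline delimiter."""
    type_name = type(userid).__name__
    if type_name not in _PARSERS:
        raise TypeError("unsupported userid type: %s" % type_name)
    return "%s\n%s" % (type_name, userid)


def unpack_userid(packed):
    """Split on the first newline and rebuild the value without eval()."""
    type_name, _, value = packed.partition("\n")
    if type_name not in _PARSERS:
        raise ValueError("unknown type prefix: %r" % type_name)
    return _PARSERS[type_name](value)
```

Because the type name is only ever looked up in a fixed table, an attacker
who controls the cookie can at worst trigger a `ValueError`, never arbitrary
evaluation.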



[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types

2009-10-23 Thread Yuen Ho Wong

Yuen Ho Wong wyue...@gmail.com added the comment:

P.S. I think this solution removes the uncertainty of possibly clashing with
mod_auth_tkt's use of the user_data field, however small this (non-)issue
may be.



[Repoze-dev] [issue100] Configurable character set support for repoze.who and repoze.what

2009-10-10 Thread Yuen Ho Wong

New submission from Yuen Ho Wong wyue...@gmail.com:

It seems that repoze.who and repoze.what, including their plugins, have
completely failed, from design to implementation, to holistically support any
charset other than Latin-1. As of now, there is no option global to
repoze.who, and subsequently repoze.what, to indicate to the plugins what
charset to decode the HTTP params with. This can cause serious problems for
websites written in exotic charsets like Big-5. Even for websites written in
UTF-8, there is still no option to guarantee the plugins the correct charset
codec.

I was looking at the source code for repoze.who, AuthTktCookie, *Forms, and
repoze.who.plugins.sa, and it seems that putting a `charset` variable into
the `environ` is the best way to go for reasons of global visibility (makes
me want to lobby the WSGI people to put this in WSGI 1.1 and 2). The problem
with this is that I'm not sure whether there are any frameworks/servers out
there that already do this. Another way to do it is to name the variable
`repoze.who.charset`. Yet another way is to have plugins look for a `charset`
key in the identity, which localizes this variable's visibility to just
repoze.who and repoze.what plugins.
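
For illustration only, the three candidate locations could be consulted by a
single lookup helper like the one below. The function name `find_charset`,
the most-specific-first ordering, and the Latin-1 default are my assumptions,
not existing repoze.who behaviour; the three options above are alternatives,
and a real implementation would probably pick just one.

```python
def find_charset(environ, identity=None, default="latin-1"):
    """Return the first charset hint found in the three proposed locations."""
    # Option 3: a `charset` key in the identity dict (most local scope).
    if identity and identity.get("charset"):
        return identity["charset"]
    # Option 2, then option 1: namespaced key, then the bare environ key.
    for key in ("repoze.who.charset", "charset"):
        if environ.get(key):
            return environ[key]
    return default
```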

--
assignedto: chrism
messages: 273
nosy: chrism, wyuenho
priority: bug
status: unread
title: Configurable character set support for repoze.who and repoze.what



[Repoze-dev] [issue101] AuthTktCookie should not try to decode userid based on value types

2009-10-10 Thread Yuen Ho Wong

New submission from Yuen Ho Wong wyue...@gmail.com:

This is a minor bug remotely related to issue 100.

According to the WSGI spec, all the strings in the environ should be byte
strings. However, if some IIdentifier plugin returns a unicode login name in
the identity dict, AuthTktCookie will automatically encode the value into
UTF-8 and rewrite the `user_data` field of the auth_tkt to embed the type
info.

This is doubly bad: besides deviating from the spec, according to the
mod_auth_tkt README the `user_data` field has special meaning to the Apache
module and should not be rewritten.

IMHO the proper fix is to have the plugin look for a charset value that's in
scope when something needs to be decoded/encoded.

http://www.python.org/dev/peps/pep-0333/#unicode-issues

--
messages: 274
nosy: wyuenho
priority: bug
status: unread
title: AuthTktCookie should not try to decode userid based on value types
topic: repoze.who
