Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

2016-03-25 Thread Cory Benfield

> On 25 Mar 2016, at 15:04, Jason Madden  wrote:
> 
> 
>> On Mar 25, 2016, at 05:01, Cory Benfield  wrote:
>> 
>> Given that gevent is keeping hold of its own reference to the environ, why 
>> does gevent not simply wrap the environ dict in a class that implements this 
>> functionality directly? In that manner, gevent can expose its own error 
>> handling behaviour as desired, and continue to follow PEP-.
> 
> I did consider that, but didn't want to do that unless there were actual 
> practical problems passing the same object that gevent references. Making a 
> copy just to pass to the application adds additional time and memory 
> requirements that are always nice to avoid in a server.

For what it’s worth, I’m not advocating a copy. I’m advocating a class like 
this:


class SecureDictWrapper(collections.MutableMapping):
def __init__(self, environ):
self._environ = environ

That class would then implement the MutableMapping API and delegate its calls 
through to the dictionary itself. There would still only be one dictionary: the 
only new allocation is for the wrapper class. The overhead is small. =)

Cory



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

2016-03-25 Thread Jason Madden

> On Mar 25, 2016, at 05:01, Cory Benfield  wrote:
> 
> Given that gevent is keeping hold of its own reference to the environ, why 
> does gevent not simply wrap the environ dict in a class that implements this 
> functionality directly? In that manner, gevent can expose its own error 
> handling behaviour as desired, and continue to follow PEP-.

I did consider that, but didn't want to do that unless there were actual 
practical problems passing the same object that gevent references. Making a 
copy just to pass to the application adds additional time and memory 
requirements that are always nice to avoid in a server. 

> In fact, I believe this is exactly what PJ was getting at. The ability to 
> subclass the dictionary (in this case, to subclass it with one that hides 
> some keys on printing) is only useful to the entity that does the 
> subclassing, because there is no guarantee that the subclass will not be lost 
> somewhere else in the WSGI stack.

I looked at most of the middleware listed on the WSGI homepage [1], as well as 
a decent sampling of the packages identified as middleware on PyPI [2]. I 
didn't find any that passed a new environ on to the next application; they all 
seem to simply pass on the environ object as given to them. Now that's just a 
sampling so obviously it doesn't mean that such copying doesn't happen. But 
doing so eliminates the ability for lower middlewares to communicate with upper 
middlewares through the environ if they are more than one layer separated---a 
real-world example is setting `paste.expected_exceptions`---so practically 
speaking, I imagine it's quite rare.

> Because of that, I’m disinclined to want to widen the spec here. PJ’s 
> original analysis is right: allowing subclasses does not provide more utility 
> than disallowing them, but it does allow more bugs to creep in due to 
> inconsistent expectations. Better to have an object with a known set of 
> behaviours and have applications/servers wrap it in custom function.

I'm not sure I agree with that, but I can see the argument.

I started out by asking if there were any *practical* reasons not to pass a 
tiny dict subclass as environ, and when I was surveying existing middleware for 
this thread, I found a big reason: it turns out that WebOb's Request object 
*also* verifies that type(environ) is dict [3]. Given the popularity of WebOb 
and its derivatives like Pyramid, this is not a change gevent can make. We'll 
take a different approach.

Thanks again for all the great insights and discussion!

Jason


[1] http://wsgi.readthedocs.org/en/latest/libraries.html
[2] https://pypi.python.org/pypi?:action=browse=all=319=326=506=509
[3] https://github.com/Pylons/webob/blob/master/webob/request.py#L112
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

2016-03-25 Thread Cory Benfield

> On 24 Mar 2016, at 16:29, Jason Madden  wrote:
> Well, here's a practical use :) And the two points above do not apply to this 
> practical use, I think. (1) doesn't apply because `__repr__` is not going to 
> change and isn't fancy. (2) doesn't apply because gevent keeps a reference to 
> the environ its creates and passes to the app, so if middleware passes a new 
> dict(environ) on to the app, gevent's own error handling is still secure; 
> consider passing a SecureEnviron to the app a best-effort at 
> secure-by-default---if the user configures their application such that this 
> feature is disabled for part of the stack, that's on the application. No 
> feature of gevent will break, and it's better than not having the option at 
> all IMHO.

Given that gevent is keeping hold of its own reference to the environ, why does 
gevent not simply wrap the environ dict in a class that implements this 
functionality directly? In that manner, gevent can expose its own error 
handling behaviour as desired, and continue to follow PEP-.

In fact, I believe this is exactly what PJ was getting at. The ability to 
subclass the dictionary (in this case, to subclass it with one that hides some 
keys on printing) is only useful to the entity that does the subclassing, 
because there is no guarantee that the subclass will not be lost somewhere else 
in the WSGI stack. However, if subclassing is only useful to you there is 
another alternative to the problem, which is to compose the environ dict into 
an object that applies the custom behaviour.

Because of that, I’m disinclined to want to widen the spec here. PJ’s original 
analysis is right: allowing subclasses does not provide more utility than 
disallowing them, but it does allow more bugs to creep in due to inconsistent 
expectations. Better to have an object with a known set of behaviours and have 
applications/servers wrap it in custom function.

Cory



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

2016-03-24 Thread Jason Madden

> On Mar 24, 2016, at 11:09, Alan Kennedy  wrote:
> 
> I don't see this relevant message in your references.
> 
> https://mail.python.org/pipermail/web-sig/2004-September/000749.html
> 
> Perhaps that, and following messages, might shed more light?

Yes, thank you, I did miss that thread. It does help shed some light on the 
issue.

The two main arguments made seem to be that:

1) Creating subclasses of builtin objects is difficult and subject to breakage 
if you try to get too fancy.

That's a fair point, and in the context of when it was written (Python 3.0 was 
still under discussion) it makes a lot of sense.

2) Middleware or the app can do dict(environ) and lose your subclass.

Also true. But I think it's only particularly relevant if the WSGI 
implementation itself relies on the subclass to provide essential functionality 
that the PEP specifies (e.g., decoding bytes-to-str on key access).

It was also mentioned that practicality beats purity and no practical use for a 
subclass was known.

Well, here's a practical use :) And the two points above do not apply to this 
practical use, I think. (1) doesn't apply because `__repr__` is not going to 
change and isn't fancy. (2) doesn't apply because gevent keeps a reference to 
the environ its creates and passes to the app, so if middleware passes a new 
dict(environ) on to the app, gevent's own error handling is still secure; 
consider passing a SecureEnviron to the app a best-effort at 
secure-by-default---if the user configures their application such that this 
feature is disabled for part of the stack, that's on the application. No 
feature of gevent will break, and it's better than not having the option at all 
IMHO.

Jason
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

2016-03-24 Thread Alan Kennedy
I don't see this relevant message in your references.

https://mail.python.org/pipermail/web-sig/2004-September/000749.html

Perhaps that, and following messages, might shed more light?

On Thu, Mar 24, 2016 at 3:18 PM, Jason Madden 
wrote:

> Hi all,
>
>
> Is there any practical reason that the type of the `environ` object must
> be exactly `dict`, as specified in PEP?
>
> I'm asking because it was recently pointed out that gevent's WSGI server
> can sometimes print `environ` (on certain error cases), but that can lead
> to sensitive information being kept in the server's logs (e.g.,
> HTTP_AUTHORIZATION, HTTP_COOKIE, maybe other things). The simplest and most
> flexible way to prevent this from happening, not just inadvertently within
> gevent itself but also for client applications, I thought, was to have
> `environ` be a subclass of `dict` with a customized `__repr__` (much like
> WebOb does for MultiDict, and repoze.who does for Identity, both for
> similar reasons).
>
> Unfortunately, when I implemented that in [0], I discovered that
> `wsgiref.validator` asserts that type(environ) is dict. I looked up the
> PEP, and sure enough, PEP  states that environ "must be a builtin
> Python dictionary (not a subclass, UserDict or other dictionary
> emulation)." [1]
>
> Background/History
> ==
>
> That seemed overly restrictive to me, so I tried to backtrack the history
> of that language in hopes of discovering the rationale.
>
> - It was present in the predecessor of PEP , PEP 0333, in the first
> version committed to the repository in August 2004. [2]
> - Prior to that, it was in both drafts of what would become PEP 0333
> posted to this mailing list, again from August 2004: [3], [4].
> - The ancestor of those drafts, the "Python Web Container Interface v1.0"
> was posted in December of 2003 with somewhat less restrictive language:
> "the environ object *must* be a Python dictionaryThe rationale for
> requiring a dictionary is to maximize portability
> between containers" [5].
>
> Now, the discussion on that earliest draft in [5] specifically brought up
> using other types that implement all the methods of a dictionary, like
> UserDict.DictMixin [6]. The last post on the subject in that thread seemed
> to be leaning towards accepting non-dict objects, at least if they were
> good enough [7].
>
> By the time the draft became recognizable as the precursor to PEP 0333 in
> [3], the very strict language we have now was in place. That draft,
> however, specifically stated that it was intended to be compatible with
> Python 1.5.2. In Python 1.5.2, it wasn't possible to subclass the builtin
> dict, so imitations, like UserDict.DictMixin, were necessarily imprecise.
> This was later changed to the much-maligned Python 2.2.2 release [8];
> Python 2.2 added the ability to subclass dict, but the language wasn't
> changed.
>
> Today
> =
>
> Given that today, we can subclass dict with full fidelity, is there still
> any practical reason not to be able to do so? I'm probably OK with gevent
> violating the letter of the spec in this regard, so long as there are no
> practical consequences. I was able to think of two possible objections, but
> both can be solved:
>
> - Pickling the custom `environ` type and then loading it in another
> process might not work if the class is not available. I can imagine this
> coming up with Celery, for example. This is easily fixed by adding an
> appropriate `__reduce_ex__` implementation.
>
> - Code somewhere relies on `if type(some_object) is dict:` (where
> `environ` became `some_object`, presumably through several levels of
> calls), instead of `isinstance(some_object, dict)` or
> `isinstance(some_object, collections.MutableMapping)`. The solution here is
> simply to not do that :) Pylint, among other linters, produces warnings if
> you do.
>
> Can anyone think of any other practical reasons I've overlooked? Is this
> just a horrible idea for other reasons?
>
> I appreciate any discussion!
>
> Thanks,
> Jason
>
> [0] https://github.com/gevent/gevent/compare/secure-environ
> [1] https://www.python.org/dev/peps/pep-/#specification-details
> [2]
> https://github.com/python/peps/commit/d5864f018f58a35fa787492e6763e382f98b923c#diff-ff370d50af3db062b015d1ef85935779
> [3] https://mail.python.org/pipermail/web-sig/2004-August/000518.html
> [4] https://mail.python.org/pipermail/web-sig/2004-August/000562.html
> [5] https://mail.python.org/pipermail/web-sig/2003-December/000394.html
> [7] https://mail.python.org/pipermail/web-sig/2003-December/000401.html
> [8] https://mail.python.org/pipermail/web-sig/2004-August/000565.html
>
> ___
> Web-SIG mailing list
> Web-SIG@python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe:
> https://mail.python.org/mailman/options/web-sig/alan%40xhaus.com
>
___
Web-SIG mailing list