Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?
> On 25 Mar 2016, at 15:04, Jason Maddenwrote: > > >> On Mar 25, 2016, at 05:01, Cory Benfield wrote: >> >> Given that gevent is keeping hold of its own reference to the environ, why >> does gevent not simply wrap the environ dict in a class that implements this >> functionality directly? In that manner, gevent can expose its own error >> handling behaviour as desired, and continue to follow PEP-. > > I did consider that, but didn't want to do that unless there were actual > practical problems passing the same object that gevent references. Making a > copy just to pass to the application adds additional time and memory > requirements that are always nice to avoid in a server. For what it’s worth, I’m not advocating a copy. I’m advocating a class like this: class SecureDictWrapper(collections.MutableMapping): def __init__(self, environ): self._environ = environ That class would then implement the MutableMapping API and delegate its calls through to the dictionary itself. There would still only be one dictionary: the only new allocation is for the wrapper class. The overhead is small. =) Cory signature.asc Description: Message signed with OpenPGP using GPGMail ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?
> On Mar 25, 2016, at 05:01, Cory Benfieldwrote: > > Given that gevent is keeping hold of its own reference to the environ, why > does gevent not simply wrap the environ dict in a class that implements this > functionality directly? In that manner, gevent can expose its own error > handling behaviour as desired, and continue to follow PEP-. I did consider that, but didn't want to do that unless there were actual practical problems passing the same object that gevent references. Making a copy just to pass to the application adds additional time and memory requirements that are always nice to avoid in a server. > In fact, I believe this is exactly what PJ was getting at. The ability to > subclass the dictionary (in this case, to subclass it with one that hides > some keys on printing) is only useful to the entity that does the > subclassing, because there is no guarantee that the subclass will not be lost > somewhere else in the WSGI stack. I looked at most of the middleware listed on the WSGI homepage [1], as well as a decent sampling of the packages identified as middleware on PyPI [2]. I didn't find any that passed a new environ on to the next application; they all seem to simply pass on the environ object as given to them. Now that's just a sampling so obviously it doesn't mean that such copying doesn't happen. But doing so eliminates the ability for lower middlewares to communicate with upper middlewares through the environ if they are more than one layer separated---a real-world example is setting `paste.expected_exceptions`---so practically speaking, I imagine it's quite rare. > Because of that, I’m disinclined to want to widen the spec here. PJ’s > original analysis is right: allowing subclasses does not provide more utility > than disallowing them, but it does allow more bugs to creep in due to > inconsistent expectations. Better to have an object with a known set of > behaviours and have applications/servers wrap it in custom function. I'm not sure I agree with that, but I can see the argument. I started out by asking if there were any *practical* reasons not to pass a tiny dict subclass as environ, and when I was surveying existing middleware for this thread, I found a big reason: it turns out that WebOb's Request object *also* verifies that type(environ) is dict [3]. Given the popularity of WebOb and its derivatives like Pyramid, this is not a change gevent can make. We'll take a different approach. Thanks again for all the great insights and discussion! Jason [1] http://wsgi.readthedocs.org/en/latest/libraries.html [2] https://pypi.python.org/pypi?:action=browse=all=319=326=506=509 [3] https://github.com/Pylons/webob/blob/master/webob/request.py#L112 ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?
> On 24 Mar 2016, at 16:29, Jason Maddenwrote: > Well, here's a practical use :) And the two points above do not apply to this > practical use, I think. (1) doesn't apply because `__repr__` is not going to > change and isn't fancy. (2) doesn't apply because gevent keeps a reference to > the environ its creates and passes to the app, so if middleware passes a new > dict(environ) on to the app, gevent's own error handling is still secure; > consider passing a SecureEnviron to the app a best-effort at > secure-by-default---if the user configures their application such that this > feature is disabled for part of the stack, that's on the application. No > feature of gevent will break, and it's better than not having the option at > all IMHO. Given that gevent is keeping hold of its own reference to the environ, why does gevent not simply wrap the environ dict in a class that implements this functionality directly? In that manner, gevent can expose its own error handling behaviour as desired, and continue to follow PEP-. In fact, I believe this is exactly what PJ was getting at. The ability to subclass the dictionary (in this case, to subclass it with one that hides some keys on printing) is only useful to the entity that does the subclassing, because there is no guarantee that the subclass will not be lost somewhere else in the WSGI stack. However, if subclassing is only useful to you there is another alternative to the problem, which is to compose the environ dict into an object that applies the custom behaviour. Because of that, I’m disinclined to want to widen the spec here. PJ’s original analysis is right: allowing subclasses does not provide more utility than disallowing them, but it does allow more bugs to creep in due to inconsistent expectations. Better to have an object with a known set of behaviours and have applications/servers wrap it in custom function. Cory signature.asc Description: Message signed with OpenPGP using GPGMail ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?
> On Mar 24, 2016, at 11:09, Alan Kennedywrote: > > I don't see this relevant message in your references. > > https://mail.python.org/pipermail/web-sig/2004-September/000749.html > > Perhaps that, and following messages, might shed more light? Yes, thank you, I did miss that thread. It does help shed some light on the issue. The two main arguments made seem to be that: 1) Creating subclasses of builtin objects is difficult and subject to breakage if you try to get too fancy. That's a fair point, and in the context of when it was written (Python 3.0 was still under discussion) it makes a lot of sense. 2) Middleware or the app can do dict(environ) and lose your subclass. Also true. But I think it's only particularly relevant if the WSGI implementation itself relies on the subclass to provide essential functionality that the PEP specifies (e.g., decoding bytes-to-str on key access). It was also mentioned that practicality beats purity and no practical use for a subclass was known. Well, here's a practical use :) And the two points above do not apply to this practical use, I think. (1) doesn't apply because `__repr__` is not going to change and isn't fancy. (2) doesn't apply because gevent keeps a reference to the environ its creates and passes to the app, so if middleware passes a new dict(environ) on to the app, gevent's own error handling is still secure; consider passing a SecureEnviron to the app a best-effort at secure-by-default---if the user configures their application such that this feature is disabled for part of the stack, that's on the application. No feature of gevent will break, and it's better than not having the option at all IMHO. Jason ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?
I don't see this relevant message in your references. https://mail.python.org/pipermail/web-sig/2004-September/000749.html Perhaps that, and following messages, might shed more light? On Thu, Mar 24, 2016 at 3:18 PM, Jason Maddenwrote: > Hi all, > > > Is there any practical reason that the type of the `environ` object must > be exactly `dict`, as specified in PEP? > > I'm asking because it was recently pointed out that gevent's WSGI server > can sometimes print `environ` (on certain error cases), but that can lead > to sensitive information being kept in the server's logs (e.g., > HTTP_AUTHORIZATION, HTTP_COOKIE, maybe other things). The simplest and most > flexible way to prevent this from happening, not just inadvertently within > gevent itself but also for client applications, I thought, was to have > `environ` be a subclass of `dict` with a customized `__repr__` (much like > WebOb does for MultiDict, and repoze.who does for Identity, both for > similar reasons). > > Unfortunately, when I implemented that in [0], I discovered that > `wsgiref.validator` asserts that type(environ) is dict. I looked up the > PEP, and sure enough, PEP states that environ "must be a builtin > Python dictionary (not a subclass, UserDict or other dictionary > emulation)." [1] > > Background/History > == > > That seemed overly restrictive to me, so I tried to backtrack the history > of that language in hopes of discovering the rationale. > > - It was present in the predecessor of PEP , PEP 0333, in the first > version committed to the repository in August 2004. [2] > - Prior to that, it was in both drafts of what would become PEP 0333 > posted to this mailing list, again from August 2004: [3], [4]. > - The ancestor of those drafts, the "Python Web Container Interface v1.0" > was posted in December of 2003 with somewhat less restrictive language: > "the environ object *must* be a Python dictionaryThe rationale for > requiring a dictionary is to maximize portability > between containers" [5]. > > Now, the discussion on that earliest draft in [5] specifically brought up > using other types that implement all the methods of a dictionary, like > UserDict.DictMixin [6]. The last post on the subject in that thread seemed > to be leaning towards accepting non-dict objects, at least if they were > good enough [7]. > > By the time the draft became recognizable as the precursor to PEP 0333 in > [3], the very strict language we have now was in place. That draft, > however, specifically stated that it was intended to be compatible with > Python 1.5.2. In Python 1.5.2, it wasn't possible to subclass the builtin > dict, so imitations, like UserDict.DictMixin, were necessarily imprecise. > This was later changed to the much-maligned Python 2.2.2 release [8]; > Python 2.2 added the ability to subclass dict, but the language wasn't > changed. > > Today > = > > Given that today, we can subclass dict with full fidelity, is there still > any practical reason not to be able to do so? I'm probably OK with gevent > violating the letter of the spec in this regard, so long as there are no > practical consequences. I was able to think of two possible objections, but > both can be solved: > > - Pickling the custom `environ` type and then loading it in another > process might not work if the class is not available. I can imagine this > coming up with Celery, for example. This is easily fixed by adding an > appropriate `__reduce_ex__` implementation. > > - Code somewhere relies on `if type(some_object) is dict:` (where > `environ` became `some_object`, presumably through several levels of > calls), instead of `isinstance(some_object, dict)` or > `isinstance(some_object, collections.MutableMapping)`. The solution here is > simply to not do that :) Pylint, among other linters, produces warnings if > you do. > > Can anyone think of any other practical reasons I've overlooked? Is this > just a horrible idea for other reasons? > > I appreciate any discussion! > > Thanks, > Jason > > [0] https://github.com/gevent/gevent/compare/secure-environ > [1] https://www.python.org/dev/peps/pep-/#specification-details > [2] > https://github.com/python/peps/commit/d5864f018f58a35fa787492e6763e382f98b923c#diff-ff370d50af3db062b015d1ef85935779 > [3] https://mail.python.org/pipermail/web-sig/2004-August/000518.html > [4] https://mail.python.org/pipermail/web-sig/2004-August/000562.html > [5] https://mail.python.org/pipermail/web-sig/2003-December/000394.html > [7] https://mail.python.org/pipermail/web-sig/2003-December/000401.html > [8] https://mail.python.org/pipermail/web-sig/2004-August/000565.html > > ___ > Web-SIG mailing list > Web-SIG@python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: > https://mail.python.org/mailman/options/web-sig/alan%40xhaus.com > ___ Web-SIG mailing list