Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?
I don't see this relevant message in your references.

https://mail.python.org/pipermail/web-sig/2004-September/000749.html

Perhaps that, and following messages, might shed more light?

On Thu, Mar 24, 2016 at 3:18 PM, Jason Madden wrote:

> Hi all,
>
> Is there any practical reason that the type of the `environ` object must
> be exactly `dict`, as specified in PEP?
>
> I'm asking because it was recently pointed out that gevent's WSGI server
> can sometimes print `environ` (on certain error cases), but that can lead
> to sensitive information being kept in the server's logs (e.g.,
> HTTP_AUTHORIZATION, HTTP_COOKIE, maybe other things). The simplest and most
> flexible way to prevent this from happening, not just inadvertently within
> gevent itself but also for client applications, I thought, was to have
> `environ` be a subclass of `dict` with a customized `__repr__` (much like
> WebOb does for MultiDict, and repoze.who does for Identity, both for
> similar reasons).
>
> Unfortunately, when I implemented that in [0], I discovered that
> `wsgiref.validator` asserts that type(environ) is dict. I looked up the
> PEP, and sure enough, PEP states that environ "must be a builtin
> Python dictionary (not a subclass, UserDict or other dictionary
> emulation)." [1]
>
> Background/History
> ==================
>
> That seemed overly restrictive to me, so I tried to backtrack the history
> of that language in hopes of discovering the rationale.
>
> - It was present in the predecessor of PEP , PEP 0333, in the first
> version committed to the repository in August 2004. [2]
> - Prior to that, it was in both drafts of what would become PEP 0333
> posted to this mailing list, again from August 2004: [3], [4].
> - The ancestor of those drafts, the "Python Web Container Interface v1.0"
> was posted in December of 2003 with somewhat less restrictive language:
> "the environ object *must* be a Python dictionary... The rationale for
> requiring a dictionary is to maximize portability
> between containers" [5].
>
> Now, the discussion on that earliest draft in [5] specifically brought up
> using other types that implement all the methods of a dictionary, like
> UserDict.DictMixin [6]. The last post on the subject in that thread seemed
> to be leaning towards accepting non-dict objects, at least if they were
> good enough [7].
>
> By the time the draft became recognizable as the precursor to PEP 0333 in
> [3], the very strict language we have now was in place. That draft,
> however, specifically stated that it was intended to be compatible with
> Python 1.5.2. In Python 1.5.2, it wasn't possible to subclass the builtin
> dict, so imitations, like UserDict.DictMixin, were necessarily imprecise.
> This was later changed to the much-maligned Python 2.2.2 release [8];
> Python 2.2 added the ability to subclass dict, but the language wasn't
> changed.
>
> Today
> =====
>
> Given that today, we can subclass dict with full fidelity, is there still
> any practical reason not to be able to do so? I'm probably OK with gevent
> violating the letter of the spec in this regard, so long as there are no
> practical consequences. I was able to think of two possible objections, but
> both can be solved:
>
> - Pickling the custom `environ` type and then loading it in another
> process might not work if the class is not available. I can imagine this
> coming up with Celery, for example. This is easily fixed by adding an
> appropriate `__reduce_ex__` implementation.
>
> - Code somewhere relies on `if type(some_object) is dict:` (where
> `environ` became `some_object`, presumably through several levels of
> calls), instead of `isinstance(some_object, dict)` or
> `isinstance(some_object, collections.MutableMapping)`. The solution here is
> simply to not do that :) Pylint, among other linters, produces warnings if
> you do.
>
> Can anyone think of any other practical reasons I've overlooked? Is this
> just a horrible idea for other reasons?
>
> I appreciate any discussion!
>
> Thanks,
> Jason
>
> [0] https://github.com/gevent/gevent/compare/secure-environ
> [1] https://www.python.org/dev/peps/pep-/#specification-details
> [2]
> https://github.com/python/peps/commit/d5864f018f58a35fa787492e6763e382f98b923c#diff-ff370d50af3db062b015d1ef85935779
> [3] https://mail.python.org/pipermail/web-sig/2004-August/000518.html
> [4] https://mail.python.org/pipermail/web-sig/2004-August/000562.html
> [5] https://mail.python.org/pipermail/web-sig/2003-December/000394.html
> [7] https://mail.python.org/pipermail/web-sig/2003-December/000401.html
> [8] https://mail.python.org/pipermail/web-sig/2004-August/000565.html
>
> ___
> Web-SIG mailing list
> Web-SIG@python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe:
> https://mail.python.org/mailman/options/web-sig/alan%40xhaus.com
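
The idea Jason describes in the quoted message (a `dict` subclass with a customized `__repr__`, plus a `__reduce_ex__` so that pickles do not depend on the class) could look roughly like the sketch below. This is not gevent's actual implementation: the class name `SecureEnviron`, the `SENSITIVE_KEYS` list and the `<MASKED>` placeholder are all assumptions made for illustration.

```python
import pickle


class SecureEnviron(dict):
    """Hypothetical dict subclass whose repr hides sensitive values."""

    # Assumed default list; a real server would likely make this configurable.
    SENSITIVE_KEYS = frozenset({"HTTP_AUTHORIZATION", "HTTP_COOKIE"})

    def __repr__(self):
        # Build a masked copy for display only; the stored values are untouched.
        safe = {
            key: ("<MASKED>" if key in self.SENSITIVE_KEYS else value)
            for key, value in self.items()
        }
        return "%s(%r)" % (type(self).__name__, safe)

    def __reduce_ex__(self, protocol):
        # Pickle as a plain dict so the receiving process does not need
        # this class to be importable.
        return (dict, (dict(self),))


environ = SecureEnviron(REQUEST_METHOD="GET", HTTP_COOKIE="session=secret")
print(repr(environ))                      # the cookie value is masked
restored = pickle.loads(pickle.dumps(environ))
print(type(restored) is dict)             # a plain dict comes back
```

Note that `type(environ) is dict` is false for such an object, which is exactly the check the message says `wsgiref.validator` makes, and that unpickling hands back the unmasked values as an ordinary dict.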
Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest
[Cory Benfield]
> Folks, just a reminder: RFC 2616 is dead. RFC 7230 says that *newly defined* header
> fields should limit their field values to US-ASCII, but older header fields are a
> crapshoot (though it notes that “in practice, most” header field values use US-ASCII).
>
> Regardless, it seems to me that the correct method of communicating field values would have been byte strings.

I think it's worth pointing out that the original intention of specifying
iso-8859-1 encoding was that the request components *would* be presented to
the application as bytes.

WSGI was designed to work on Python 2, where bytes and strings were stored
in the same datatype.

In CPython's UCS-2 encoding, where every character takes two bytes, only the
lower byte would contain a value if the character was from the iso-8859-1
character set. Moreover, encoding and decoding such "byte strings" from
iso-8859-1 would not change any values, i.e. iso-8859-1 was chosen because
encoding and decoding from it was an identity transform.

The same considerations applied to Jython 2.x (which uses UTF-16) and
IronPython 2.x (also UTF-16 I think), both of which had the same
bytes/strings duality problem.

If Python 2.x had had a bytes type, then that's what would have been used.

This would also have made more explicit that it is the application's job to
decode the bytes using whatever encoding it thinks is appropriate (i.e.
essentially what it has guessed, in the real world).

The WSGI server's job is to give the original bytes from the request to the
WSGI application *unchanged*.

The concluding message in the original discussion of encodings is here, if
anyone is interested.

https://mail.python.org/pipermail/web-sig/2004-September/000860.html

Alan.
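
The "identity transform" property mentioned above can be checked directly. The snippet below is only an illustration written for Python 3 (the message itself is about Python 2); the variable names are made up for the example.

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding as iso-8859-1 maps each byte to the code point of the same value...
text = raw.decode("iso-8859-1")
same_values = [ord(ch) for ch in text] == list(raw)

# ...so encoding again returns exactly the original bytes.
round_trip_ok = text.encode("iso-8859-1") == raw

# By contrast, UTF-8 cannot decode arbitrary bytes at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(same_values, round_trip_ok, utf8_ok)
```

Because no byte sequence is invalid in iso-8859-1 and nothing is lost on the way back, a string decoded this way still carries the original bytes for the application to reinterpret.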
Re: [Web-SIG] REMOTE_ADDR and proxys
[Alan]
>> I disagree. I think it is the role of the server/gateway to represent the
>> actual incoming HTTP request as accurately as possible.

[Robert]
> So I agree with you

OK, so we agree :-)

[Robert]
> but in a multi-tier deployment architecture:

Then why disagree? ;-)

[Robert]
> Client -> LB -> Front-end-cache -> HTTPd ->WSGI -> application, which
> 'request' do app developers need represented? They want the client
> request, which is 3 network hops away: it's entirely reasonable (and
> supported by RFC2616 and RFC7230 etc) for the internal structure of
> such a deployment to extend things in such a way that normal
> guarantees are suspended (e.g. caching, source addresses etc).

So what do you include and what do you exclude?

1. It's quite possible that the client is behind some kind of egress proxy
   or firewall, which may or may not add an X-Forwarded-For header. Should
   this be included?
2. What if your frontend LB is not configured to set an X-Forwarded-For
   header? What if it is? What if there is differing configuration across
   multiple LBs that are in your ingress path, and you get conflicting
   results depending on what path the request came in?
3. What if there is a cache miss on your frontend cache? Will the caching
   proxy add a header?
4. What if the proxy added a non-standard X-Forwarded-Ip header?
   - If it does, can you do reverse DNS lookup to find the host that it
     reverses to?
   - If yes, in what DNS authority?
5. Is the order of X-Forwarded-For headers guaranteed? Is it trustworthy?
   Will every proxy in the chain declare itself?
   - Answers: no, no, and no.

Each of the above questions has multiple answers, each of which is arguably
valid, depending on your point of view.

The problem is that HTTP proxies are just too easy to write, and every
author of a proxy will make slightly different decisions on what should be
forwarded and what should not. Every configurable proxy can and will be
configured differently, according to the requirements of the folks
operating it.

http://proxies.xhaus.com

[Robert]
> which 'request' do app developers need represented?

The request that arrives into the origin server, exactly as it arrived,
unmodified.

That way they can apply their own heuristics to processing the request,
knowing that it has not been interfered with.

> They want the client request, which is 3 network hops away

In your example, it's 3 hops away.

I can easily paint you a thousand different scenarios, each of which is a
different number of hops away.

[Collin]
> So it sounds like it should be the responsibility of a middleware to renormalize the environment?

In order for that to be the case, you have to strictly define what
"normalization" means.

I believe that it is not possible to fully specify "normalization", and
that any attempt to do so is futile.

If you want to attempt it for the specific scenarios that your particular
application has to deal with, then by all means code your version of
"normalization" into your application. Or write some middleware to do it.

But trying to make "normalization" a part of a WSGI-style specification is
impossible.

Alan.

On Mon, Sep 29, 2014 at 10:14 PM, Collin Anderson wrote:

> Thanks guys. So it sounds like it should be the responsibility of a
> middleware to renormalize the environment?
>
> On Wed, Sep 24, 2014 at 4:51 PM, Robert Collins
> wrote:
>
>> On 25 September 2014 07:16, Alan Kennedy wrote:
>> > [Collin]
>> >> It seems to me, it is the role of the server/gateway, not the
>> >> application/framework to determine the "correct" client ip address and
>> >> correctly account for the situation of being behind a known proxy.
>> >
>> > I disagree. I think it is the role of the server/gateway to represent the
>> > actual incoming HTTP request as accurately as possible.
>>
>> So I agree with you, but in a multi-tier deployment architecture:
>>
>> Client -> LB -> Front-end-cache -> HTTPd ->WSGI -> application, which
>> 'request' do app developers need represented? They want the client
>> request, which is 3 network hops away: it's entirely reasonable (and
>> supported by RFC2616 and RFC7230 etc) for the internal structure of
>> such a deployment to extend things in such a way that normal
>> guarantees are suspended (e.g. caching, source addresses etc).
>>
>> > If the application knows about remote proxies and local reverse proxies,
>> > then it can take action accordingly.
>> >
>> > But the server should not attempt any magic: it is up to the application to
>> > interpret the request in whatever way it sees fit.
>> ...
>> > If want to
Re: [Web-SIG] REMOTE_ADDR and proxys
[Collin]
> It seems to me, it is the role of the server/gateway, not the
> application/framework to determine the "correct" client ip address and
> correctly account for the situation of being behind a known proxy.

I disagree. I think it is the role of the server/gateway to represent the
actual incoming HTTP request as accurately as possible.

If the application knows about remote proxies and local reverse proxies,
then it can take action accordingly.

But the server should not attempt any magic: it is up to the application to
interpret the request in whatever way it sees fit.

[Collin]
> Also, I am aware of the security issues of improperly handling
> X-Forwarded-For, but that's an issue no matter where it's being
> handled.

This is exactly why the server/gateway should refuse the temptation to
guess.

It should leave it to the application to be smart enough to handle all
scenarios appropriately, knowing that it has access to the original
unmodified request.

If you want the magic rewriting functionality to be isolated from the
application, then it could easily be implemented as middleware.

Alan.

On Wed, Sep 10, 2014 at 7:41 PM, Collin Anderson wrote:

> Hi All,
>
> The CGI spec says:
>
> Script authors should be aware that the REMOTE_ADDR and REMOTE_HOST
> meta-variables (see sections 4.1.8 and 4.1.9) may not identify the
> ultimate source of the request. They identify the client for the
> immediate request to the server; that client may be a proxy, gateway,
> or other intermediary acting on behalf of the actual source client.
>
> However, if there is a reverse proxy on the server side (such as
> nginx), it seems to me, the ip address of the "immediate request to
> the server" will be "127.0.0.1" and the actual address will be in an
> "X-Forwarded-For" header.
>
> It seems to me, it is the role of the server/gateway, not the
> application/framework to determine the "correct" client ip address and
> correctly account for the situation of being behind a known proxy.
>
> Also, I am aware of the security issues of improperly handling
> X-Forwarded-For, but that's an issue no matter where it's being
> handled.
>
> So, in the case of a reverse proxy, is it ok if the WSGI server sends
> back a REMOTE_ADDR that isn't 127.0.0.1, even if the immediate
> connection to the WSGI server is local?
>
> Basically can we interpret the "server" above to be the machine rather
> than the program?
>
> Thanks,
> Collin
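
As a rough illustration of the "implement it as middleware" suggestion in the reply above, a deployment-specific rewriting layer might look like the sketch below. It is one possible policy, not a general solution: the class name `TrustedProxyMiddleware`, the `trusted_proxies` parameter and the `example.original_remote_addr` environ key are all invented for this example, and the rule "take the right-most hop that is not a trusted proxy" is an assumption about one particular deployment.

```python
class TrustedProxyMiddleware:
    """Hypothetical middleware: rewrite REMOTE_ADDR only behind known proxies."""

    def __init__(self, app, trusted_proxies=("127.0.0.1",)):
        self.app = app
        self.trusted_proxies = frozenset(trusted_proxies)

    def __call__(self, environ, start_response):
        peer = environ.get("REMOTE_ADDR", "")
        forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
        # Only believe the header when the immediate peer is a proxy we run.
        if peer in self.trusted_proxies and forwarded:
            # Keep the value the server reported (key name is an assumption).
            environ["example.original_remote_addr"] = peer
            hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
            # Walk from the nearest hop outwards, skipping our own proxies.
            for hop in reversed(hops):
                if hop not in self.trusted_proxies:
                    environ["REMOTE_ADDR"] = hop
                    break
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    # Trivial application that echoes the address it was given.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["REMOTE_ADDR"].encode("ascii")]


def call(app, environ):
    # Minimal driver standing in for a real WSGI server.
    return b"".join(app(environ, lambda status, headers: None))


wrapped = TrustedProxyMiddleware(demo_app)
behind_proxy = call(wrapped, {"REMOTE_ADDR": "127.0.0.1",
                              "HTTP_X_FORWARDED_FOR": "203.0.113.7, 127.0.0.1"})
direct = call(wrapped, {"REMOTE_ADDR": "198.51.100.9",
                        "HTTP_X_FORWARDED_FOR": "10.0.0.1"})
print(behind_proxy, direct)
```

The server still reports the immediate peer unchanged; the guessing is confined to a component that the deployer configures and can remove.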
Re: [Web-SIG] Fwd: Can writing to stderr be evil for web apps?
[anatoly]
> Martin expressed concerns that using logging module with stderr output
> can break web applications, such as PyPI.

Please can you specify exactly what you mean by "using logging module with
stderr output"?

Dealing with stderr is a webserver specific concern.

Consider the case where you're the author of a webserver that deals with
CGI scripts. When you get a request for the CGI script, you start a
subprocess to run the script. You must decide what to do with the stdin,
stdout and stderr of the process.

- CGI mandates that any content that came with the request (e.g. a POST
  body) should be fed into stdin (if no other mechanism is in place[0])
- CGI mandates that the stdout of the process is sent back to the client
  (if no other mechanism is in place[1]).
- CGI makes no mention of stderr.

Various webservers permit configurable handling of stderr. For example,
Tomcat has a setting called "swallowOutput" which redirects both stdout and
stderr to a log file. (Obviously, Tomcat's treatment of stdout is different
for CGI)

http://tomcat.apache.org/tomcat-6.0-doc/config/context.html

WSGI has a specific mechanism for diagnostic output, wsgi.errors.

"""
wsgi.errors

An output stream (file-like object) to which error output can be written,
for the purpose of recording program or other errors in a standardized and
possibly centralized location. This should be a "text mode" stream; i.e.,
applications should use "\n" as a line ending, and assume that it will be
converted to the correct line ending by the server/gateway.

...

For many servers, wsgi.errors will be the server's main error log.
Alternatively, this may be sys.stderr, or a log file of some sort. The
server's documentation should include an explanation of how to configure
this or where to find the recorded output. A server or gateway may supply
different error streams to different applications, if this is desired.
"""

Lastly, note that the WSGI PEP supplies an example CGI gateway, and has
this to say about its error handling

"""
Note that this simple example has limited error handling, because by
default an uncaught exception will be dumped to sys.stderr and logged by
the web server.
"""

http://www.python.org/dev/peps/pep-/#the-server-gateway-side

So I would say that

1. If you are writing a web application, and want it to run under any WSGI
   container, and for the user to be able to control that output in a way
   with which they are familiar (i.e. which is documented and may have
   specific configuration options), send the output to wsgi.errors.
2. If you are writing a web server, you should either capture or ignore
   stderr. If it is captured, then it is reasonable to, e.g., write it to a
   file so that the user can find it. It should never be mixed with stdout
   if stdout is the mechanism by which the application communicates with
   the webserver, as with CGI.

Alan.

[0] http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt
"""
Section 6.2 Request Message-Bodies

As there may be a data entity attached to the request, there MUST be a
system defined method for the script to read these data. Unless defined
otherwise, this will be via the 'standard input' file descriptor.
"""

[1] http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt
"""
Section 7. Data Output from the CGI Script

There MUST be a system defined method for the script to send data back to
the server or client; a script MUST always return some data. Unless defined
otherwise, this will be via the 'standard output' file descriptor
"""
Re: [Web-SIG] WSGI for Python 3
[PJ Eby]
> IOW, the bytes/string discussion on Python-dev has kind of led me to realize
> that we might just as well make the *entire* stack bytes (incoming and
> outgoing headers *and* streams), and rewrite that bit in PEP 333 about using
> str on "Python 3000" to say we go with bytes on Python 3+ for everything
> that's a str in today's WSGI.
>
> Or, to put it another way, if I knew then what I know *now*, I think I'd
> have written the PEP the other way around, such that the use of 'str' in
> WSGI would be a substitute for the future 'bytes' type, rather than viewing
> some byte strings as a forward-compatible substitute for Py3K unicode
> strings.
>
> Of course, this would be a WSGI 2 change, but IMO we're better off making a
> clean break with backward compatibility here anyway, rather than having
> conditionals. Also, going with bytes everywhere means we don't have to
> rename SCRIPT_NAME and PATH_INFO, which in turn avoids deeper rewrites being
> required in today's apps.

+1

> (Hm. Although actually, I suppose we *could* just borrow the time machine
> and pretend that WSGI called for "byte-strings everywhere" all along...)

+1/0

Alan.
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[Armin]
> Of course a server configuration variable would be a solution for many
> of these problems, but I don't like the idea of changing application
> behavior based on server configuration.

So you don't like the way that Django, Werkzeug, WebOb, etc, do it now,
even though they appear to be mostly successful, and you're happy to cite
them as such?

From the application's point of view, a framework-level configuration
variable is the same as a server-level configuration variable.

> At that point we will finally
> have successfully killed the idea of nested WSGI applications, because
> those could depend on different charsets.

Wouldn't well-written applications depend on unicode?

The server configured charset is simply an explicit statement of the
character set from which incoming requests are to be decoded, into unicode,
and no other character set.

Alan.
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[Armin]
> No, they know the character sets.

Hmmm, define "know" ;-)

[Armin]
> You tell them what character set you
> want to use. For example you can specify "utf-8", and they will
> decode/encode from/to utf-8. But there is no way for the application to
> send information to the server before they are invoked to tell the
> server what encoding they want to use.

I see this as being the same as Graham's suggested approach of a per-server
configurable charset, which is then stored in the WSGI dictionary, so that
applications that have problems, i.e. that detect mojibake in the unicode
SCRIPT_NAME or PATH_INFO, can attempt to undo the faulty decoding by the
server.

Alan.
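
The "undo the faulty decoding" step described above might be sketched as follows. The environ key `example.server_charset` and the helper `redecode` are hypothetical names; the thread does not say what key a server would actually use.

```python
def redecode(value, server_charset, wanted_charset="utf-8"):
    """Reverse the server's decoding, then decode with the charset we want."""
    raw = value.encode(server_charset)   # recover the bytes the server saw
    return raw.decode(wanted_charset)    # may raise UnicodeDecodeError


environ = {
    # The UTF-8 bytes of "/café", decoded by the server as iso-8859-1.
    "PATH_INFO": "/caf\u00c3\u00a9",
    # Hypothetical key recording which charset the server applied.
    "example.server_charset": "iso-8859-1",
}

fixed = redecode(environ["PATH_INFO"], environ["example.server_charset"])
print(fixed)
```

This only recovers the original bytes when the server's charset is lossless for them, as iso-8859-1 is; if the server decoded with a charset that replaced or rejected bytes, the information is already gone.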
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[Armin]
> Because that problem was solved a long ago in applications themselves.
> Webob, Werkzeug, Paste, Pylons, Django, you name it, all are operating
> on unicode. And the way they do that is straightforward.

So what are we all discussing?

Those frameworks obviously have solved all of the problems of decoding
incoming request components, e.g.

1. SCRIPT_NAME
2. PATH_INFO
3. QUERY_STRING
4. Etc

from miscellaneous unknown character sets into unicode, without any
mistakes, under all possible WSGI environments, e.g.

1. Mod_wsgi
2. Modjy (java servlets)
3. IIS
4. CGI
5. FCGI
6. Etc

So why not just adopt one of those mechanisms, e.g. Django, and make it the
de-facto standard?

Since they all deliver unicode, python 3 is no longer a problem, since it
permits only unicode strings.

Alan.
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[Alan]
>> Is there a real need out there?

[Armin]
> In python 3, yes. Because the stdlib no longer works with bytes and the
> bytes object has few string semantics left.

Why can't we just do the same as the java servlet spec? I.e.

1. Ignore the encoding issues being discussed
2. Give the programmer (possibly mojibake) unicode strings in the WSGI
   environ anyway
3. And let them solve their problems themselves, using server configuration
   or bespoke middleware

[Alan]
>> Java programmers just tolerate this, although they may curse the
>> developers of the servlet spec for not having solved their specific
>> problem for them.

[Armin]
> Many Java apps are also still using latin1 only or have all kinds of
> problems with charsets.

My point exactly.

Many web developers simply never have to deal with these issues, perhaps a
majority.

The ones that do have to sort it out for themselves. To do so, the
publishers of the various containers give them (non-standard) options to
control the decoding of the incoming request and all of its component
parts: you cited the Tomcat approach above. Other containers do it
differently. Which means that i18n knowledge is not portable between
containers.

It would be nice if we could avoid such a situation with i18n and WSGI.

But I suppose I'm a little dubious that this group can out-do the enormous
java community, and the enormous financial resources that Sun, IBM, Oracle,
etc, etc, plough into it. And they still failed to solve this complex
problem satisfactorily.

Alan.
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[P.J. Eby]
>> Actually, latin-1 bytes encoding is the *simplest* thing that could
>> possibly work, since it works already in e.g. Jython, and is actually
>> in the spec already... and any framework that wants unicode URIs
>> already has to decode them, so the code is already written.

[Armin]
> Except that nobody implements that

So, if nobody implements that, then why are we trying to standardise it?

Is there a real need out there?

Or are all these discussions solely driven by the need/desire to have only
unicode strings in the WSGI dictionary under python 3?

Which is a worthy goal, IMHO. Java has been there since the very start,
since java strings have always been unicode. Take a look at the java docs
for HttpServlet: no methods return bytes/bytearrays.

http://java.sun.com/products/servlet/2.5/docs/servlet-2_5-mr2/javax/servlet/http/HttpServletRequest.html

But the java servlet spec still ignores *all* of the encoding concerns
being discussed here. Which means that mistakes/mojibake must happen all
the time. And it's up to the author of the individual java web application
to solve those problems, using a mechanism appropriate for their needs and
local environment.

Java programmers just tolerate this, although they may curse the developers
of the servlet spec for not having solved their specific problem for them.

Alan.
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[Ian]
> When things get messed up I recommend people use a middleware
> (paste.deploy.config.PrefixMiddleware, though I don't really care what they
> use) to fix up the request to be correct. Pulling it from REQUEST_URI would
> be fine.

That would be unworkable under java servlet containers, since they each
take a different approach to addressing encoding issues, or fail to deal
with them entirely.

So there would probably have to be a special case for every single one of
these

http://en.wikipedia.org/wiki/List_of_Servlet_containers

Each of which has a number of different ways of being configured in
relation to these issues.

I don't know if it would even be possible to write such a middleware. And
retain all of one's hair.

Alan.
Re: [Web-SIG] Request for Comments on upcoming WSGI Changes
[Ian]
>> OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO and
>> introduce two equivalent variables that hold the NOT url-decoded values.

[Graham]
> That may be fine for pure Python web servers where you control the
> split of REQUEST_URI into SCRIPT_NAME and PATH_INFO in the first place
> but don't have that luxury in Apache or via FASTCGI/SCGI/CGI etc as
> that is done by the web server. Also, as pointed out in my blog,
> because of rewrites in web server, it may be difficult to try and map
> SCRIPT_NAME and PATH_INFO back into REQUEST_URI provided to try and
> reclaim original characters. There is also the problem that often
> FASTCGI totally stuffs up SCRIPT_NAME/PATH_INFO split anyway and
> manual overrides needed to tweak them.

This applies doubly under Java servlets, where different containers take
different approaches to solve these rather hard problems.

It is worth noting that they have to do so because the java servlet spec,
even under the most recent 2.5, punts on *all* of the issues being
discussed here.

See here for how Tomcat does it. Or half does it, messily.

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

I know this is not helpful ;-)

Alan.
Re: [Web-SIG] WSGI 1 Changes [ianb's and my changes]
[Rene]
>> I think you mean pre-2.2 support, not python 2.2? iterators came
>> about in python 2.2.

[Armin]
> That might be. That was before my time. I'm pretty sure the first
> Python version I used was 2.3, but don't quote me on that.

As WSGI was being developed, cpython was at version 2.3. The only reason
that support for "older versions" was in the spec was because jython was at
version 2.1 at the time.

The WSGI spec was made much simpler by the use of the iterator protocol
(PEP 234), which was introduced into the language in 2.2.

So where the spec says

"Supporting Older (<2.2) Versions of Python"

It should probably have read

"Supporting Older (pre-pep-234-iterator-protocol) Versions of Python"

I don't know of any modern python implementation that doesn't support the
iterator protocol.

It's probably time to drop that section from the PEP.

Alan.
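
To make concrete how the iterator protocol keeps the interface small, here is a minimal sketch in which the application is a generator and the caller simply iterates over whatever it returns. The `run` helper is a made-up stand-in for a real WSGI server and skips most of what a server must do.

```python
def application(environ, start_response):
    # A generator function: the response body is produced lazily,
    # one byte string at a time, via the iterator protocol.
    start_response("200 OK", [("Content-Type", "text/plain")])
    for number in range(3):
        yield ("chunk %d\n" % number).encode("ascii")


def run(app, environ):
    """Hypothetical, heavily simplified stand-in for a server."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    # The server only needs to iterate; it does not care whether the
    # application returned a list, a generator or some other iterable.
    chunks = list(app(environ, start_response))
    return captured["status"], chunks


status, chunks = run(application, {})
print(status, chunks)
```

Without iterators, a spec would need some other way for applications to hand over output piece by piece, which is presumably why the "older versions" section had to describe workarounds.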
Re: [Web-SIG] Announcing bobo
[Etienne]
> If you want to start a thread for Bobo, please switch mailing-list or
> create a new thread, as all I wanted was to tell Jim my disappointment
> regarding Bobo, and I still think it's not very revolutionary.

I completely disagree; this is definitely the appropriate list for
discussing web frameworks and new approaches.

There is no perfect framework in python, or any other language. It is only
with the introduction, discussion, acceptance and assimilation of new ideas
that we all move forward together.

Jim has the longest history of all in Python web frameworks; he created the
very concept. He founded and built the entire Zope community; I will always
listen to what he has to say.

I wish you the best of luck with your own web framework, notmm

http://gthc.org/projects/notmm/0.2.12/

Which seems to have some potential, but currently lacks community support.

http://gthc.org/community/

I'm looking forward to Europython, where I know I'll be meeting some great
python folks, and hopefully some of us will get to continue our WSGI
revision discussions.

All the best,

Alan.
Re: [Web-SIG] RESTful Python email list?
[Pete]
> Any interest in a dedicated email list for REST + python, a la the
> restful-json group [0]? The group would discuss strategies for REST
> architecture built with and within Python. WSGI 1.0 vs. 2.0 vs. 2e6 is out
> of scope. ;-)

Just a thought: is there any reason why RESTful python discussions cannot
take place on the restful-json group referred to?

Alan.
Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words
[Brian]
> Here is the change that removes the use of RFC 2047 from HTTP in HTTPbis.

Grand so; all we need to do is to wait for everyone to stop using HTTP/1.1,
start using HTTP/bis, and our problems are at an end!

;-)

Alan.
Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words
[James]
> If you want to start a discussion about having a standard parsed-header
> object in WSGI, that's another thing, but saying that WSGI servers should
> *partially* decode the headers seems rather silly to me.

Hi James,

It's a shame that your proposal to add the twisted header parsing library
to the standard library didn't catch on years ago.

http://mail.python.org/pipermail/web-sig/2006-February/002119.html

Alan.
Re: [Web-SIG] Python 3.0 and WSGI 1.0.
[Sylvain] > Would there be any interest in asking the HTTP-BIS working group [1] what > they think about it? > > Currently I couldn't find anything in their drafts suggesting they had > decided to clarify this issue from a protocol's perspective but they might > consider it to be relevant to their goals. > > - Sylvain > > [1] http://www.ietf.org/html.charters/httpbis-charter.html As mentioned in an earlier post, I think their current spec avoids the issue, by still relying on "octet-by-octet" comparison. But I did come across this discussion on their list, which goes into all of the issues in fine detail. http://www.nabble.com/PROPOSAL%3A-i74%3A-Encoding-for-non-ASCII-headers-tt16274487.html#a16291951 Quote of the thread [Roy Fielding] > We are simply passing through the one and only defined i18n solution > for HTTP/1.1 because it was the only solution available in 1994. > If email clients can (and do) implement it, then so can WWW clients. > > People who want to fix that should start queueing for HTTP/1.2. Alan.
Re: [Web-SIG] Python 3.0 and WSGI 1.0.
[Sylvain] > Would there be any interest in asking the HTTP-BIS working group [1] what > they think about it? > > Currently I couldn't find anything in their drafts suggesting they had > decided to clarify this issue from a protocol's perspective but they might > consider it to be relevant to their goals. > > - Sylvain > > [1] http://www.ietf.org/html.charters/httpbis-charter.html I checked the current version of their replacement for RFC 2616. It says """ 2.1.3. URI Comparison When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs """ Which doesn't work if the two URIs to be compared are in different encodings. I did find this page on the W3C site which at least explains the issues, and does a survey of existing modern browsers for how they encode URIs and IRIs. http://www.w3.org/International/articles/idn-and-iri/ """ Paths The conversion process for parts of the IRI relating to the path is already supported natively in the latest versions of IE7, Firefox, Opera, Safari and Google Chrome. It works in Internet Explorer 6 if the option in Tools>Internet Options>Advanced>Always send URLs as UTF-8 is turned on. This means that links in HTML, or addresses typed into the browser's address bar will be correctly converted in those user agents. It doesn't work out of the box for Firefox 2 (although you may obtain results if the IRI and the resource name are in the same encoding), but technically-aware users can turn on an option to support this (set network.standard-url.encode-utf8 to true in about:config). Whether or not the resource is found on the server, however, is a different question. If the file system is in UTF-8, there should be no problem. If not, and no mechanism is available to convert addresses from UTF-8 to the appropriate encoding, the request will fail. Files are normally exposed as UTF-8 by servers such as IIS and Apache 2 on Windows and Mac OS X. 
Unix and Linux users can store file names in UTF-8, or use the mod_fileiri module mentioned earlier. Version 1 of the Apache server doesn't yet expose filenames as UTF-8. You can run a basic check whether it works for your client and resource using this simple test. Note that, while the basics may work, there are other somewhat more complicated aspects of IRI support, such as handling of bidirectional text in Arabic or Hebrew, which may need some additional time for full implementation. """ Alan.
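The point above about octet-by-octet comparison failing across encodings can be illustrated with a small sketch. It assumes nothing beyond Python 3's standard `urllib.parse`; the path `/café` is just an illustrative value, not one taken from the thread.

```python
from urllib.parse import quote, unquote_to_bytes

# The same character path, percent-encoded under two different encodings.
path = "/café"
as_utf8 = quote(path, encoding="utf-8")      # non-ASCII char becomes two octets
as_latin1 = quote(path, encoding="latin-1")  # non-ASCII char becomes one octet

# A case-sensitive octet-by-octet comparison treats them as different URIs...
octets_match = as_utf8 == as_latin1

# ...even though both decode to the same characters when the right
# encoding is known for each.
same_text = (
    unquote_to_bytes(as_utf8).decode("utf-8")
    == unquote_to_bytes(as_latin1).decode("latin-1")
)
```

The comparison only succeeds when both sides agree on the encoding in advance, which is what the UTF-8 requirement for IRIs is meant to guarantee.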
Re: [Web-SIG] WSGI Open Space @ PyCon.
[Noah] > +1 on the iterator, although I might just like the idea and might be missing > something important. It seems like there are a lot of powerful things being > developed with generators in mind, and there are some nifty things you can > do with them like the contextlib example: > http://docs.python.org/library/contextlib.html#contextlib.closing Indeed, like coroutines. http://www.python.org/dev/peps/pep-0342/ [Robert] >> The counter-argument was that >> servers could use non-blocking sockets to allow apps which read() to >> yield in the case of no immediate data rather than block indefinitely. Ah, but the problem with that is that one can't magically suspend methods like that and return control to the scheduler, without using coroutines or stackless. Who does the read() method return control to when there's no data available (i.e. no bytes on the socket)? If wsgi.input is a simple file-like object, then its methods must be coded to recognise, rather than blocking, when the data is not yet available to fulfill the application's expectation. How does it know how to return control to the scheduler, instead of the application? If the application expects to receive all of the data that it asked for with, say, a read(1024) call, it has to be prepared to accept that it may get less than 1024 bytes, in an asynchronous situation. What does it return to the application in the case when < 1024 bytes is available? >> If a file-like object were retained, it would help to publish a >> chainable file example to help middleware re-stream files they read any >> part of. I don't think that re-streaming of input should be a part of the spec; it's an application layer thing. We don't expect to re-stream the output of an application: why re-stream the input? If some application needs to examine the entire byte sequence for whatever reasons, that's a special case that can be catered for with itertools, and dedicated middleware. 
>> Continuing deferred issues >> * Lifecycle methods (start/stop/etc event API driven by the container) I'd really like to get this one nailed: java people and .net people expect this stuff. Alan.
Re: [Web-SIG] Python 3.0 and WSGI 1.0.
Hi Bill, [Bill] > I think the controlling reference here is RFC 3875. I think the controlling references are RFC 2616, RFC 2396 and RFC 3987. RFC 2616, the HTTP 1.1 spec, punts on the question of character encoding for the request URI. RFC 2396, the URI spec, says """ It is expected that a systematic treatment of character encoding within URI will be developed as a future modification of this specification. """ RFC 3987 is that spec, for Internationalized Resource Identifiers. It says """ An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). """ and """ 1.2. Applicability IRIs are designed to be compatible with recommendations for new URI schemes [RFC2718]. The compatibility is provided by specifying a well-defined and deterministic mapping from the IRI character sequence to the functionally equivalent URI character sequence. Practical use of IRIs (or IRI references) in place of URIs (or URI references) depends on the following conditions being met: """ followed by """ c. The URI corresponding to the IRI in question has to encode original characters into octets using UTF-8. For new URI schemes, this is recommended in [RFC2718]. It can apply to a whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384], or the URN syntax [RFC2141]). It can apply to a specific part of a URI, such as the fragment identifier (e.g., [XPointer]). It can apply to a specific URI or part(s) thereof. For details, please see section 6.4. """ I think the question is "are people using IRIs in the wild"? If so, then we must decide how best to deal with the problem of recognising iso-8859-1+rfc2047 versus utf-8, or whatever server-configured encoding the user has chosen. Alan.
Re: [Web-SIG] Python 3.0 and WSGI 1.0.
Hi Graham, I think yours is a good solution to the problem. [Graham] > In other words, leave all the existing CGI variables to come through > as latin-1 decode As latin-1 or rfc-2047 decoded, to unicode. > and do anything new in 'wsgi' variable namespace, So the server provides "wsgi.server_decoded_SCRIPT_NAME" == u"whatever" "wsgi.server_decoded_PATH_INFO" == u"whatever" "wsgi.server_decode_charset" == u"utf-8" Just my €0,02. Alan.
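The three keys suggested above could be populated roughly as follows. This is only a sketch: the `wsgi.server_decoded_*` names are the ones proposed in this message and are not part of any WSGI specification, the helper name `add_decoded_paths` is hypothetical, and the code assumes the Python 3 convention in which CGI-style variables arrive as latin-1-decoded strings.

```python
def add_decoded_paths(environ, charset="utf-8"):
    """Add the proposed (non-standard) decoded-path keys to a WSGI environ.

    Assumes SCRIPT_NAME and PATH_INFO are latin-1-decoded native strings,
    so the original octets can be recovered by encoding back to latin-1.
    """
    for key in ("SCRIPT_NAME", "PATH_INFO"):
        raw_octets = environ.get(key, "").encode("latin-1")
        # Raises UnicodeDecodeError if the octets are not valid in `charset`;
        # a real server would need a policy for that case.
        environ["wsgi.server_decoded_" + key] = raw_octets.decode(charset)
    environ["wsgi.server_decode_charset"] = charset
    return environ


# "/café" sent as UTF-8 octets shows up as two latin-1 characters.
example = add_decoded_paths({"SCRIPT_NAME": "", "PATH_INFO": "/caf\xc3\xa9"})
```

The existing CGI variables are left untouched, so applications that already handle the latin-1 form keep working.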
Re: [Web-SIG] thoughts on an iterator
Hi all, It was great to meet (nearly) everybody at PyCon; I look forward to the next time. I particularly want to thank Robert for being so meticulous about recording and reporting the discussions; a necessary part of moving forward, IMO. [Robert] > H. Graham brought up chunked requests which I don't think have much > bearing on this issue--the server/app can't rely on the client-specified > chunk sizes either way (or you enable a Denial of Service attack). I > don't see much difference between the file approach and the iterator > approach, other than moving the read chunk size from the app (or more > likely, the cgi module) to the server. That may be what kills this > proposal: cgi.FieldStorage expects a file pointer and I doubt we want to > either rewrite the entire cgi module to support iterators, or re-package > the iterator up as a file. I recommend that any discussion of file-like vs. iterator for input should be informed by this discussion between myself and PJE back when the spec was being written. http://mail.python.org/pipermail/web-sig/2004-September/000885.html Most relevant quote [PJE] > Aha! There's the problem. The 'read()' protocol is what's wrong. If > 'wsgi.input' were an *iterator* instead of a file-like object, it would be > fairly straightforward for async servers to implement "would block" reads > as yielding empty strings. And, servers could actually support streaming > input via chunked encoding, because they could just yield blocks once > they've arrived. > > The downside to making 'wsgi.input' an iterator is that you lose control > over how much data to read at a time: the upstream server or middleware > determines how much data you get. But, it's quite possible to make a > buffering, file-like wrapper over such an iterator, if that's what you > really need, and your code is synchronous. (This will slightly increase > the coding burden for interfacing applications and frameworks that expect > to have a readable stream for CGI input.) 
For asynchronous code, you're > just going to invoke some sort of callback with each block, and it's the > callback's job to deal with it. > > What does everybody think? If combined with a "pause iterating me until > there's input data available" extension API, this would let the input > stream be non-blocking, and solve the chunked-encoding input issue all in > one change to the protocol. Or am I missing something here? http://mail.python.org/pipermail/web-sig/2004-September/000890.html I'd also be interested in the Twisted folk's take on that discussion. All the best, Alan.
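The "buffering, file-like wrapper over such an iterator" mentioned in the quoted message might look something like the sketch below. The class name `IteratorInput` is hypothetical, only `read()` is shown, and the code is synchronous: an empty block (the proposed "would block" signal) is simply skipped by pulling the next block.

```python
class IteratorInput:
    """Buffering, file-like read() over an iterator of byte blocks (a sketch)."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)
        self._buffer = b""
        self._exhausted = False

    def _fill(self, wanted):
        # Pull blocks until `wanted` bytes are buffered (None means "all").
        while not self._exhausted and (wanted is None or len(self._buffer) < wanted):
            try:
                self._buffer += next(self._blocks)
            except StopIteration:
                self._exhausted = True

    def read(self, size=-1):
        wanted = None if size is None or size < 0 else size
        self._fill(wanted)
        if wanted is None:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:wanted], self._buffer[wanted:]
        return data


stream = IteratorInput([b"ab", b"", b"cde", b"f"])
first = stream.read(3)
second = stream.read(1)
rest = stream.read()
```

The application regains control over read sizes, at the cost of the wrapper blocking whenever the underlying iterator does.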
[Web-SIG] WSGI Open Space @ PyCon.
Dear all, For those of you at PyCon, there is a WSGI Open Space @ 5pm today (Friday). The sub-title of the open space is "Does WSGI need revision"? An example: Philip Jenvey (http://dunderboss.blogspot.com/) raised the need for something akin to what Java folks call "Lifecycle methods", so that WSGI apps can do initialization and finalization. http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets4.html I'm sure there are plenty of other topics that could be discussed as well. See you @5pm. Alan.
Re: [Web-SIG] Use both Python and Javascript in html webpages
[David] > Can we use both Python and Javascript in html webpages? Any demo on this? If you're willing to write rpython, PyPy can compile it to javascript which can run in a browser. http://codespeak.net/pypy/dist/pypy/doc/js/using.html HTH, Alan.
Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification
[Graham] > I would be for (1) errata or amendment as reality is that there is > probably no WSGI implementation that disallows an argument to > readline() given that certain Python code such as cgi.FieldStorage > wouldn't work otherwise. > > For such a clarification on existing practice, I see no point in > having to change wsgi.version in environ as it would just cause > confusion. +1 [Graham] > I would also like to see other changes to WSGI specification but now > is not the time, let us at least though get this obvious issue with > API dealt with. After that we can then perhaps have a discussion of > future of WSGI specification and whether there really is any interest > in future versions with more significant changes. +1 Alan.
Re: [Web-SIG] Newline values in WSGI response header values.
[Graham] > Thus, is an embedded newline in value invalid? Would it be reasonable > for a WSGI adapter to flag it as an error? From a security POV, it may be advisable for WSGI servers to *not* allow newlines in HTTP response headers; newlines in response headers may be the result of an application's failure to sanitise its inputs. http://en.wikipedia.org/wiki/HTTP_response_splitting Regards, Alan.
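One way a server or middleware could refuse such headers is sketched below. The wrapper name `reject_header_newlines` is hypothetical, and raising `ValueError` is just one possible policy; the thread does not settle what an adapter should actually do.

```python
def reject_header_newlines(start_response):
    """Wrap start_response so headers containing CR or LF are refused (a sketch)."""

    def checked_start_response(status, headers, exc_info=None):
        for name, value in headers:
            # A CR or LF in a name or value could let an attacker inject
            # extra headers or a second response (response splitting).
            if any(ch in name or ch in value for ch in "\r\n"):
                raise ValueError("newline in response header %r" % (name,))
        if exc_info is not None:
            return start_response(status, headers, exc_info)
        return start_response(status, headers)

    return checked_start_response
```

A middleware would pass `reject_header_newlines(start_response)` to the wrapped application in place of the original callable.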
Re: [Web-SIG] Time a for JSON parser in the standard library?
[Bob] > simplejson would give you an error and tell you exactly where the > problem was, Another good point. Other JSON modules should follow simplejson's lead, and provide access to the location in the document where the lexical or parse error occurred, so that the offending document can be opened in a text editor to determine the source of the problem, and perhaps fix it. This should also apply to "junk" after the document object, i.e. JSON expressions present in the document after the main document has been successfully parsed. A strict interpretation of the spec is that such "junk" is not permitted, and makes the JSON document broken, even though the main object representation is valid. Simplejson has an option for the user to control this, and jyson does too; I don't know about the others. [Bob] > but there isn't currently a non-strict mode and honestly > nobody has asked for it. If we only need "strict mode", then why do all of our parsers have options? Isn't "permissive mode" just a way of setting all of the parse options to liberal, in one go? Alan.
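As an illustration of error-location reporting and of the "junk after the document" case, here is a sketch using the `json` module that ships with current Python 3 (which postdates this thread). The helper name `locate_error` is hypothetical.

```python
import json


def locate_error(text):
    """Return (message, line, column) for a parse failure, or None on success."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # msg, lineno and colno are attributes of JSONDecodeError.
        return err.msg, err.lineno, err.colno
    return None


missing_value = locate_error('{"a": 1,\n "b": }')

# Strict parsing rejects anything after the main document...
trailing_junk = locate_error('{"a": 1} junk')

# ...whereas raw_decode() parses the leading document and reports where it
# stopped, leaving the caller to decide what to do with the remainder.
document, end = json.JSONDecoder().raw_decode('{"a": 1} junk')
```

With the line and column in hand, the offending document can be opened at the right place in a text editor.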
Re: [Web-SIG] Time a for JSON parser in the standard library?
[Alan] >> [hand written JSON containing a] hard-to-spot dangling comma, from all the >> copying and pasting. That broke his javascript library; he solved the >> problem by passing it through a PHP JSON codec on his local Apache. It >> worked, i.e. his problem disappeared, but he didn't know why (the PHP >> lib had eliminated the dangling comma). Which all goes to confirm, >> IMHO, that you should be liberal in what you consume and strict in >> what you produce. [John] > Sounds like a case *for* strict parsing, in my opinion. PHP's loose > parsing made it difficult to figure out why the JSON was invalid. If > trailing comma handling is to try to work around copy-paste errors, -1 > from me. No, the PHP lib did exactly what it should, IMHO. The PHP lib was liberal in what it consumed (a dangling comma), and strict in what it produced (no dangling comma). It accepted my broken document with a dangling-comma, and emitted a strictly conformant document with the offending comma removed, which enabled my co-worker to proceed with his job. +1 from me. Other opinions? Alan.
Re: [Web-SIG] Time a for JSON parser in the standard library?
[John] > I'm interested in whether you generally use JSON to communicate with a > JavaScript client, or another JSON library. Both the demjson and simplejson > libraries are written with the assumption that they are to be used to > interact with JavaScript. Answer #1: My motive is simply to implement the JSON spec, in a [j|p]ythonic way. If the ideal of JSON is to be realised, then the producer of the document is not relevant: it is only the document itself that matters. Answer #2: I'm working (i.e. day job) with JSON at the moment: a javascript client talking to a java server. The JS guy had a problem last week with a sample JSON document I gave him to prototype on. I wrote the sample by hand (it later became my freemarker template), and so inadvertently left in a hard-to-spot dangling comma, from all the copying and pasting. That broke his javascript library; he solved the problem by passing it through a PHP JSON codec on his local Apache. It worked, i.e. his problem disappeared, but he didn't know why (the PHP lib had eliminated the dangling comma). Which all goes to confirm, IMHO, that you should be liberal in what you consume and strict in what you produce. [John] > You mentioned in an earlier e-mail that jyson supports reading arrays with > trailing commas -- is this intentional, or accidental? Do you read them with > Python or JavaScript semantics? Went out of my way to accept them, with python semantics. Javascript semantics differ. Last time I tested, FireFox and IE interpreted "[1,2,3,]" differently as [1,2,3] and [1,2,3,null]. Although that may have changed during the meanwhilst. [Alan] > > 2. To have a native-code implementation, customised for jython. [John] > Did you encounter any particular issues related to implementing a JSON > library in Jython that would affect how a standard library implementation's > API should be designed? Jython is changing rapidly. 
It is evolving from a 2.2 stage ("from __future__ import generators") to a 2.5 stage in one leap. Jython 2.5 is built with java 1.5 (1.5 is where java grew annotations and generics). Between 2.2 and 2.5, python has grown Decimals, generator expressions, decorators, context managers, bi-directional generators, etc. I would prefer a pure java implementation of a JSON codec to remain flexible in terms of the way that it maps "fundamental" JSON types into the jython type hierarchy and interpreter machinery[1]. I'm beginning to think that any putative JSON API should permit the user to specify which class will be used to instantiate JSON objects. If the users can specify their own classes, that might go a long way to resolve issues such as "I need my javascript client to communicate Numbers representing radians to my python server which uses Decimal because it works better with my geo-positioning library". Standard libraries should provide their own set of default instantiation classes, which the user could override. Regards, Alan. [1] There is an argument that a pure java JSON parser for jython is not worth the effort, in performance terms at least. JVM optimisation is very sophisticated these days, and it is conceivable that pure python (byte)code could run as fast or faster on a JVM than equivalent java code. Think PyPy. So maybe a single well-designed pure-python JSON module in the cpython standard library is the way to go.
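The idea of letting the user choose the classes used to instantiate JSON values can be sketched with the hooks that the `json` module in current Python 3 provides (again, a module that postdates this thread). The `Angle` class and the sample document are purely illustrative.

```python
import json
from decimal import Decimal


class Angle:
    """Illustrative user-supplied class for JSON objects."""

    def __init__(self, fields):
        self.radians = fields["radians"]


text = '{"radians": 1.5707963267948966}'

# parse_float chooses the class for JSON numbers with a fractional part;
# object_hook chooses the class built from each decoded JSON object.
as_decimal = json.loads(text, parse_float=Decimal)
as_angle = json.loads(text, parse_float=Decimal, object_hook=Angle)
```

The defaults (`float` and `dict`) remain in force whenever no override is given, which matches the suggestion that libraries ship default instantiation classes.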
Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc
[Alan] >> [2] Perhaps some pythonista from Web-SIG is most appropriate to advise >> how JSON-RPC should move forward? After all, we're more accustomed to >> server-side stuff than those javascript folks ;-) [Ian] > Let it die? It is more complicated than necessary, when instead you could > just make each function a URL of its own, and POST the arguments and get > back the response, with 500 Server Error for errors. It's hard to spec that > up because it's too simple. > > OHM (http://pythonpaste.org/ohm/) follows this model of exposing a service. Mmmm, very RESTful. Access to the requested HTTP method is a fundamental for RESTful services. I find it interesting that Java's HttpServletRequest has a .getMethod(), but no .setMethod(). Which means that one has to implement method overrides[1] by carrying the override value through means other than the request object itself. Whereas in WSGI, I can simply do: environ['REQUEST_METHOD'] = environ['HTTP_X_HTTP_METHOD_OVERRIDE'] I've heard WSGI described as "python's servlet API". It's not that; it's better. Regards, Alan. [1] http://code.google.com/apis/gdata/basics.html#Updating-an-entry
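The method override described above could be packaged as a small piece of WSGI middleware, sketched here. WSGI exposes the `X-HTTP-Method-Override` request header under the CGI-style key `HTTP_X_HTTP_METHOD_OVERRIDE`. The names `method_override` and `ALLOWED_OVERRIDES`, and the choice to honour overrides only on POST, are assumptions of this sketch rather than anything stated in the thread.

```python
# Hypothetical whitelist; restricting overrides is a common precaution.
ALLOWED_OVERRIDES = {"PUT", "DELETE", "PATCH"}


def method_override(app):
    """WSGI middleware applying X-HTTP-Method-Override to POST requests."""

    def middleware(environ, start_response):
        override = environ.get("HTTP_X_HTTP_METHOD_OVERRIDE", "").upper()
        if environ.get("REQUEST_METHOD") == "POST" and override in ALLOWED_OVERRIDES:
            environ["REQUEST_METHOD"] = override
        return app(environ, start_response)

    return middleware


def echo_method(environ, start_response):
    """Tiny demo application that returns the effective request method."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["REQUEST_METHOD"].encode("ascii")]
```

Because `environ` is a plain mutable mapping, the override needs no side channel, which is the contrast with the servlet API drawn above.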
Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc
[Ronny] >> since json-rpc and xml-rpc basically do the same >> and the only difference is the content-type (json is more concise), >> i propose to create a single xml/json-rpc module. [Graham] > The problem with the JSON-RPC 1.0 specification was that it wasn't > always as clear as could have been. > > Unfortunately the JSON-RPC 1.1 draft specification didn't necessarily > make things better. > The JSON-RPC 1.1 > specification was also never really completed and left out details > such as standard error codes etc that there were proposing be > specified. All valid concerns. I think that the JSON-RPC initiative lost its way a little. They tried to model things such as encoding and decoding an object graph, using object references, etc, which IMHO is a step too far for the usages JSON-RPC would get, and is more CORBA than XML-RPC. The maintainer of the JSON-RPC.org site was looking for someone to take it over for a while; I think someone might have taken it over last year. [Graham] > Are you > prepared to go and test it with a sufficient range of clients to make > sure Python implemented server side interops properly? Interestingly, the reference implementation for JSON-RPC is a server written in python[1]. http://json-rpc.org/wiki/python-json-rpc Perhaps python's best interests in this case are better served by letting that reference implementation drive the JSON-RPC standards process[2]? If that is the case, then it is counter-productive to add a competing module to the python standard library. Regards, Alan. [1] But it's a shame they didn't write it on WSGI: then their services could have run on the Google compute cloud ;-) [2] Perhaps some pythonista from Web-SIG is most appropriate to advise how JSON-RPC should move forward? 
After all, we're more accustomed to server-side stuff than those javascript folks ;-)
Re: [Web-SIG] Time a for JSON parser in the standard library?
[Deron] > (I just joined this list, so this reponse may not be threaded properly) [Bob] > I wasn't subscribed to the list at the time this came up, but I'm all > for getting simplejson into the stdlib. Well, it appears we have a quorum of JSON<->python codec writers, since I've written a jython module that I'd like to interoperate with cpython codecs. I think it's appropriate for any discussions of JSON to take place on the web-sig. I've been thinking about how to take this forward. I see two ways. Formal approach: Introduce a "Standards Track" Library PEP, which is designed for the purpose of bringing a new module through a full peer-review process and into the python standard library. (Which means we in jython and ironpython land should also then provide it). This would have the following outcomes - Result in a single JSON implementation going into the cpython standard library, possibly in Python 3000 - Expose the new module to full community review/bug-tracking/modification - Opportunity to thrash out all of the finer points of JSON<->python transcoding, including but not limited to - NaN, Infinity, etc - What is the most appropriate number/integer/float/double/decimal representation - Structural strictness, e.g. junk after document body, dangling commas, etc. - BMP support - Byte encoding detection - Python 3000 support - Standardise the interface, de facto However, this option is somewhat complicated by the fact that we seem to have TWO quality cpython implementations competing for a place in the cpython standard library. Also, I think the PEP process might be a little cumbersome for this topic, given that the PEP process involves commit rights to the cpython source tree (since the proposal for a new module should be accompanied by the source code of the proposed implementation). Informal approach: Develop and document a standard interface, and ensure that all of our modules support it. This interface would define method, class and exception names. 
Standard methods would probably "load" and "dump" objects, possibly creating "JSONEncoder"s and "JSONDecoder"s to do the job: "JSONException" and subclasses thereof would signify errors. Perhaps a standard mechanism to retrieve the location of errors, e.g. line and column, would be appropriate? Perhaps a standard set of feature/option names could be agreed, e.g. "accept_NaN", etc. User code written to this standard could move reasonably easily between implementations, or indeed between platforms. This approach has the benefits that - Authors are free to interpret edge cases as they see fit, and provide options. - Competing implementations can continue to improve in the field - Changing implementations could be as simple as using a different egg (Although an exhaustive set of test cases covering the required behaviour is recommended) We could call it PAJ, Python Api for Json, or some such. I feel the informal option is more appropriate. It could be effectively managed on a wiki page. Or perhaps a ticketing system (e.g. TRAC) would be good for tracking detailed discussions of JSON's many edge cases, etc. I would be willing to start a wiki page with details about a putative module interface. Finally, at this stage I think speed is less of a concern; correctness is more important for now. As Aahz is fond of quoting, "It is easier to optimize correct code than to correct optimized code". Thoughts? Alan.
Re: [Web-SIG] Time a for JSON parser in the standard library?
[Massimo] > It would also be nice to have a common interface to all modules that > do serialization. For example pickle, cPickle, marshall has dumps, so > json should also have dumps. Indeed, this is my primary concern also. The reason is that I have a pure-java JSON codec for jython, that I will either publish separately or contribute to jython itself. If we're going to have the facility in both cpython and jython (and probably ironpython, etc), then it would be optimal to have a compatible API so that we have full interoperability. And given that we in jython land are always left implementing cpython APIs (which are not necessarily always the optimal design for jython) it would be nice if we could agree on APIs, etc, *before* stuff goes into the standard library. The API for my codec is slightly different from simplejson, although it could be made the same with a little work, including exception signatures, etc. But there are some things about my own design that I like. For example, simplejson allows override of the JSON output representing certain objects, by the use of subclasses of JSONEncoder. My design does it differently; it simply looks for a "__json__()" callable on every object being serialised, and if found, calls it and uses its return value to represent the object. I have no equivalent of simplejson's decoding extensions. Another difference is the set of options. Simplejson has options to control parsing and generation, and so does mine. But the sets of options are different, e.g. simplejson has no option to permit/reject dangling commas (e.g. "[1,2,3,]")*, whereas mine has no support for accepting NaN, infinity, etc, etc. On the encoding side, I simply make the assumption that all character transcoding has happened before the JSON text reaches the JSON parser. 
(I think this is a reasonable assumption, given that byte streams are always associated with file storage, network transmission, etc, and only the programmer has access to the relevant encoding information). But given that RFC 4627 specifies how to guess encoding of JSON byte streams, I'll probably change that policy. Lastly, another area of potential cooperation is testing: I have over 100 unit-tests, with fairly extensive coverage. I think that test coverage is very important in the case of JSON; you can never have too many tests. So, what is the best way to go about agreeing on the best API? 1. Discussion on web-sig? 2. Discussion on stdlib-sig? 3. Collaborative authoring/discussion on a WIKI page? 4. Regards, Alan. * Which can mean different things to different software. Some javascript interpreters interpret it as a 4 element list (inferring the last object between the comma and the closing square bracket as a null), others as a 3 element list. Python obviously interprets it as a 3-element list. So the general internet maxim "be liberal in what you accept and strict in what you produce" applies. My API gives control of this strictness/relaxedness to the user.
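The "__json__()" convention described in this message can be approximated on top of the `json` module in current Python 3 via its `default` hook, as sketched below. This is not the jython codec's actual implementation: the hook is only consulted for objects the encoder cannot serialise natively, whereas the design described above checks every object. The names `call_dunder_json` and `Point` are hypothetical.

```python
import json


def call_dunder_json(obj):
    """json.dumps `default` hook: delegate to obj.__json__() when present."""
    method = getattr(obj, "__json__", None)
    if callable(method):
        return method()
    raise TypeError("%r is not JSON serialisable" % (obj,))


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __json__(self):
        # The return value stands in for the object in the output.
        return {"x": self.x, "y": self.y}


encoded = json.dumps({"origin": Point(0, 0)}, default=call_dunder_json, sort_keys=True)
```

Compared with subclassing an encoder, the object itself decides its representation, so no codec-specific class needs to know about `Point`.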
Re: [Web-SIG] Time a for JSON parser in the standard library?
[Graham] > The problem areas were, different interpretations of what could be > supplied in an error response. Whether an integer, string or arbitrary > object could be supplied as the id attribute in a request. Finally, > some JavaScript clients would only work with a server side > implementation which provided introspection methods as they would > dynamically create a JavaScript proxy object based on a call of the > introspection methods. These are JSON-RPC concerns, and nothing to do with JSON text de/serialization. I do believe we're only discussing JSON<->python objects transformation, in this thread at least. > Unfortunately the JSON 1.1 draft specification didn't necessarily make > things better. There is no JSON 1.1 spec; but there is a JSON-RPC 1.1 spec. http://json-rpc.org/wiki/specification > Thus my question is, what version of the JSON specification are you > intending to support. The one specified in RFC 4627 http://www.ietf.org/rfc/rfc4627.txt Regards, Alan.
[Web-SIG] Time for a JSON parser in the standard library?
Dear all,

Given that

1. Python comes with "batteries included"
2. There is a standard library re-org happening because of Py3K
3. JSON is now a very commonly used format on the web

Is it time there was a JSON codec included in the python standard library? (If XML is already supported, I see no reason why JSON shouldn't be)

Or is it best to make users who want to use JSON go and research all of the different options available to them?

Choosing a Python JSON Translator
http://blog.hill-street.net/?p=7

Just a thought.

Regards,

Alan.
Re: [Web-SIG] WSGI, Python 3 and Unicode
[Alan]
>> The restriction to iso-8859-1 is really a distraction; iso-8859-1 is
>> used simply as an identity encoding that also enforces that all
>> "bytes" in the string have a value from 0x00 to 0xff, so that they are
>> suitable for byte-oriented IO. So, in output terms at least, WSGI *is*
>> a byte-oriented protocol. The problem is that python-the-language
>> didn't have support for bytes at the time WSGI was designed.

[Thomas]
> If you're talking about the "output stream", then yes, it's all about
> bytes (or should be).

Indeed, I was only talking about output, specifically the response body.

> But at the status and headers level, HTTP/1.1 is
> fundamentally ISO-8859-1-encoded.

Agreed. That is why the WSGI spec also states

"""
Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.
"""

So in order to use non-ISO-8859-1 characters in response status strings or headers, you must use RFC 2047.

As confirmed by the links you posted, this is an HTTP restriction, not a WSGI restriction.

Regards,

Alan.
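To make the RFC 2047 route concrete, here is a small sketch using the standard library's `email.header` module. The helper name `rfc2047_encode` is made up for illustration, and choosing UTF-8 as the charset for the encoded word is an assumption, not something the WSGI spec prescribes.

```python
from email.header import Header, decode_header


def rfc2047_encode(value: str) -> str:
    """Return a header value that is either plain ISO-8859-1 text or,
    failing that, an RFC 2047 encoded word (UTF-8 chosen here by assumption).

    Note: Header.encode() may fold very long values across lines; this
    sketch only considers short values.
    """
    try:
        value.encode("iso-8859-1")
        return value  # already representable, pass through untouched
    except UnicodeEncodeError:
        return Header(value, "utf-8").encode()


encoded = rfc2047_encode("IFN-\u03b3")
# decode_header() reverses the encoding, returning (bytes, charset) pairs
raw, charset = decode_header(encoded)[0]
round_tripped = raw.decode(charset)
```

Whether a given HTTP client actually decodes RFC 2047 words in arbitrary headers is a separate question that this sketch does not address.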
Re: [Web-SIG] WSGI, Python 3 and Unicode
[Phillip]
>> WSGI already copes, actually. Note that Jython and IronPython have
>> this issue today, and see:
>>
>> http://www.python.org/dev/peps/pep-0333/#unicode-issues

[James]
> It would seem very odd, however, for WSGI/python3 to use strings-
> restricted-to-0xFF for network I/O while everywhere else in python3 is
> going to use bytes for the same purpose.

I think it's worth pointing out that the reason for the current restriction to iso-8859-1 is *because* python did not have a bytes type at the time the WSGI spec was drawn up. IIRC, the bytes type had not yet even been proposed for Py3K. Cpython effectively held all byte sequences as strings, a paradigm which is (still) followed by jython (not sure about ironpython).

The restriction to iso-8859-1 is really a distraction; iso-8859-1 is used simply as an identity encoding that also enforces that all "bytes" in the string have a value from 0x00 to 0xff, so that they are suitable for byte-oriented IO. So, in output terms at least, WSGI *is* a byte-oriented protocol. The problem is that python-the-language didn't have support for bytes at the time WSGI was designed.

[James]
> You'd have to modify your app
> to call write(unicodetext.encode('utf-8').decode('latin-1')) or so

Did you mean: write(unicodetext.encode('utf-8').encode('latin-1'))?

Either way, the second encode is not required; write(unicodetext.encode('utf-8')) is sufficient, since it will generate a byte-sequence(string) which will (actually "should": see (*) note below) pass the following test.

    try:
        wsgi_response_data.encode('iso-8859-1')
    except UnicodeError:
        # Illegal WSGI response data!
        raise

On a side note, it's worth noting that Philip Jenvey's excellent rework of the jython IO subsystem to use java.nio is fundamentally byte oriented.

http://www.nabble.com/fileno-support-is-not-in-jython.-Reason--t4750734.html
http://fisheye3.cenqua.com/browse/jython/trunk/jython/src/org/python/core/io

Because it is based on the new IO design for Python 3K, as described in PEP 3116

http://www.python.org/dev/peps/pep-3116/

Regards,

Alan.

[*] Although I notice that cpython 2.5, for a reason I don't fully understand, fails this particular encoding sequence. (Maybe it's to do with the possibility that the result of an encode operation is no longer an encodable string?)

    Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> response = u"interferon-gamma (IFN-\u03b3) responses in cattle"
    >>> response.encode('utf-8').encode('latin-1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 22: ordinal not in range(128)
    >>>

Meaning that to enforce the WSGI iso-8859-1 convention on cpython 2.5, you would have to carry out this rigmarole

    >>> response.encode('utf-8').decode('latin-1').encode('latin-1')
    'interferon-gamma (IFN-\xce\xb3) responses in cattle'
    >>>

Perhaps this behaviour is an artifact of the cpython implementation?

Whereas jython passes it just fine (and correctly, IMHO)

    Jython 2.2.1 on java1.4.2_15
    Type "copyright", "credits" or "license" for more information.
    >>> response = u"interferon-gamma (IFN-\u03b3) responses in cattle"
    >>> response.encode('utf-8')
    'interferon-gamma (IFN-\xCE\xB3) responses in cattle'
    >>> response.encode('utf-8').encode('latin-1')
    'interferon-gamma (IFN-\xCE\xB3) responses in cattle'
    >>>
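For readers on Python 3, where `str` and `bytes` are distinct types, the same "identity encoding" idea can be sketched as follows. This is only an illustration of the round trip discussed in this thread; the helper names are hypothetical and the code is not taken from any WSGI server.

```python
def is_byte_safe(text: str) -> bool:
    """True if every character has a code point in 0x00..0xff, i.e. the
    string could be written to a byte-oriented stream via iso-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeError:
        return False


def utf8_as_latin1_str(text: str) -> str:
    """Encode to UTF-8, then present those bytes as a str whose
    characters are all in the 0x00..0xff range ("bytes in a string")."""
    return text.encode("utf-8").decode("latin-1")


response = "interferon-gamma (IFN-\u03b3) responses in cattle"
wire_form = utf8_as_latin1_str(response)
# Encoding the "bytes in a string" form with latin-1 recovers the UTF-8 bytes
wire_bytes = wire_form.encode("latin-1")
```

The original string fails the check because of the Greek gamma, while its UTF-8-then-latin-1 form passes and converts back to the same UTF-8 bytes.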
[Web-SIG] Modjy and jython 2.2.
Dear all,

Now that jython 2.2 has been released (hooray!)

http://www.jython.org/Project/download.html

it's time for a quick update on the status of modjy, the jython WSGI/J2EE gateway.

http://www.xhaus.com/modjy/

Previous versions of modjy were based on jython 2.1, which didn't have support for the iterator protocol. However, the new jython 2.2 has full iterator and generator support, and so is capable of full WSGI support (round of applause for the hard work of the jython-dev team).

In a testament to the stability of jython and the clean design of WSGI, the modjy code has not changed; the original jython 2.1 version of modjy works seamlessly with jython 2.2, unmodified.

Still, I am making an interim release, for two purposes

1. To fix a longstanding bug in the implementation
2. To explicitly mention jython 2.2 in the documentation

I'm off on vacation soon, and wanted to make this small "publicity release" before I go. When I return, I will be making the following modifications

1. Adding a full test suite, based on MockRunner, the mock Java Servlet framework.
2. Improving J2EE resource handling
3. Improving import handling
4. Various small improvements and documentation updates.

All the best,

Alan.
Re: [Web-SIG] Web Site Process Bus
[Graham Dumpleton]
> First comment is about WSGI applications somehow themselves using
> SIGTERM etc as triggers for things they want to do. For Apache at
> least, allowing any part of a hosted Python application to register
> its own signal handlers is a big no no. This is because Apache itself
> uses a whole range of signals to manage such tasks as shutting down
> sub processes or signaling worker and/or listener threads within a
> process that it's time to wake up or shut down. If a WSGI application
> starts registering signal handlers it can as a result stop Apache from
> even being able to process requests. In mod_wsgi I have had to
> specifically take steps to prevent applications breaking things in
> this way by replacing signal.signal() on creation of an interpreter.
> Instead I log a warning that the signal registration has been ignored
> and otherwise do nothing. This was simply the safest thing to do.
>
> Thus I believe a clear statement should be made that UNIX signals are
> off limits to WSGI applications or components.

From a jython POV, I agree with this statement; signals don't even exist on java/jython (although some JVMs have non-standard extensions for signals).

Thus, any "standard" involving signals would not be implementable on jython, and I guess ironpython too.

[Graham Dumpleton]
> Anyway, just wanted to make it absolutely clear that I don't believe a
> hosted WSGI application and associated framework has any business
> taking direct interest in low level UNIX signals.

Agreed.

Regards,

Alan.
Re: [Web-SIG] Direct use of sys.stdout, sys.stderr and sys.stdin in WSGI application.
[Alan Kennedy]
>>Strictly speaking, WSGI requires python 2.2,
>>because of iterators.

[Phillip J. Eby]
> Actually, it doesn't. The pre-2.2 iterator protocol is to be used in such
> cases:
>
> http://www.python.org/dev/peps/pep-0333/#supporting-older-2-2-versions-of-python

Dang! I knew I couldn't say anything on web-sig without being contradicted ;-)

I am familiar with that section. I'm sure you remember writing this in the credits section: "Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython (well before the spec was finalized) helped to shape the "supporting older versions of Python" section".

But if the users want their "modern" python applications to be portable everywhere on WSGI, e.g. returning (iterable) files as output, or generators, then they should really stick with 2.2+.

But you are, of course, right about the pre-2.2 iterator protocol. I wrote modjy for jython 2.1 according to the PEP guidelines, and have had user reports that it works without modification on jython 2.2+.

Regards,

Alan.
Re: [Web-SIG] Direct use of sys.stdout, sys.stderr and sys.stdin in WSGI application.
Graham,

I thought I'd reply, so that we'd get replies from everyone else to tell me I'm wrong.

All your points are good common-sense stuff. I think that all of your policies on stdin, stdout, and stderr are good, and are appropriate for a WSGI environment running inside an Apache server.

Some small points.

> . one could actually write to sys.stdout directly as
> well since that is where the WSGI adapter writes it to anyway.

I think it's a good idea to redirect stdout, and to document in your server/gateway documentation that you are doing so. I also think this is a server specific issue.

> Anyway, it may seem good practice for a WSGI adapter to still prevent
> use of sys.stdin unless configured explicitly to allow it and even
> then it might only allow it if the server is running in a mode whereby
> it would work.

This should be a server-specific feature, that is documented.

> Finally, sys.stderr also presents problems of its own. Although
> wsgi.errors is provided with the request environment, this can't be
> used at global scope within a module when importing and also shouldn't
> be used beyond the life time of the specific request. Thus, there
> isn't a way to log stuff outside of a request and ensure it gets to
> the server log. One could try and mandate use of 'logging' module, but
> this isn't available in old versions of Python.

I don't think you need to worry about versions of python that don't have the logging module. Strictly speaking, WSGI requires python 2.2, because of iterators. So I think it's extremely unlikely that people will be running WSGI apps on pre-2.2 VMs.

> What you need is for sys.stderr to be underlayed with thread
> specific log objects each with its own buffering mechanism that
> ensures that only complete lines of text get sent to the actual log
> file.

This is a server/gateway implementation detail.

> Yes one could simply ignore the whole issue, but I feel that a good
> quality WSGI adapter/server should address these issues and either
> lock things down as appropriate to protect users from themselves or
> ensure that using them results in a sensible outcome.

Given how much talk there is of the WSGI "environment", I think it's good to raise these issues.

Regards,

Alan.
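The "thread specific log objects" idea quoted above can be sketched in a few lines. The class below keeps a per-thread buffer and only forwards complete lines to the shared sink, so output from concurrent threads is never interleaved mid-line. The class name is hypothetical and this is only one possible shape for such an object, not how any particular server does it.

```python
import io
import threading


class LineBufferedThreadLog:
    """File-like stand-in for sys.stderr (hypothetical name).

    Each thread accumulates its own partial line; only complete lines
    are written to the shared sink, under a lock.
    """

    def __init__(self, sink):
        self._sink = sink
        self._local = threading.local()
        self._lock = threading.Lock()

    def write(self, text):
        pending = getattr(self._local, "pending", "") + text
        # Everything before the last newline is complete; the rest waits
        *complete, rest = pending.split("\n")
        self._local.pending = rest
        if complete:
            with self._lock:
                for line in complete:
                    self._sink.write(line + "\n")
        return len(text)

    def flush(self):
        # Emit the calling thread's unfinished line, terminated with a newline
        pending = getattr(self._local, "pending", "")
        self._local.pending = ""
        with self._lock:
            if pending:
                self._sink.write(pending + "\n")
            self._sink.flush()
```

A server could install such an object with `sys.stderr = LineBufferedThreadLog(real_log_file)`, where `real_log_file` stands for whatever log stream the server already owns.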
Re: [Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.
[Graham Dumpleton]
> Should a WSGI adapter for a web server which allows a mount point to
> have a trailing slash specifically flag as a configuration error an
> attempt to use such a mount point given that it appears to be
> incompatible with WSGI?

OK, I'll have a go.

I think the question boils down to the following:

Assume an application mount point of "/application". If a request is received for

    /application

Then it will (and should) be redirected to the URL

    /application/

Is that new URL to be interpreted as

    SCRIPT_NAME: /application
    PATH_INFO: /

or interpreted as

    SCRIPT_NAME: /application/
    PATH_INFO:

I think that the WSGI interpretation is the first interpretation, and the correct one, because it gives correct results when deriving relative URLs for resources contained within the application.

Is that addressing the question?

[Graham Dumpleton]
> It therefore seems that the idea of the mount point for an
> application having a trailing slash may be incompatible
> with WSGI. Can this be considered to be the case or is there
> some other way one is meant to deal with this?

I don't know about "incompatible", although it obviously creates the double-slash problem with computed URLs.

Perhaps the Apache "policy" on this issue is influenced by its origins as a http server for serving hierarchies of directories and files from a filesystem?

When it comes to CGI though, Apache does the right thing and passes

    SCRIPT_NAME: /application
    PATH_INFO: /

to CGI scripts. I don't know if this provides any insight into whether or not mounting applications with a trailing slash is an error.

Does that help at all?

Alan.
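A small sketch may make the double-slash problem concrete. It assumes an application that builds URLs for its own resources by appending an absolute-within-the-app path to SCRIPT_NAME; the helper name `resource_url` and the example path are made up for illustration.

```python
from urllib.parse import quote


def resource_url(environ, resource_path):
    """Build a URL for a resource inside the application.

    `resource_path` is assumed to start with '/', mirroring PATH_INFO.
    """
    return quote(environ.get("SCRIPT_NAME", "")) + quote(resource_path)


# First interpretation: mount point without a trailing slash
first = {"SCRIPT_NAME": "/application", "PATH_INFO": "/"}
# Second interpretation: mount point with a trailing slash
second = {"SCRIPT_NAME": "/application/", "PATH_INFO": ""}

good = resource_url(first, "/static/logo.png")
# /application/static/logo.png
doubled = resource_url(second, "/static/logo.png")
# /application//static/logo.png
```

With the first interpretation the computed URL is clean; with the second, every computed URL picks up a doubled slash unless the application strips it.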
Re: [Web-SIG] WSGI input filter that changes content length.
[Graham]
> Hmmm, maybe I should have phrased my question a bit differently as to be
> honest I am not actually interested in doing on the fly decompression and
> only used it as an example. I really only want to know about how the
> content length is supposed to be dealt with. I didn't want to explain the
> actual context for the question as didn't want to let on yet to what I am up
> to, so used an example which I thought would illustrate the problem.

Point taken. But I think gzip encoding is a good example to illustrate the issues.

[Graham]
> If I leave the
> content length header as is and any application does a
> read(content_length) and decompression or some other input filter
> actually results in more data than that being available, the application
> will not get it all as it has only asked to read the original length
> before decompression.

So obviously the Content-Length header cannot be left unmodified if some transformation is in place that is altering the length of the content.

There are two choices for how the wrapping should happen.

1. The ungzipping filter reads the entirety of the (possibly huge) input, decompresses it, and makes it available in wsgi.input. The Content-Length header is rewritten to reflect the length of the decompressed content. The client has a valid Content-Length value, but the server has had to buffer a potentially large input stream in order to be able to provide that.

2. The ungzipping filter wraps the compressed stream, and decompresses on demand and on-the-fly. In this case, it *must* delete the old Content-Length header, which is now invalid. It cannot provide a new value for Content-Length, since the final uncompressed length of the input stream cannot be known.

[Graham]
> The PEP says that an application though should not attempt to read more
> data than has been specified by the content length. If it is common
> practice that applications take this literally and always get data from
> the input by using read(content_length) then there is a requirement that
> the content length header must exist. Thus, if the input filter does zap
> the content length header and remove it then an application which does
> that will not work.

Then I suppose that that application is not a fully-compliant WSGI application. Scenario 2 outlined above is a perfectly valid scenario that can happen, so an application that cannot deal with that scenario is not robust.

> Thus the question probably is, what is accepted practice or what does
> the PEP dictate as to how applications should use read()?

AFAICT, the PEP is not prescriptive about the use of the wsgi.input.read() method. However, given that you have found it necessary to raise the question, perhaps it should be added to the WSGI PEP that absence of a Content-Length header does NOT imply absence of content.

[Graham]
> So, is it okay to remove the content length header when there is actually
> data and I know it wouldn't actually be correct,

I would say it's compulsory to remove the header: it contains an incorrect value, and if the application uses that value, it will get unexpected data or an exception, and rightly so.

[Graham]
> or does that result in a
> situation that is seen as violating the PEP or even if acceptable would break
> existing WSGI applications.

I would say that leaving an incorrect value in place should be a violation of the PEP.

> Or in short, is it mandatory that content length header must exist if there is
> non zero length data in input? I know the PEP says that the content length
> may be empty or absent, but am concerned that applications would assume
> it has value of 0 if empty or absent.

No, the Content-Length header is optional, and any applications that operate otherwise are non-compliant.

[Alan]
>> 6. Exactly the same principles should apply to decoding incoming
>> Transfer-Encoding: chunked.

[Graham]
> My understanding is that content encoding is different to transfer encoding,
> ie., is not hop by hop in this sense and that the same statements don't apply.

Hop-by-hop header means that the attribute described in the header is not an inherent attribute of the content being transferred, but is solely used in one stage of a multi-hop communication.

If my browser is using a proxy, which relays requests on to a server, the proxy may decide to use Transfer-Encoding to communicate with the server. Thus the Transfer-Encoding only applies to the proxy->server "hop". If the server receives such a Transfer-Encoding, it *must* decode the content according to that Transfer-Encoding before making it available to the application.

[Graham]
> Wait till you see what I am about to come out with if I can sort this issue
> out. :-)

Now I'm intrigued :-)

Regards,

Alan.
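As a sketch of what an application that copes with scenario 2 might do, the helper below uses CONTENT_LENGTH when it is present and otherwise reads until the stream reports end-of-file. The function name is hypothetical, and the fallback assumes that the server or filter actually signals EOF on wsgi.input, which PEP 333 allows but does not require; on a server that does not, the loop could block.

```python
import io


def read_request_body(environ, chunk_size=8192):
    """Read the whole request body from wsgi.input (illustrative helper)."""
    stream = environ["wsgi.input"]
    length = environ.get("CONTENT_LENGTH")
    if length:
        # A usable Content-Length: read exactly that many bytes
        return stream.read(int(length))
    # Absent or empty Content-Length does not mean "no content":
    # read in chunks until the stream simulates end-of-file.
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

Reading in bounded chunks keeps the application in control of how much it buffers, which matters when the decompressed length is not known in advance.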
Re: [Web-SIG] WSGI input filter that changes content length.
[Graham Dumpleton]
> How does one implement in WSGI an input filter that manipulates the request
> body in such a way that the effective content length would be changed?

> The problem I am trying to address here is how one might implement using WSGI
> a decompression filter for the body of a request. Ie., where "Content-Encoding:
> gzip" has been specified.

> So, how is one meant to deal with this in WSGI?

The usual approach to modifying something in the WSGI environment, in this case the wsgi.input file-like object, is to wrap it or replace it with an object that behaves as desired.

In this case, the approach I would take would be to wrap the wsgi.input object with a gzip.GzipFile object, which should only read the input stream data on demand. The code would look like this

    import gzip

    # The stream must be passed as fileobj; the first positional
    # argument of GzipFile is a filename.
    wsgi_env['wsgi.input'] = gzip.GzipFile(fileobj=wsgi_env['wsgi.input'], mode='rb')

Notes.

1. The application should be completely unaware that it is dealing with a compressed stream: it simply reads from wsgi.input, unaware that reading from what it thinks is the input stream is actually causing cascading reads down a series of file-like objects.

2. The GzipFile object will decompress on the fly, meaning that it will only read from the wrapped input stream when it needs input. Which means that if the application does not read data from wsgi.input, then no data will be read from the client connection.

3. The GzipFile should not be responsible for enforcement of the incoming Content-Length boundary. Instead, this should be enforced by the original server-provided file-like input stream that it wraps. So if the application attempts to read past Content-Length bytes, the server-provided input stream "is allowed to simulate an end-of-file condition". Which would cause the GzipFile to return an EOF to the application, or possibly an exception.

4. Because of the on-the-fly nature of the GzipFile decompression, it would not be possible to provide a meaningful Content-Length value to the application. To do so would require buffering and decompressing the entire input data stream. But the application should still be able to operate without knowing Content-Length.

5. The wrapping can NOT be done in middleware. PEP 333, Section "Other HTTP Features" has this to say: "WSGI applications must not generate any "hop-by-hop" headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming "hop-by-hop" headers in the environ dictionary. WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable." So the wrapping and replacement of wsgi.input should happen in the server or gateway, NOT in middleware.

6. Exactly the same principles should apply to decoding incoming Transfer-Encoding: chunked.

HTH,

Alan.

P.S. Thanks for all your great work on mod_python Graham!
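Putting the notes above together, a helper that a server or gateway might call before handing the environ to the application could look like the sketch below. The function name is hypothetical. Dropping CONTENT_LENGTH follows note 4; removing the Content-Encoding entry is an extra assumption of this sketch, made so that the application does not try to decompress a second time.

```python
import gzip
import io


def wrap_gzip_input(environ):
    """Replace wsgi.input with an on-the-fly decompressing wrapper
    when the request body is gzip encoded (illustrative sketch)."""
    encoding = environ.get("HTTP_CONTENT_ENCODING", "").strip().lower()
    if encoding == "gzip":
        # fileobj= is required: GzipFile's first positional argument
        # is a filename, not a stream.
        environ["wsgi.input"] = gzip.GzipFile(
            fileobj=environ["wsgi.input"], mode="rb")
        # The old length described the compressed body, so it is now wrong.
        environ.pop("CONTENT_LENGTH", None)
        # Assumption: hide the encoding so the application does not
        # attempt its own decompression.
        del environ["HTTP_CONTENT_ENCODING"]
    return environ


# Usage: simulate a compressed request body
payload = b"hello wsgi " * 1000
compressed = gzip.compress(payload)
environ = wrap_gzip_input({
    "wsgi.input": io.BytesIO(compressed),
    "CONTENT_LENGTH": str(len(compressed)),
    "HTTP_CONTENT_ENCODING": "gzip",
})
body = environ["wsgi.input"].read()
```

Nothing is read from the underlying stream until the application calls `read()`, which matches the on-demand behaviour described in note 2.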
Re: [Web-SIG] [Fwd: Summer of Code preparation]
[Peter Hunt]
>> I think an interesting project would be complete integration of the
>> client and server via AJAX. That is, whenever a DHTML event handler
>> needs to be called on the client-side, the document state is serialized
>> and it is sent along with the DHTML event information to the server,
>> informing it that an event occurred.

[Matt Goodall]
> Invoking something server-side every time there's some (interesting)
> event in the browser will almost certainly perform badly due to network
> latency and possibly put unnecessary load on the server.

I was going to refrain from this conversation, but now find the following point relevant:

How long before we end up reinventing X-windows-style transmission of UI events across the network, i.e. by sending all browser events over HTTP to the server?

It's worth noting that, in the early days of X-windows, people said it was far too heavyweight, and would saturate networks and quickly become unusable.

But those people reckoned without advances in network technology, and the X-windows people claimed that they were specifically designing for network technologies from several years in the future, by which time their software technology would be mature and ready to take advantage of the newer and higher bandwidths.

And they were pretty much right: having used X-windows over corporate WANs since the early 1990s, I think it works pretty well.

But the X-windows people weren't designing for Internet scale: how many connections should a server be able to handle?

> Serializing and sending document state will only make it slower.

Agreed: serialising and transmitting whole documents is taking it too far ;-)

Regards,

Alan.
Re: [Web-SIG] [Fwd: Summer of Code preparation]
[Titus Brown]
> I'm thinking of proposing a project to build a JavaScript interpreter
> interface for Python; the goal (for me) is to get twill/mechanize to
> understand JavaScript. I think the project has wider applications,
> but I'm not sure what people actually want to do with JavaScript.
> I could imagine server-side parsing of javascript, and/or integration of
> javascript and python code. Thoughts?

Have you looked at WebCleaner? WebCleaner is a filtering HTTP proxy, written in python.

http://webcleaner.sourceforge.net/

WebCleaner uses the Mozilla SpiderMonkey javascript engine to execute JS from web pages:

From the webcleaner front page

"""
Another feature is the JavaScript filtering: JavaScript data is executed in the integrated Spidermonkey JavaScript engine which is also used by the Mozilla browser suite. This eliminates all JavaScript obfuscation, popups, and document.write() stuff, but the other JavaScript functions still work as usual.
"""

Perhaps webcleaner has code that already does what you need? Although the GPL licensing might be problematic.

Regards,

Alan.
Re: [Web-SIG] Standalone WSGI form framework.
[Alan Kennedy]
> But I'm tired of hacking on it to make it do what I want: I'd much
> prefer to start afresh with my own design than to continue to use
> Quixote: it's just too limiting.

[Titus Brown]
> I think you mistook my question for a criticism ;). Rewrite or no, I'm
> mostly interested in what you meant by "WSGI oriented" and what that
> would mean specifically in the context of the Quixote forms lib.

No criticism detected ;-)

By WSGI oriented, I mean that I don't have to mock request objects: I can just use a dictionary to mock a WSGI request: I've found that testing approach exceedingly straightforward to work with.

Also, I've had problems in the past with Quixote not handling response encodings correctly. And its html escaping mechanism is excessively PTL oriented: I ended up making too many changes to Quixote, which made me question why I was using it in the first place.

Regards,

Alan.
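To illustrate the "just use a dictionary" testing style, here is a minimal sketch built on the standard library's `wsgiref.util.setup_testing_defaults`. The application `hello_app` and the helper `call_app` are hypothetical names invented for this example; they are not part of Quixote or of the form library being discussed.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A trivial WSGI application used only as a test subject."""
    body = ("Hello from %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, **overrides):
    """Call a WSGI app with a plain dict as the request; no mock objects."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ.update(overrides)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(hello_app, PATH_INFO="/form")
```

The whole request is an ordinary dictionary, so a unit test can set or override any key it likes before calling the application.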
Re: [Web-SIG] Standalone WSGI form framework.
[Alan Kennedy]
> I'm looking for a framework-independent form library. I'm using the
> Quixote forms library at the moment, inside my own framework, but
> would ideally like something more WSGI oriented, so that it is easier
> to mock and unittest.

[Titus Brown]
> I'm confused by this -- this could mean that you want to separate the
> quixote forms lib from the Quixote 'request' object, I guess. What
> else?

Hi Titus,

I realise that I can rewrite the Quixote form lib to achieve what I want, but at the cost of a fairly significant effort. As it is, I've rewritten the rendering, to work with Kid and ElementTree.

But I'm tired of hacking on it to make it do what I want: I'd much prefer to start afresh with my own design than to continue to use Quixote: it's just too limiting.

Regards,

Alan.
Re: [Web-SIG] Standalone WSGI form framework.
[Alan Kennedy]
> I'm looking for a framework-independent form library. I'm using the
> Quixote forms library at the moment, inside my own framework, but
> would ideally like something more WSGI oriented, so that it is easier
> to mock and unittest.

[Daniel Miller]
> Have you looked at Ian Bicking's FormEncode? I'm not sure if it
> meets all your requirements, but it seems like a good base to start
> with (most of the hard stuff has already been done).

Thanks Daniel. Indeed, it not only appears that FormEncode is the closest thing to what I need, it also seems to be the only show in town, i.e. the only framework-independent form library.

[Alan Kennedy]
> If anyone is familiar with the Java Spring Framework, it's got pretty
> much everything I need, but is overly complex, and is written in Java

[Daniel Miller]
> I wrote an app using Spring and I have to say it's the best web
> framework I've ever used in terms of completeness and flexibility,
> but it's written in Java...

Agreed. I find its interface based design very simple and powerful. But, IMHO, the actual implementations of the classes that implement the interfaces are excessively complex and rigidly structured.

[Daniel Miller]
> I actually wrote a few simple classes on top of CherryPy that exposes
> the Spring webmvc Controller interface as well as the
> SimpleFormController class (those are the two main building blocks
> I found most useful in Spring's WebMVC). My SimpleFormController
> implementation uses FormEncode for validation. I'd be willing to
> share the code if you're interested.

I'd be very interested to see that, and potentially use it, if you're willing ...

[Daniel Miller]
> I think "the one true web framework" could be made for Python if
> someone took the best ideas from Spring WebMVC and made a few
> component-ized building blocks on top of which complex and widely
> varied applications could be built.

Completely agreed. The term "meta-framework" is most appropriate, I think. If we could agree on a set of interfaces, then everyone would be free to contribute implementations of their own components.

For example, I like the idea of the Routes URL-mapping library: it's precisely the kind of task that is simple enough in concept, but yet complex enough to require a dedicated (and thoroughly tested) library.

Most of the popular web frameworks make the fundamental mistake of picking a single URL->object mapping mechanism, and making you shoehorn all your requirements into it. IIRC, Django, Turbogears, Pylons, all make this mistake.

However, if URL->object mapping were controlled by an interface, then we'd be free to choose from multiple implementations, e.g. routes-style, quixote-style, zope-style, etc, etc.

> However, to make this possible we'd most likely need a standard
> request object (or at least an interface definition).

ISTM that WSGI eliminates the need for that. Is there any specific thing you have in mind that WSGI doesn't cover?

Regards,

Alan.
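One way to picture URL->object mapping "controlled by an interface" is the sketch below: an abstract mapper that any routing style could implement, one deliberately simple prefix-based implementation, and a WSGI dispatcher that depends only on the interface. All names here (`URLMapper`, `PrefixMapper`, `dispatcher`) are invented for illustration and do not correspond to Routes, Quixote, Zope, or any other library mentioned in this thread.

```python
from abc import ABC, abstractmethod


class URLMapper(ABC):
    """Hypothetical interface: map a WSGI environ to a WSGI application."""

    @abstractmethod
    def resolve(self, environ):
        """Return a WSGI callable for this request, or None if unmapped."""


class PrefixMapper(URLMapper):
    """One possible implementation: longest matching path prefix wins."""

    def __init__(self, routes):
        self._routes = sorted(
            routes.items(), key=lambda item: len(item[0]), reverse=True)

    def resolve(self, environ):
        path = environ.get("PATH_INFO", "")
        for prefix, app in self._routes:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return app
        return None


def dispatcher(mapper):
    """WSGI app that delegates to whatever the mapper resolves."""
    def app(environ, start_response):
        target = mapper.resolve(environ)
        if target is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        return target(environ, start_response)
    return app
```

Swapping in a different routing style would then mean writing another `URLMapper` subclass, with the dispatcher and the applications left unchanged.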
[Web-SIG] Standalone WSGI form framework.
Greetings All.

I'm looking for a framework-independent form library. I'm using the Quixote forms library at the moment, inside my own framework, but would ideally like something more WSGI oriented, so that it is easier to mock and unittest.

My ideal form framework should do the following

1. Parsing of submitted POST requests, etc
2. Binding of incoming form variables to the attributes of a target python data object
3. Customisable validation, with management of validation error messages.
4. Generate unique (hierarchical) field names for sub-attributes of the data object to be edited, which are javascript-identifier-safe, i.e. can be used as the names of HTML form input elements.
5. Handle multipart/form-data
6. Nice-to-have: transparently handle multi-page forms, e.g. hub forms, etc.

It should NOT

1. Attempt to generate HTML, or be tied to a specific templating mechanism

If anyone is familiar with the Java Spring Framework, it's got pretty much everything I need, but is overly complex, and is written in Java :-(

TIA,

Alan.
Re: [Web-SIG] WSGI in standard library
[Alan Kennedy] >>>Maybe we need a PEP [Bill Janssen] >>Great idea! That's exactly what I thought when I organized this SIG a >>couple of years ago. [Guido van Rossum] > At first I was going to respond "+1". But the fact that a couple of > years haven't led to much suggests that it's unlikely to be fruitful; > there just are too many diverging ideas on what is right. (Which makes > sense since it's a huge and fast developing field.) Having considered the area for a couple of days, I think you're right: the generic concept "web", as in web-sig, covers far too much ground, and there are too many schools of thought. > So unless someone (Alan Kennedy?) actually puts forward a PEP and gets > it through a review of the major players on web-sig, I'm skeptical. But there is a subset which I think is achievable, namely http support, which IMO is the subset that most needs a rework. And now that we have a nice web standard, WSGI, it would be nice to make use of it to refactor the current http support. The following are important omissions in the current stdlib. - Asynchronous http client/server support (use asyncore? twisted?) - SSL support in threaded http servers - Asynchronous SSL support - Simple client file upload support - HTTP header parsing support, e.g. language codes, quality lists, etc - Simple object publishing framework? Addressing all of the above would be significant piece of work. And IMHO, it is only achievable by staying focussed on http and NOT addressing requirements such as - Content processing, e.g. html tidy, html parsing, css parsing - Foreign script language parsing or execution - Page templating API I think it would be a good idea to address these concerns in separate PEPs. [Guido van Rossum] > I certainly don't want this potential effort to keep us from adding > the low-hanging fruit (wsgiref, with perhaps some tweaks as PJE can > manage based on recent feedback here) to the 2.5 stdlib. Completely agreed. 
Any web-related PEPs are going to take a long time, and are unlikely to be ready in time for 2.5. Regards, Alan.
Re: [Web-SIG] WSGI in standard library
[Guido Van Rossum] > Actually BaseHTTPServer.py and friends use a deprecated naming scheme > -- just as StringIO, UserDict and many other fine standard library > modules. > If you read PEP 8, the current best practice is for module names to be > all-lowercase and *different* from the class name. [Clark C Evans] > I propose we add wsgiref, but look at other implementations and > steal whatever you can from them. This is not a huge chunk of > code -- no reason why you can't have the best combination of > features and correctness. [Jean Paul Calderone] > HTTPS is orthogonal. Besides, how would you support it in the stdlib? It's > currently not > possible to write an SSL server in Python without a > third-party library. Maybe someone > would be interested in rectifying /that/? :) [Ian Bicking] > I've used this several times (well, not wsgiref's implementation, but > paste.response.HeaderDict). rfc822 is heavier than this dictionary-like > object, and apparently is also deprecated. [Alan Kennedy] > While we're on the subject, can we find a better home for the HTTP > status codes->messages mapping? Folks, Thinking about this some more, it's beginning to sound to me like the server-side web support in the standard library needs a proper review and possible rework: it's slowly decohering/kipplizing. Maybe we need a PEP, so that we can all discuss the subject (rationally ;-) and sort out all of the issues before we go ahead and commit anything? Just a thought. Feel free to disregard. Alan.
Re: [Web-SIG] WSGI in standard library
[Ian Bicking] > Anyway, I'm +1 on the object [wsgiref's wsgi header manipulation class] > going somewhere. I don't know if the > parent package has to be named "wsgi" -- and "wsgiref" seems even > stranger to me, as anything in the standard library isn't a "reference > implementation" anymore, but an actual implementation. I personally > like a package name like "web". Everyone will know what that means > (though it would start with most of the web related modules not in it, > which is a problem). While we're on the subject, can we find a better home for the HTTP status codes->messages mapping? Integer status codes. http://mail.python.org/pipermail/web-sig/2004-September/000764.html Adding status code constants to httplib http://mail.python.org/pipermail/web-sig/2004-September/000842.html Regards, Alan.
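On the status-code question, here is a minimal sketch of what such a mapping looks like from the application side, assuming a Python 3.5 or later interpreter where `http.HTTPStatus` exists. The helper name `status_line` is invented for illustration and is not something proposed in this thread.

```python
from http import HTTPStatus


def status_line(code):
    """Build a WSGI status string such as '404 Not Found' from an integer.

    HTTPStatus(code) raises ValueError for codes it does not know about.
    """
    status = HTTPStatus(code)
    return "%d %s" % (status.value, status.phrase)
```

A WSGI application could then call `start_response(status_line(404), headers)` rather than hard-coding the reason phrase.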
Re: [Web-SIG] WSGI in standard library
[Robert Brewer] > Look at the right code and see if your gut feeling changes. ;) I looked at http://svn.cherrypy.org/trunk/cherrypy/_cphttpserver.py As indicated by Ian in this message http://mail.python.org/pipermail/web-sig/2006-February/002074.html Sorry if that was the wrong one to look at, I'm not at all familiar with CherryPy. Alan.
Re: [Web-SIG] WSGI in standard library
[Alan Kennedy] >>Priority #1: Make the requisite server a single standalone module. [Guido van Rossum] > Huh? What makes you think this? My bad :-( Two things made me think like that 1. BaseHttpServer -> BaseHttpServer.py SimpleHttpServer -> SimpleHttpServer.py WSGIHttpServer -> WSGIHttpServer.py 2. The comment was more aimed at the CherryPy entry, which imports a fair amount of CherryPy support code. i'll-get-me-coat-ly'yrs, Alan.
Re: [Web-SIG] WSGI in standard library
[Guido van Rossum] Let's make it so. I propose to add wsgiref to the standard library and nothing more. [Blake Winton] >>>Will you be maintaining this? ;) [Guido van Rossum] >>I'd expect we could twist Phillip's arm to maintain it; he's not >>expecting much maintenance. [Phillip J. Eby] > Yes, and yes. Whew! :-) Phillip: Hope you don't mind me taking the liberty of rearranging your code? And before we go finalising anything, please let's give the other contenders a chance to come up with something competitive. Alan.
Re: [Web-SIG] WSGI in standard library
[Alan Kennedy] 3. If I had to pick one of the 3 you suggested, I'd pick the last one, i.e. PJE's, because it fulfills exactly the criteria I listed [Robert Brewer] I have to disagree (having examined/unraveled it quite a bit recently, to remove modpython_gateway's dependency on it). [Ian Bicking] I think it also tries to enforce a lot of the details of WSGI, and thus guide a WSGI implementor into creating a compliant server. Well, I'm sure we all want our favourite server in the stdlib ;-) But a few things have to happen first. Priority #1: Make the requisite server a single standalone module. Anticipating PJE's willingness to have WSGIRef included in the stdlib, I've taken the liberty of putting it all into one big file. And I think it looks pretty damn good: fully WSGI compliant, with code to represent every single aspect of the spec. Take a look for yourself: the file is attached. If the attachment doesn't make it to the list, I'll upload it somewhere. But that doesn't mean the decision's over. It means that the bar has been raised. Anyone else who wants their module to be a contender has to get it all into the one file, i.e. eliminating all framework dependencies, etc. Here's a few comments I put together about the three contenders that have been proposed so far. They're just my own comments from reading the code: feel free to treat them as the ravings of a madman if you so wish. 1. 
CherryPy server

- 407 lines (non-code lines: ~80)
- Depends on cherrypy, cherrypy._cputil, cherrypy.lib.httptools
- Depends on cherrypy.config
- Implements HTTP header length limit checking
- Implements HTTP body length limit checking
- Uses own logging handler
- Subclasses SocketServer.BaseServer, not BaseHTTPServer.HTTPServer
- Therefore does low-level socket mucking-about
- Provides 2 server implementations
  - CherryHTTPServer
  - PooledThreadServer
- Explicitly checks for KeyboardInterrupt exceptions
- PooledThreadServer has clean shutdown through Queue.Queue messaging
- Does not detect hop-by-hop headers
- No demo application

My gut feeling: too complex, works too hard to be "production-ready", at the expense of readability.

2. Paste Server

- 450 lines
- Supports 100 continue responses
- No imports from outside stdlib
- Provides HTTPS/SSL server, with fallback if no SSL
- Supports socket timeout
- Demo application is (imported) paste.wsgilib.dump_environ
- Does not detect hop-by-hop headers

My gut feeling: Ignores many parts of the WSGI spec (sendfile, strict error checking), supports unnecessary stuff for stdlib, i.e. Continue support, HTTPS.

3. WSGIRef_onefile.py

- 660 lines
- No imports from outside stdlib
- Detects hop-by-hop headers
- Has WSGI sendfile support
- Has dedicated class to manage WSGI headers list as dictionary
- Has builtin demo app

My gut feeling: WSGIRef is the sweetspot in terms of simplicity vs. usability. Covers all aspects of WSGI (which is what it was designed for, IIRC ;-)

The ball's in yizzir court now..

Alan.

"""BaseHTTPServer that implements the Python WSGI protocol (PEP 333, rev 1.21)

This is both an example of how WSGI can be implemented, and a basis for running simple web applications on a local machine, such as might be done when testing or debugging an application. It has not been reviewed for security issues, however, and we strongly recommend that you use a "real" web server for production use.
For example usage, see the 'if __name__=="__main__"' block at the end of the module. See also the BaseHTTPServer module docs for other API information.
"""

from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import urllib, sys, os, mimetools, types, time

__version__ = "0.1"
__all__ = ['WSGIServer','WSGIRequestHandler','demo_app']

server_version = "WSGIServer/" + __version__
sys_version = "Python/" + sys.version.split()[0]
software_version = server_version + ' ' + sys_version

hop_by_hop_headers = {
    'connection':1, 'keep-alive':1, 'proxy-authenticate':1,
    'proxy-authorization':1, 'te':1, 'trailers':1, 'transfer-encoding':1,
    'upgrade':1
}

def is_hop_by_hop(header_name):
    """Return true if 'header_name' is an HTTP/1.1 "Hop-by-Hop" header"""
    return hop_by_hop_headers.has_key(header_name.lower())

class FileWrapper:
    """Wrapper to convert file-like objects to iterables"""

    def __init__(self, filelike, blocksize=8192):
        self.filelike = filelike
        self.blocksize = blocksize
        if hasattr(filelike,'close'):
            self.close = filelike.close

    def __g
Re: [Web-SIG] WSGI in standard library
[Ian Bicking] > Note that the scope of a WSGI server is very very limited. It is quite > distinct from an XMLRPC server from that perspective -- an XMLRPC server > actually *does* something. A WSGI server does nothing but delegate. and > I'm not set on "production" quality code, but I think the general > sentiment against that is entirely premature. The implementations > brought up -- CherryPy's > (http://svn.cherrypy.org/trunk/cherrypy/_cphttpserver.py) and Paste's > (http://svn.pythonpaste.org/Paste/trunk/paste/httpserver.py) and > wsgiref's > (http://cvs.eby-sarna.com/wsgiref/src/wsgiref/simple_server.py?rev=1.2&view=markup) > > are all pretty short. It would be better to discuss the particulars. Is > there a code path in one or more of these servers which you think is > unneeded and problematic? A few points. 1. My opinion is not relevant to whether/which WSGI server goes into the standard library. What's required is for someone to propose to python-dev that a particular WSGI server should go into the standard library. I imagine that the response on python-dev to the proposer is going to be along the lines of "Will you be maintaining this?" If/when python-dev is happy, then it'll go into the distribution. 2. What's wrong with leaving the current situation as-is, i.e. the available WSGI implementations are listed on the WSGI Moin page http://wiki.python.org/moin/WSGIImplementations 3. If I had to pick one of the 3 you suggested, I'd pick the last one, i.e. PJE's, because it fulfills exactly the criteria I listed - It's pretty much the simplest possible implementation, meaning it's easiest to understand. - It's based on the existing *HttpServer hierarchy - It's got a big notice at the top saying """This is both an example of how WSGI can be implemented, and a basis for running simple web applications on a local machine, such as might be done when testing or debugging an application. 
It has not been reviewed for security issues, however, and we strongly recommend that you use a "real" web server for production use.""" Regards, Alan.
Re: [Web-SIG] Bowing out (was Re: A trivial template API counter-proposal)
[Alan Kennedy] >> Looking at this in an MVC context ... [Phillip J. Eby] > As soon as you start talking about what templates should or should not > do (as opposed to what they *already* do), you've stopped writing an > inclusive spec and have wandered off into evangelizing a particular > framework philosophy. Sorry if my message seemed unreasonable. My approach to such matters is to attempt to start from best design practice, keeping a keen focus on the best way to do things in the future, relegating poorly-architected legacy systems, e.g. active page systems, to being a secondary concern. Also, my take on active page systems is that they could easily be encompassed by an MVC model. The View is the active page, the Model is the namespace in which the active page is rendered and the Controller is the thing that does the rendering. [Phillip J. Eby] > At this point it has become clear to me that even if I spent my days > and nights writing a compelling spec of what I'm proposing and then > trying to sell it to the Web SIG, it would be at best a 50/50 chance > of getting through, and in the process it appears that I'd be burning > through every bit of goodwill I might have previously possessed here. > .. I'd rather save whatever karma I > have left here for something with a better chance of success. I'm sorry to hear that. [Phillip J. Eby] > Good luck with the spec. Well, I'm currently designing and implementing a View and ViewResolver in Spring for a customer, so I'll be keeping a note of requirements as I go, and will attempt to come up with a generic design which is suitable for a templating standard. But it will be a few weeks before I can spec that, document it and start doing sample implementations which I can open source. Regards, Alan.
Re: [Web-SIG] WSGI in standard library
[Graham Dumpleton] > Anyway, not that it matters, but the security fix was not the only thing > in those releases. Still, I think my point stands that internet-facing servers in the standard library are currently the only source of security advisories in python. http://www.python.org/security/ How sure are we that any proposed production WSGI server in the standard library will not become a source of further holes, especially if it tries to cover all the bases of a true production server, i.e. security, flexibility, efficiency, full http1.1 compliance, etc? Regards, Alan.
Re: [Web-SIG] WSGI in standard library
[Alan Kennedy] >>Instead, I think the right approach is to continue with the existing >>approach: put the most basic possible WSGI server in the standard >>library, for educational purposes only, and a warning that it shouldn't >>really be used for production purposes. [Bill Janssen] > I strongly disagree with this thinking. Non-production code shouldn't > go into the stdlib; instead, Alan's proposed module should go onto > some pedagogical website somewhere with appropriate tutorial > documentation. I still disagree ;-) IMO, the primary reason for not including production servers in the standard library is that servers need to be maintained much more fastidiously than the standard library, and need to be released on a timescale that is independent of python releases. Note the security hole uncovered in the standard library xml-rpc lib last year. PSF-2005-001 - SimpleXMLRPCServer.py allows unrestricted traversal http://www.python.org/security/PSF-2005-001/ This particular security hole is the very reason why the Python Security response team had to be founded, and required point-releases of the entire python distribution to fix, i.e. python 2.3.5 and python 2.4.1 were released simply to fix this bug. There are two primary areas of the python distro that can result in such significant security holes. 1. Crypto libraries. Fortunately, the Timbot has been carefully watching over us, and ensuring the excellence of the python crypto libraries (as witnessed by the appearance of Ron Rivest on python-dev (!) last December: http://mail.python.org/pipermail/python-dev/2005-December/058850.html). 2. Internet-exposed servers. No matter how careful developers are, it is very difficult to avoid designing security holes into such servers. Therefore, IMHO, it is wrong to include such servers into the standard distribution.
Instead, production-ready servers should be independent of the standard distribution, have their own development teams, have independent release-cycles, etc, etc: think Twisted, mod_python, etc. So, I still think that only basic educational/playpen servers should go in the standard library, with an indication that the user should pick a server from outside the distro if they need to do serious server work. Maybe if there were no "production-ready" servers in the standard library, there would be no need for a "Python Security Response Team". Just my €0,02. Regards, Alan.
Re: [Web-SIG] A trivial template API counter-proposal
[Phillip J. Eby] > Developing WSGI was not easy, either, as I'm sure you recall. You and I > certainly argued a bit about iterators and file-like objects and such, > and it took a while before I understood all of your use cases and we > were able to resolve them in the spec. If you had given up on > convincing me then, or if I had given up on your use cases as "too > complex", the spec would have suffered for it. And I am indeed most grateful that you took the time to understand my tiresome ramblings on the subject: WSGI is indeed a most excellent spec: well done! :-) [Alan Kennedy] >> I can understand why the web-sig has fallen into the trap of tying a >> templating API to its nice web standard, WSGI: all web applications >> must generate output. But web apps need to generate a wide range of >> media types, e.g. image/*, application/msword, etc, etc, etc. [Phillip J. Eby] > And in many frameworks, it is the *template* that decides what media > type it is generating - and it may not even be outputting text or > unicode. Again, this is something that would be neglected by a > text-only spec. Ah, now there I have a problem! IMHO, templates should generate only a single media type. Whatever code is managing resource-delivery to the browser should decide which template to use, and set the media type accordingly, outside of the template. Let me explain in terms of an actual use case. I used to work for an e-learning company (widelearning.com), which delivered multimedia financial training materials. As much as possible, the content was delivered as video and Macromedia Flash, with fallback to simple image/* and text/html if the multimedia plugins were not available. This was done through two primary mechanisms: 1. Through plugin detection, i.e. running script in the browser to detect certain plugins, e.g. Flash. 2. Through user profiles, i.e. where the user selected their media preference, which was stored in a database.
In both scenarios, entirely different templating engines were used. For text/*html, we used XSLT and JSP (ugh ;-) For Flash, we used a bespoke templating system, akin to Macromedia Generator (something like jgenerator: http://www.jzox.com). This was a templating engine that took a binary template as input, "cooked" the template with reference to a user data namespace, and generated a binary output stream representing a personalised Flash "movie". Neither rendering engine had any knowledge that other media types could potentially be returned to the user. Before any templates were rendered, a decision was made as to what media type was suitable to service the request, maximising the capabilities of the users browser, and the relevant rendering engine invoked, with the relevant template. This "separation of concerns" greatly simplified our development and QA process. IMO, permitting templates to select the media type is akin to the old problem of dealing with exceptions in various templating languages which intermingle code and presentation, e.g. JSP, ASP, PHP, etc. If a JSP caused an exception halfway through page-rendering, it was too late to do anything meaningful about it: the first half of the rendered page had already been transmitted to the user. What should really have happened is that the page should not have been transmitted to the user until the template was completely successfully rendered. That way, if an exception occurred, a suitable error page could be returned to the user, and the half-cooked template response discarded. Similarly, if a template is permitted to set HTTP headers, then it might discover too late that it is generating a media type that is unsuitable for the client. IMHO, some functionality in the HTTP application should decide the media type to be returned, call the relevant templating engine and set the relevant HTTP headers. 
Looking at this in an MVC context, the application is responsible for populating the Model (user namespace), and selecting which View (template<->media-type) is suitable for return to the user. Templates should not vary media types. HTTP headers do need to be set for different templates/media-types. But that should be the responsibility of the HTTP application, not the template, which should be unaware of the application context in which it is running, except for the contents of the Model/user-namespace. Regards, Alan.
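The division of labour argued for above can be sketched as a small WSGI application that picks the media type, renders the chosen view to completion, and only then sets headers. This is an illustrative sketch, not anyone's actual framework: the view functions, the `VIEWS` table and the naive `Accept` check are all assumptions made for the example.

```python
def render_html(model):
    # Hypothetical view: sees only the model, never HTTP.
    return ("<p>Hello, %s</p>" % model["name"]).encode("utf-8")


def render_text(model):
    # Second hypothetical view for a different media type.
    return ("Hello, %s" % model["name"]).encode("utf-8")


# Hypothetical table: one view per media type.
VIEWS = {"text/html": render_html, "text/plain": render_text}


def app(environ, start_response):
    model = {"name": "world"}  # the application populates the Model
    # Deliberately naive media-type selection, made by the application.
    accept = environ.get("HTTP_ACCEPT", "")
    media_type = "text/html" if "text/html" in accept else "text/plain"
    try:
        # Render completely before anything is sent to the client.
        body = VIEWS[media_type](model)
    except Exception:
        # Nothing has been transmitted yet, so a clean error page is possible.
        start_response("500 Internal Server Error",
                       [("Content-Type", "text/plain")])
        return [b"Sorry, something went wrong."]
    start_response("200 OK", [("Content-Type", media_type),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Because `start_response` is called only after the view has returned, a failure inside the view never leaves a half-rendered page on the wire.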
Re: [Web-SIG] A trivial template API counter-proposal
[Guido van Rossum] > I see. But doesn't this still tie the templates API rather strongly to > a web API? What if I want to use a template engine to generate > spam^H^H^H^Hpersonalized emails? Or static HTML pages that are written > to the filesystem and then served by a classic Apache setup? Or source > code (program generators are fun toys!)? > > ISTM that a template that does anything besides producing the > *content* of a download would be a rather extreme exception -- e.g. > the decision to generate a redirect feels like *application* logic, > while templates ought to limit themselves to *presentation* logic, and > for that, an API that produces a string or writes to a stream seems to > make more sense. What am I missing? Are you proposing that I should > fake out the wsgi API and capture the strings passed to write() and > the sequence returned? At last! A voice of sanity! I've been dismayed over the last few days trying to follow the direction of this thread: it appears to me that something very simple has now become very complex. Templating is about taking a pattern of bytes (the template), somehow mixing it with user data (the user context), and generating a series of bytes (the output). Full stop. In relation to text, this means taking a textual template, with embedded code/processing-instructions/whatever, "cooking" it in a user namespace, delivering a final piece of output. With text, the only major concern is with character encoding. And if I were designing a templating API, I'd make everything unicode-only, leaving the user responsible for transcoding to their desired encoding at serialisation time. I can understand why the web-sig has fallen into the trap of tying a templating API to its nice web standard, WSGI: all web applications must generate output. But web apps need to generate a wide range of media types, e.g. image/*, application/msword, etc, etc, etc. This topic started with Buffet, the de-facto standard templating API for CherryPy.
Buffet is just about textual templating, which is a good thing. That's why it's very simple, and is thus actually being used. Perhaps web-sig is the wrong place to discuss a textual templating API. Maybe xml-sig would be a better place, or a new text-sig should be formed? In relation to Guido's point above about usage scenarios for this API: I'm quite interested because I have a jython implementation of ZPT/TAL that I'll be open-sourcing in the coming weeks, and which I intend to make compatible with whatever API is produced by this current discussion. I used that TAL implementation to generate the documentation for various things, usually just a flat set of HTML files in a directory: not an HTTP request in sight. Theoretically, I can envision a situation where I might want to swap TAL implementations for that offline generation process, meaning that it would be helpful to have a standardised API for controlling template cooking. Why should I have to use WSGI in that scenario? Just my €0,02. Alan.
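A unicode-only templating call with no WSGI involved could look something like the following. This is only a sketch under stated assumptions: it uses the stdlib `string.Template` as a stand-in engine, and the function name `render` is invented here; it is not the Buffet API or the TAL implementation mentioned above.

```python
from string import Template


def render(template_text, namespace):
    """Cook a unicode template in a user namespace and return unicode.

    string.Template stands in for a real engine; substitute() raises
    KeyError if the template refers to a name missing from the namespace.
    """
    return Template(template_text).substitute(namespace)
```

Transcoding stays with the caller at serialisation time, for example `render("Hello, $name!", {"name": "Guido"}).encode("utf-8")` just before writing a static HTML file or handing bytes to a mailer.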
Re: [Web-SIG] WSGI in standard library
[Peter Hunt] > I think CherryPy's WSGI server should go in: it's stable, and the > best-performing WSGI HTTP server out there. I disagree. I think that if a WSGI server is to go into the standard library, it should be the most basic one possible, e.g. one that builds on the *HttpServer.py hierarchy already there, and one that makes it as easy as possible for coders to understand how WSGI works. HTTP servers can be complex beasts. Security is a major consideration, robustness and stability being next. Performance is also a major concern, with flexibility and ease-of-use being important as well. That's too many concerns to balance against each other for a python library module. Instead, I think the right approach is to continue with the existing approach: put the most basic possible WSGI server in the standard library, for educational purposes only, and a warning that it shouldn't really be used for production purposes. The following quote is from the docstring of the CGIHTTPServer module """ In all cases, the implementation is intentionally naive -- all requests are executed synchronously. SECURITY WARNING: DON'T USE THIS CODE UNLESS YOU ARE INSIDE A FIREWALL -- it may execute arbitrary Python code or external programs. """ And that's a good thing. If I really want to use python CGI, then I should find a robust HTTP server which supports it, e.g. Apache. The same reasoning should apply to WSGI, IMHO. Just another €0,02. Regards, Alan.
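To make the "most basic possible server" idea concrete, here is a sketch using the `wsgiref.simple_server` module as it exists in current Python 3. The application `hello_app` and the `serve` helper are illustrative names, the host and port are arbitrary, and nothing here is hardened for production use.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """Smallest useful WSGI application: one status, one header, one body."""
    body = b"Hello from a basic WSGI server\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="localhost", port=8000):
    """Run hello_app on a single-threaded, synchronous development server."""
    with make_server(host, port, hello_app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` blocks and handles one request at a time, which is fine for local testing and for reading the code to see how WSGI works, and is the kind of limitation the warning above is about.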
Re: [Web-SIG] Standardized template API
[Clark C. Evans] > I'd stick with the notion of a "template_name" that is neither the > template file nor the template body. Then you'd want a template factory > method that takes the name and produces the template body (compiled if > necessary). I agree. If you're looking for an existing model (in java), the Spring framework has "View" objects (i.e. the V in MVC) and "View Resolver" objects. The latter resolve logical template names to actual templates, compiled if necessary. View Interface http://static.springframework.org/spring/docs/1.2.x/api/org/springframework/web/servlet/View.html ViewResolver Interface http://static.springframework.org/spring/docs/1.2.x/api/org/springframework/web/servlet/ViewResolver.html > This way your template could be stored > in-memory, on-disk, or in a database, or even remotely using an HTTP > cache. The actual storage mechanism for the template source code should > not be part of this interface. A very important requirement IMHO. Regards, Alan.
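The name-to-template indirection described here can be sketched in a few lines. This is a hypothetical Python analogue, not Spring's ViewResolver and not any API proposed on the list: `DictResolver` and its constructor arguments are invented for illustration, and `string.Template` stands in for a real template compiler.

```python
from string import Template


class DictResolver:
    """Hypothetical resolver: logical template name -> compiled template.

    `sources` is any mapping from name to template source; it could be
    backed by memory, disk, a database or an HTTP cache without the
    caller knowing. `compile_template` turns source into a template object.
    """

    def __init__(self, sources, compile_template):
        self._sources = sources
        self._compile = compile_template
        self._cache = {}

    def resolve(self, template_name):
        # Compile on first use, then hand back the cached object.
        if template_name not in self._cache:
            source = self._sources[template_name]
            self._cache[template_name] = self._compile(source)
        return self._cache[template_name]


# Usage: callers only ever mention the logical name "greeting".
resolver = DictResolver({"greeting": "Hi $who"}, Template)
```

Swapping the storage mechanism means passing a different `sources` mapping; code that calls `resolver.resolve("greeting")` does not change.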
Re: [Web-SIG] Communicating authenticated user information
[Alan Kennedy] >> I agree about not sending this information back to the user: it's >> unnecessary and potentially dangerous. [Phillip J. Eby] > Yep, it would be really dangerous to let me know who I just logged in to > an application as. I might find out who I really am! ;) Very droll ;-) What if other information, such as meta-information about the auth directory or database in which the credentials were looked up, was also communicated through X-headers, e.g. server connection details, etc. Happy for that to go back to the user too? If X-headers are to be used in WSGI, I think there should be something in the spec about whether or not they should be transmitted to the user. Alan.
Re: [Web-SIG] Communicating authenticated user information
[Jim Fulton] >>>Is Zope the only WSGI application that performs authentication >>>itself? [Phillip J. Eby] >>I think Zope is the only WSGI application that cares about >> communicating this information back to the web server's logs. :) [Jim Fulton] > I hope that's not true. Certainly, if anyone else is doing > authentication in their applications or middleware, they > *should* care about getting information into the access logs. Well, Apache records auth info in logs as well, and it seems like a perfectly reasonable thing for a server to do. http://httpd.apache.org/docs/2.0/logs.html#accesslog [Phillip J. Eby] >> Perhaps an "X-Authenticated-User: foo" header could be added >> in a future spec version? (And as an optional feature in the >> current PEP.) [Jim Fulton] > Perhaps. Note that it should be clear that this is solely for use > in the access log. There should be no assumption that this is > a principal id or a login name. It is really just a label for the > log. To make this clearer, I'd use something like: > "X-Access-User-Label: foo". Sending X-headers seems hacky, and results in unnecessary information being transmitted back to the user (possibly revealing sensitive information, or opening security holes?) I think that the communication mechanism for auth information is possibly best served by a simple convention between auth middleware authors. Perhaps servers that are aware that auth middleware is in use can put a callable into the WSGI environment, which auth middleware calls when it has auth'ed the user? [Phillip J. Eby] > This seems a simpler way to incorporate the feature than adding > an extension API to environ. [Jim Fulton] > Why is that? Isn't the env meant for communication between > the WSGI layers? I'm not sure I'd want to send this information > back to the browser. I think an API could be very simple, and optional for servers that know they won't be logging auth information.
I agree about not sending this information back to the user: it's unnecessary and potentially dangerous. Regards, Alan Kennedy.
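The "callable in the WSGI environment" idea might look roughly like this. Everything in the sketch is an assumption made for illustration: the environ key `x.log_authenticated_user` is invented and not part of any spec, `LoggingServerShim` merely stands in for a real server's access logging, and reading `REMOTE_USER` replaces real credential checking.

```python
AUTH_CALLBACK_KEY = "x.log_authenticated_user"  # hypothetical environ key


class LoggingServerShim:
    """Stand-in for a server: offers a callback, records a label per request."""

    def __init__(self, app):
        self.app = app
        self.access_log = []

    def __call__(self, environ, start_response):
        labels = []
        # The server advertises the callback; middleware may ignore it.
        environ[AUTH_CALLBACK_KEY] = labels.append
        result = self.app(environ, start_response)
        path = environ.get("PATH_INFO", "")
        self.access_log.append((path, labels[-1] if labels else "-"))
        return result


def auth_middleware(app):
    """Report the authenticated user's label to the server, if it asked."""
    def wrapped(environ, start_response):
        user = environ.get("REMOTE_USER")  # stand-in for a real auth check
        report = environ.get(AUTH_CALLBACK_KEY)
        if user and report is not None:
            report(user)
        return app(environ, start_response)
    return wrapped


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hi"]
```

The label travels only through `environ`, so it reaches the access log without ever appearing in a response header, and a server that does not log auth information simply never sets the key.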