Re: [Web-SIG] ANN: General availability of the WebCore WSGI nanoframework v2.0.
On 2016-04-19 12:56:26 +, Ian Cordasco said:

* Annotated Source Documentation: http://s.webcore.io/fjVc (pythonhosted docs, also linked on the pypi page)

For what it's worth, the PyPI/Warehouse/PyPA developers are planning on deprecating pythonhosted for PyPI packages. The suggestion is that you use something like ReadTheDocs.org for documentation hosting.

Indeed, I tried desperately to not use it, but WebCore 1 documentation was formerly there, and after two hours of searching prior to release I was unable to find any way to _remove_ the already uploaded documentation and remove the reference on the pypi page. A very sub-optimal packaging experience, there, so the sooner it's actually gone the happier I'll be. (It's quite slow, for example. ;)

-- Alice.

___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] OT: dotted names
On 2011-04-15 22:33:08 +, P.J. Eby said:

At 04:11 PM 4/15/2011 -0400, Fred Drake wrote:

These end users don't really care if the object identified is a class or function in module, a nested attribute on a class, or anything else, so long as it does what it's advertised to do. By not pushing implementation details into the identifier, the package maintainer is free to change the implementation in more ways, without creating backward incompatibility.

That would be one advantage of using entry points instead. ;-) (i.e., the user doesn't specify the object location, the package author does.)

Note, however, that one must perform considerably more work to resolve a name, when you don't know whether each part of the name is a module or an attribute.

Not if, as you mention, you use an explicit format. The format my resolver code uses (and this code is utilized in marrow.mailer for manager/transport lookup, marrow.server.http's command-line script to resolve WSGI applications, and marrow.templating to resolve templates) covers the following:

:: object
:: entrypoint_name
:: ../relative/path/to/something
:: ./relative/path/to/something
:: /absolute/path/to/something
:: package.relative/path/to/something
:: package.absolute.path
:: package.submodule:object
:: package.submodule:object.attribute

What is allowed on any given resolution depends on if the resolver request is looking for an on-disk path or object.
Using the above as an example, you can define the use of the SMTP transport within marrow.mailer in any of the following ways:

    from marrow.mailer.transport.smtp import SMTPTransport

    config = dict(transport=SMTPTransport)  # direct reference
    config = dict(transport="smtp")  # entry point
    config = dict(  # object lookup
        transport = "marrow.mailer.transport.smtp:SMTPTransport"
    )

When configuring m.s.http to load an app, you can:

    # p-code
    HTTPServer.serve("project.application:WSGIApp.factory")

When choosing templates, OTOH, you can do the following:

    return "./templates/foo.html", dict()
    return "/var/www/foo.html", dict()
    return "myapp.templates.foo", dict()
    return "myapp/templates/foo.html", dict()
    return "myapp.stemplates:email.welcome", dict()

Either you have to get an AttributeError first, and then fall back to importing, or get an ImportError first, and fall back to getattr.

If you examine the above closely, the differing formats are easily identifiable using a few == and 'in' conditionals:

    import importlib

    if not isinstance(ref, basestring):
        return ref  # already an object, nothing to resolve

    if ref[0] == '.':
        pass  # relative
    elif ref[0] == '/':
        pass  # absolute
    elif '/' not in ref and '.' not in ref and ':' not in ref:
        pass  # entrypoint
    elif ':' in ref:
        import_, _, attrs = ref.partition(':')
        # import_module returns the leaf module; a bare __import__ would
        # return the top-level package instead.
        base = importlib.import_module(import_)
        for attr in attrs.split('.'):
            base = getattr(base, attr)
        return base  # the resolved object, not the last attribute name
    elif '/' in ref:
        import_, _, path = ref.partition('/')
        pass  # use pkg_resources + path to pull file from package

If the syntax is explicit, OTOH, then you don't have to guess, thereby saving lots of work and wasteful exceptions. :)

— Alice.
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-18 14:11:21 -0700, Daniel Holth said: The file format discussion seems utterly pointless. That's a pity. If you want the format to specify cron jobs and services and non-wsgi servers, why not go the whole way and use the Linux filesystem hierarchy standard. The entry point is an executable called `init`, configuration goes in /etc/, cron jobs go in /etc/cron.d etc. This should be flexible enough. Because that would be… less than good. Let me illustrate: a) The LFS is intended for complete operating system installations. b) You sure as hell wouldn't want the init process to be Python. c) Operating-system specific features are a no-go for portability. d) We don't want developers to have to suddenly become sysadmins, too. e) /etc is terrible for configuration organization. There are other, lower-level reasons not to do that. One big point is that the application server / container writes a single configuration file which is then read in by the application. One file, not a tree of them. I hope most applications won't need to look at the contents of app.yaml (the application container config) at all. No-one has said that an application /would/ have to look at the application metadata, or that after installation the file was anywhere app accessible, even. Paste Deploy configures logging by passing the .ini to logging before invoking the app's entry point. This is the application container configuring the logging. I've already defined that. RTFM or many ML messages about logging. For example a cool application container feature would be to have a little web application that manipulated logging configuration in a database, or reconfigured logging between requests without restarting the application. The former is already defined. That's what the application server does, database or no. The latter is broadly unnecessary, but easily implementable within the application you are deploying. 
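A minimal sketch of the container-side logging setup described above: the container applies a logging configuration first and only then invokes the application's entry point, so the application never needs to read the container's configuration. The names `container_config`, `launch`, and `app_factory` are hypothetical, and `logging.config.dictConfig` is used here only as a stand-in for whatever configuration source a real container would read.

```python
import logging
import logging.config

# Hypothetical container-level configuration; a real container might load
# this from its own YAML or INI file rather than hard-coding it.
container_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "INFO"},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}


def app_factory(config):
    """Hypothetical application entry point: it logs but never configures logging."""
    logging.getLogger("myapp").info("application starting")

    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    return application


def launch(factory, logging_config, app_config=None):
    """Configure logging first, then invoke the application's entry point."""
    logging.config.dictConfig(logging_config)
    return factory(app_config or {})


application = launch(app_factory, container_config)
```

Because the configuration is applied outside the application, swapping the console handler for syslog or a database-backed handler would not require touching application code.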
One way to pass 'services' information would be to specify a support package with abstract base classes and have a procedure for proposing new standard services to the web-sig. The container would have to populate a registry of named implementations of those services it is able to support: That seems… excessive and ugly. You would also have code mixing between the application server level and application level which will encourage nothing but madness. Simple, named services with optional configurations are more than enough. I would really like to see a basic specification with no support for services or 'spending an hour running apt-get to reconfigure the server before eventually getting around to running the application', and a procedure for extending the format. apt-get has already been thrown out, and was, in fact, never part of the quick summary I made, either. — Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-18 16:36:28 -0700, Daniel Holth said: On Apr 18, 2011, at 6:09 PM, Alice Bevan–McGregor wrote: I've already defined that. RTFM or many ML messages about logging. Please remain friendly and patient. That depends on how you define the F in RTFM. In this instance, I meant read the fine manual. ;) You can understand my frustration, however, that 10% of the posts in this thread demonstrate a lack of understanding of (or lack of even a cursory glance at) a) my initial post and associated document, and b) the rest of the mailing list posts. Asking for things already agreed upon or questions already resolved wastes everyone's time. On 2011-04-18 16:46:12 -0700, Eric Larson said: Instead of assuming /etc always means the root of the filesystem we should consider it the root of the sandbox where the system providing the sandbox defines what that is. While /etc certainly wouldn't be the root of anything (insert sarcastic smiley here ;), it was already agreed upon that / would refer to the application container root, not system root. I share Ian's sentiment, see: (search for 'root' on that page) http://mail.python.org/pipermail/web-sig/2011-April/005041.html It is _a_ filesystem in that there is a place that an application will be run. For argument's sake, we'll say it is a directory on some server. Now, within that directory we choose to take some known bits from the LFS standard such as /etc, /bin, /var, etc for the placement of our application. Again, not such a great idea. With that in mind, I think using things like LFS makes a ton of sense. We can piggy back or copy (since previous discussions for .debs or rpms seem not to sit well... even though they would fit this model very well...) systems like RPM rather directly and hopefully allow our Python web apps to play very nicely with applications in other languages. I can't fully grok this paragraph. FHS (my bad calling it LFS earlier!) 
= good because we won't confuse systems administrators and it matches other binary packaging models? I doubt an isolated web application will have a need for more than 6% (3) of these:

http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard#Directory_structure

While I personally have a FHS-like application deployment model using Git, I would rather not see that level of complexity as a requirement for deploying basic applications.

Please do not get hung up on the fact that I've said RPMs here. The fact is distros have been doing package management for quite a long while. It is insanely convenient to say apt-get install couchdb and when it is done, having a couchdb server running.

It may be convenient, but it's also quite the risk. You're letting someone else configure your server. Also, do binary installation systems automatically start the service post-installation before you can configure them? I have difficulty believing that, which means a whole whack-ton of effort under a systems administrator hat has been glossed over.

Copying the model seems like a good option in that we get to learn from the mistakes of others while inheriting a wild variety of tools and concepts.

The on-disk structure which the application lives within (the application container) is up to the application server in use. The underlying application should, and, IMHO, -must- be agnostic to it. Passing paths to configuration files, TMPDIR, etc. in the environment is a fairly trivial way to do that, at which point the FHS discussion is nearly moot.

If you want a complete (complete enough for a simple web application) FHS structure within the redistributable, I don't see the point of having that many empty directories. ;)

As an aside, I -do- have an application in production using a FHS-like file structure:

https://gist.github.com/926617

But again, I'm not suggesting something like that for the redistributable application!

— Alice.
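As a sketch of the environment-based approach mentioned in this message, an application can stay agnostic to the on-disk layout by reading its paths from the environment. The variable name `APP_CONFIG` and the helper `locate_paths` are hypothetical (nothing in the thread names them); `TMPDIR` is the conventional variable that Python's `tempfile` module already consults.

```python
import os
import tempfile


def locate_paths(environ=None):
    """Return the paths handed to the application by its container."""
    environ = os.environ if environ is None else environ
    return {
        # Hypothetical variable: the container decides where the config lives.
        "config": environ.get("APP_CONFIG"),
        # Fall back to the platform default if the container set no TMPDIR.
        "tmp": environ.get("TMPDIR", tempfile.gettempdir()),
    }


paths = locate_paths({
    "APP_CONFIG": "/srv/myapp/etc/app.yaml",
    "TMPDIR": "/srv/myapp/tmp",
})
```

The application never hard-codes /etc or /var; whichever layout the container uses, only the environment changes.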
Re: [Web-SIG] OT: dotted names (Was: Re: A Python Web Application Package and Format)
On 2011-04-15 11:02:17 -0700, Jim Fulton said:

On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo mer...@netwok.org wrote:

As an aside, I wonder why people use dot+colon notation instead of just dots to reference callables. In distutils2 for example we resolve dotted names to find command classes, command hooks and compilers. So what’s the benefit, marginally easier parsing?

An opportunity of using a colon is that it allows::

    dotted.module.name:expression

where expression may be more than just a name::

    foo.bar:Bar()

Or foo.bar:Baz.factory. I wouldn't go so far as to eval() what's after the colon. The real difference is this:

    [foo.bar]:[Baz.factory]
        |          ^- Attribute lookup.
        ^- Module lookup.

You can't do this:

    import foo.bar.Baz.factory

Thus the difference. However, the syntax is actually more flexible than that:

    [foo.bar]/[subfolder/file]
        |          ^- Sub-path.
        ^- Module.

    /[foo/bar]
        ^- Just path.

— Alice.
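The module-lookup / attribute-lookup split can be sketched in a few lines of modern Python. This is an illustration under stated assumptions, not the marrow resolver itself: the function name `resolve` is hypothetical, and only the `module:object.attribute` form is handled (no paths or entry points).

```python
import importlib


def resolve(reference):
    """Resolve 'package.module:object.attribute' to the object it names."""
    module_name, _, attributes = reference.partition(":")
    target = importlib.import_module(module_name)  # module lookup
    if attributes:
        for name in attributes.split("."):
            target = getattr(target, name)  # attribute lookup
    return target


join = resolve("os.path:join")
from_keys = resolve("collections:OrderedDict.fromkeys")
```

Everything before the colon goes to the import machinery and everything after it to `getattr`, so no exception-driven guessing is needed to tell modules from attributes.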
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-14 10:34:59 -0700, Ian Bicking said: I think there's a general concept we should have, which I'll call a script -- but basically it's a script to run (__main__-style), a callable to call (module:name), or a URL to fetch internally. Agreed. The reference notation I mentioned in my reply to Graham, with the addition of URI syntax, covers all of those options. I want to keep this distinct from anything long-running, which is a much more complex deal. The primary application is only potentially long-running. (You could, in theory, deploy an app as CGI, but that way lies madness.) However, the reference syntax mentioned (excepting URL) works well for identifying this. I think given the three options, and for general simplicity, the script can be successful or have an error (for Python code: exception or no; for __main__: zero exit code or no; for a URL: 2xx code or no), and can return some text (which may only be informational, not structured?) For the simple cases (script / callable), it's pretty easy to trap STDOUT and STDERR, deliver INFO log messages to STDOUT, everything else to STDERR, then display that to the administrator in some form. Same for HTTP, except that it can include full HTML formatting information. An application configuration could refer to scripts under different names, to be invoked at different stages. A la the already mentioned post-install, pre-upgrade, post-upgrade, pre-removal, and cron-like. Any others? There could be an optional self-test script, where the application could do a last self-check -- import whatever it wanted, check db settings, etc. Of course we'd want to know what it needed *before* the self-check to try to provide it, but double-checking is of course good too. Unit and functional tests are the most obvious. In which case we'll need to be able to provide a localhost-only 'mounted' location for the application even though it hasn't been installed yet. 
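For the callable case, trapping output and mapping "exception or no" to success or failure could look roughly like the following. The names `run_hook` and `post_install` are hypothetical, and this sketch only captures text written through `sys.stdout` and `sys.stderr`; routing log records by level is left out.

```python
import contextlib
import io
import traceback


def run_hook(hook):
    """Call a lifecycle hook; return (succeeded, stdout_text, stderr_text)."""
    out, err = io.StringIO(), io.StringIO()
    succeeded = True
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            hook()
        except Exception:
            # An exception means failure; keep the traceback for the administrator.
            succeeded = False
            traceback.print_exc()
    return succeeded, out.getvalue(), err.getvalue()


def post_install():
    """Hypothetical hook that reports progress, then fails."""
    print("creating tables")
    raise RuntimeError("database unavailable")


ok, output, errors = run_hook(post_install)
```

The captured text, traceback included, is what would be shown to the administrator in readable form.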
One advantage to a separate script instead of just one script-on-install is that you can more easily indicate *why* the installation failed. For instance, script-on-install might fail because it can't create the database tables it needs, which is a different kind of error than a library not being installed, or being fundamentally incompatible with the container it is in. In some sense maybe that's because we aren't proposing a rich error system -- but realistically a lot of these errors will be TypeError, ImportError, etc., and trying to normalize those errors to some richer meaning is unlikely to be done effectively (especially since error cases are hard to test, since they are the things you weren't expecting).

Humans are potentially better at reading tracebacks than machines are, so my previous logging idea (script output stored and displayed to the administrator in a readable form) combined with a modicum of reasonable exception handling within the script should lead to fairly clear errors.

Categorizing services seems unnecessary.

The description of the different database options was for illustration, not actual separation and categorization.

I'd like to see maybe an | operator, and a distinction between required and optional services. E.g.:

No need for some new operator, YAML already supports lists.

    services:
      - [mysql, postgresql, dburl]

Or:

    services:
      required:
        - files
      optional:
        - [mysql, postgresql]

And then there's a lot more you could do... which one do you prefer, for instance.

The order of services within one of these lists would indicate preference, thus MySQL is preferred over PostgreSQL in the second example, above.

Tricky things:

- You need something funny like multiple databases. This is very service-specific anyway, and there might sometimes need to be a way to configure the service. It's also a fairly obscure need.

I'm not convinced that connecting to a legacy database /and/ current database is that obscure.
It's also not as hard as Django makes it look (with a 1M SLoC change to add support)… WebCore added support in three lines. - You need multiple applications to share data. This is hard, not sure how to handle it. Maybe punt for now. That's what higher-level APIs are for. ;) You mean, the application provides its own HTTP server? I certainly wouldn't expect that...? Nor would I; running an HTTP server would be daft. Running mod_wsgi, FastCGI on-disk sockets, or other persistent connector makes far more sense, and is what I plan. Unless you have a very, very specific need (i.e. Tornado), running a Python HTTP server in production then HTTP proxying to it is inefficient and a terrible idea. (Easy deployment model, terrible overhead/performance.) Anyway, in terms of aggregate, I mean something like a site that is made up of many applications, and maybe those applications are interdependent in some fashion.
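The services declaration discussed earlier in this message (alternatives listed in order of preference, split into required and optional) could be resolved by a container along these lines. This is only a sketch: `choose_services` is a hypothetical name, the declaration is shown as the Python structure a YAML loader would produce, and `available` stands for whatever the container knows it can provide.

```python
def choose_services(declared, available):
    """Pick one service per declaration; earlier alternatives are preferred."""
    chosen, missing = [], []
    for kind in ("required", "optional"):
        for entry in declared.get(kind, []):
            # A bare name is shorthand for a single-alternative list.
            alternatives = [entry] if isinstance(entry, str) else list(entry)
            match = next((name for name in alternatives if name in available), None)
            if match is not None:
                chosen.append(match)
            elif kind == "required":
                # Unsatisfied optional services are silently skipped.
                missing.append(alternatives)
    return chosen, missing


declared = {"required": ["files"], "optional": [["mysql", "postgresql"]]}
chosen, missing = choose_services(declared, available={"files", "postgresql"})
```

With both databases available the same declaration would select MySQL, since it is listed first.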
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-13 18:16:36 -0700, Ian Bicking said: While initially reluctant to use zip files, after further discussion and thought they seem fine to me, so long as any tool that takes a zip file can also take a directory. The reverse might not be true -- for instance, I'd like a way to install or update a library for (and inside) an application, but I doubt I would make pip rewrite zip files to do this ;) But it could certainly work on directories. Supporting both isn't a big deal except that you can't do symlinks in a zip file. I'm not talking about using zip files as per eggs, where the code is maintained within the zip file during execution. It is merely a packaging format with the software itself extracted from the zip during installation / upgrade. A transitory container format. (Folders in the end.) Symlinks are an OS-specific feature, so those are out as a core requirement. ;) I don't think we're talking about something like a buildout recipe. Well, Eric kind of brought something like that up... but otherwise I think the consensus is in that direction. Ambiguous statements FTW, but I think I know what you meant. ;) So specifically if you need something like lxml the application specifies that somehow, but doesn't specify *how* that library is acquired. There is some disagreement on whether this is generally true, or only true for libraries that are not portable. +1 I think something along the lines of autoconf (those lovely ./configure scripts you run when building GNU-style software from source) with published base 'checkers' (predicates as I referred to them previously) would be great. A clear way for an application to declare a dependency, have the application server check those dependencies, then notify the administrator installing the package. I've seen several Python libraries that include the C library code that they expose; while not so terribly efficient (i.e. 
you can't install the C library once, then share it amongst venvs), it is effective for small packages. Larger (i.e. global or application-local) would require the intervention of a systems administrator. Something like a database takes this a bit further. We haven't really discussed it, but I think this is where it gets interesting. Silver Lining has one model for this. The general rule in Silver Lining is that you can't have anything with persistence without asking for it as a service, including an area to write files (except temporary files?) +1 Databases are slightly more difficult; an application could ask for: :: (Very Generic) A PEP-249 database connection. :: (Generic) A relational database connection string. :: (Specific) A connection string to a specific vendor of database. :: (Odd) A NoSQL database connection string. I've been making heavy use of MongoDB over the last year and a half, but AFIK each NoSQL database engine does its own thing API-wise. (Then there are ORMs on top of that, but passing a connection string like mysql://user:pass@host/db or mongo://host/db is pretty universal.) It is my intention to write an application server that is capable of creating and securing databases on-the-fly. This would require fairly high-level privileges in the database engine, but would result in far more plug-and-play configuration. Obviously when deleting an application you will have the opportunity to delete the database and associated user. I assume everyone agrees that an application can't write to its own files (but of course it could execfile something in another location). +1; that _almost_ goes without saying. :) At the same time, an application server /must not/ require root access to do its work, thus no mandating of (real) chroots, on-the-fly user creation, etc. 
There are ways around almost all security policies, but where possible setting the read-only flag (Windows) or removing write (chmod -w on POSIX systems) should be enough to prevent casual abuse. I suspect there's some disagreement about how the Python environment gets setup, specifically sys.path and any other application-specific customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in silvercustomize.py, and find it helpful). Similar to Paste's here variable for INI files, having some method of the application defining environment variables with base path references would be needed. I've tossed out my idea of sharing dependencies, BTW, so a simple extraction of the zipped application into one package folder (linked in using a .pth file) with the dependencies installed into an app-packages folder in the path (like site-packages) would be ideal. At least, for me. ;) Describing the scope of this, it seems kind of boring. In, for example, App Engine you do all your setup in your runner -- I find this deeply annoying because it makes the runner the only entry point, and thus makes testing, scripts, etc. hard. I agree; that's a
Re: [Web-SIG] A Python Web Application Package and Format
Howdy!

I suspect you're thinking a little too low-level.

On 2011-04-14 00:53:09 -0700, Graham Dumpleton said:

On 14 April 2011 16:57, Alice Bevan–McGregor al...@gothcandy.com wrote:

3. Define how to get the WSGI app. This is WSGI specific, but (1) is *not* WSGI specific (it's only Python specific, and would apply well to other platforms)

I could imagine there would be multiple application types:

:: WSGI application. Define a package dot-notation entry point to a WSGI application factory.

Why can't it be a path to a WSGI script file?

No reason it couldn't be.

    app.type = wsgi
    app.target = /myapp.wsgi:application

(Paths relative to the folder the application is installed into, and dots after a slash are filename parts, not module separators.)

But then, how do you configure it? Using a factory (which is passed the from-appserver configuration) makes a lot of sense.

This actually works more universally as it works for servers which map URLs to file based resources as well.

First, .wsgi files (after a few quick Google searches) are only used by mod_wsgi. I wouldn't call that universal, unless you can point out the other major web servers that support that format.

You'll have to describe the "map URLs to file based resources" issue, since every web server I've ever encountered (Apache, Nginx, Lighttpd, etc.) works that way. Only if someone is willing to get really hokey with the system described thus far would any application-scope web servers be running.

Also allows alternate extensions than .py and also allows basename of file name to be arbitrarily named, both of which help with those same servers which map URLs to file base resources.

Again, you'll have to elaborate or at least point to some existing documentation on this. I've never encountered a problem with that, nor do any of my scripts end in .py.
It also allows same name WSGI script file to exist in multiple locations managed by same server without having to create an overarching package structure with __init__.py files everywhere.

Packages aren't a bad thing. In fact, as described so far, a top level package is required.

For WSGI servers which currently require a dotted path, e.g. gunicorn:

See my note above; choice of Python-level HTTP interface is not up to the application, though by all means there should be some simple way to launch a development server.

The WSGI script file then can itself even be responsible for further setup of sys.path as appropriate and so be more self contained and not dependent on an external launch system.

The -point- (AFAIK/IMHO) is to be dependent on an external launch system.

and in the end of myapp.py add boilerplate like:

    from wsgiref.simple_server import make_server
    httpd = make_server('', 8000, application)
    print("Serving on port 8000...")
    httpd.serve_forever()

Again, I've never described anything that would require that nonsense. WSGI callable, preferably a factory callable, that's it.

Use a different server which required such boilerplate and you had to change it.

Not the problem of the application.

Using a WSGI script file as the lowest common denominator, it would also be nice to be able to do something like:

    python -m gunicorn.server myapp.wsgi
    python -m wsgiref.server myapp.wsgi

Not a half bad idea, but again, no reason to restrict it to .wsgi files. (That's also a completely different problem than the application format currently under discussion.)

I've written and rewritten my dot-colon-notation system enough that it supports:

:: /path[/sub[...]][:object[.property]] (even if it has to execfile it)
:: package[.module[...]][/folder[...]][:object[.property]]

I think that syntax pretty much covers everything, including .wsgi files (/path/to/foo.wsgi:application). The implementation of the above is fully unit tested, and I really don't mind people stealing it. ;)

— Alice.
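To illustrate the "WSGI callable, preferably a factory callable" idea from this message, here is a minimal sketch of a factory that receives configuration from the application server and returns the WSGI callable. The `greeting` key and the shape of the configuration dictionary are assumptions made for the example; the thread does not define what the container passes in.

```python
from wsgiref.util import setup_testing_defaults


def factory(config):
    """Hypothetical application factory: receives container-supplied configuration."""
    greeting = config.get("greeting", "Hello").encode("utf-8")

    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [greeting]

    return application


# Stand-in for what a container would do: build the app, then call it per request.
application = factory({"greeting": "Howdy!"})
environ = {}
setup_testing_defaults(environ)
status_line = []
body = b"".join(application(environ, lambda status, headers: status_line.append(status)))
```

The application module contains no server boilerplate at all; which HTTP interface ends up calling `application` is left entirely to the external launch system.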
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-14 08:53:55 -0700, Randy Syring said:

Just wondering if Windows/IIS is being kept in mind as this discussion is going on. I am having a hard time conceptualizing the things being discussed, so can't really tell myself.

I'm trying pretty hard to ensure that non-compatible OS features don't make it in here. Things like symlinks, chroots, etc.

— Alice.
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-11 00:53:02 -0700, Eric Larson said: Hi, On Apr 10, 2011, at 10:29 PM, Alice Bevan–McGregor wrote: However, the package format I describe in that gist does include the source for the dependencies as snapshotted during bundling. If your application is working in development, after snapshotting it /will/ work on sandbox or production deployments. I wanted to chime in on this one aspect b/c I think the concept is somewhat flawed. If your application is working in development and snapshot the dependencies that is no guarantee that things will work in production. The only way to say that snapshot or bundle is guaranteed to work is if you snapshot the entire system and make it available as a production system. `pwaf bundle` bundles the source tarballs, effectively, of your application and dependencies into a single file. Not unlike a certain feature of pip. And… wait, am I the only one who uses built-from-snapshot virtual servers for sandbox and production deployment? I can't be the only one who likes things to work as expected. Using a real world example, say you develop your application on OS X and you deploy on Ubuntu 8.04 LTS. Right away you are dealing with two different operating systems with entirely different system calls. If you use something like lxml and simplejson, you have no choice but to repackage or install from source on the production server. Installing from source is what I was suggesting. Also, Ubuntu on a server? All your `linux single` (root) are belong to me. ;^P While it is fair to say that generally you could avoid packages that don't use C, both lxml and simplejson are rather obvious choices for web development. Except that json is built-in in 2.6 (admittedly with fewer features, but I've never needed the extras) and there are alternate xml parsers, too. It sounds like Ian doesn't want to have any build steps which I think is a bad mantra. A build step lets you prepare things for deployment. 
A deployment package is different than a development package and mixing the two by forcing builds on the server or seems like asking for trouble. I'm having difficulty following this statement: build steps good, building on server bad? So I take it you know the exact target architecture and have cross-compilers installed in your development environment? That's not practical (or simple) at all! I'm not saying this is what you (Alice) are suggesting, but rather pointing out that as a model, depending on virtualenv + pip's bundling capabilities seems slightly flawed. Virtualenv (or something utilizing a similar Python path 'chrooting' capability) and pip using the extracted deps as the source for offline installation actually seems quite reasonable to me. The benefit of a known set of working packages (i.e. specific version numbers, tested in development) and the ability to compile C extensions in-place. (Because sure as hell you can't reliably compile them before-hand if they have any form of system library dependency!) I think it should offer hooks for running tests, learning basic status and allow simple configuration for typical sysadmin needs (logging via syslog, process management, nagios checks, etc.). Instead of focusing on what format that should take in terms of packages, it seems more effective to spend time defining a standard means of managing WSGI apps and piggyback or plain old copy some format like RPMs or dpkg. RPMs are terrible, dpkg is terrible. Binary package distribution, in general, is terrible. I got the distinct impression at PyCon that binary distributable .eggs were thought of as terrible and should be phased out. Also, nobody so far seems to have noticed the centralized logging management or deamon management lines from my notes. Just my .02. Again, I haven't offered code, so feel free to ignore me. But I do hope that if there are others that suspect this model of putting source on the server is a problem pipe up. 
If I were to add a requirement it would be that Python web applications help system administrators become more effective. That means finding consistent ways of deploying apps that plays well with other languages / platforms. After all, keeping a C compiler on a public server is rarely a good idea.

If you could demonstrate a fool-proof way to install packages with system library dependencies using cross-compilation from a remote machine, I'm all ears. ;)

— Alice.
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-11 13:49:20 -0700, Alex Grönholm said:

I use Ubuntu on all my servers, and linux single does not work with it, I can tell you ;P

The number of poorly configured Ubuntu servers I have seen (and replaced) is staggering. Any time the barrier to entry is lowered, quality suffers: having a compiler on the server is nothing compared to having a complete X graphical environment running as root, with root and a single user sharing the same password. ;^D

— Alice.
Re: [Web-SIG] A Python Web Application Package and Format
Howdy! On 2011-04-11 15:22:11 -0700, Ian Bicking said: I... think we are misunderstanding each other or something. Something. ;) A nice tool that could use this format, for instance, would be a tool that takes an app and creates a puppet recipe to set up a server to host the application. A different tool (maybe better, maybe not?) would be a puppet plugin (if that's the terminology) that uses this format to tell puppet about all the requirements an application has, perhaps translating some notions to puppet-native concepts, or adding high-level recipes that set up an appropriate container (which can be as simple as a properly configured Nginx or Apache server). Minuteman (loved the hat from the PyCon lightning talk), buildout, puppet, make, bash, custom XML-RPC APIs, … there are quite a number of ways to push something into production. Standardizing on one would marginalize the idea, and being agnostic means there is a whole /lot/ of work to be done to add support to every tool. :/ What I mean when I say there's a danger of becoming a configuration management tool, is that if you include hooks for the application to configure its environment you are probably stepping on the toes of whatever other tool you might use. And once you start down that path things tend to cascade. Have a gander at the Application Spec section; what, specifically, are you at odds with as coming from the application? I work with specifics, not vague "don't do that!" comments. The configuration of environment extends to: :: static resource declaration, because a tool that manages server configuration can do a better job 'mounting' those resources. :: services (in your parlance, 'resources' in mine) such as "give me an SQL database". :: recurrent tasks (a la cron) because having that centralized across multiple applications Isn't Just a Good Idea™ -- treat this as a 'service' if you must. 
If you include something in the packaging format that indicates the libraries to be installed, then you are encouraging and perhaps requiring that the server install libraries during a deployment. Libraries that are __bundled with the application__. I fail to see the 'badness' of this, or, really, how this differs from Silver Lining. I'd double-check this, but cloudsilverlining.org is inaccessible from my current location for some reason. :/ Realistically this can't be entirely avoided, but I think it is a pretty workable separation to declare only those dependencies that can't reasonably be included directly in the application itself (e.g., lxml, MySQLdb, git, and so on). In Silver Lining those dependencies were expressed as Debian package names, installed via dpkg, but for a more general system it would need to be somewhat more abstract. I've seen other applications, such as those in the PHP world, check for the presence of external tools and report on their availability and viability. Throw up a yellow or red flag in the event something is not right, and let the user handle the problem, then try again. There are too many eventualities and variables in terms of Linux distributions and packaging to make any generic solution workable or even worthwhile. At least, until we have high-order AI replacing sysadmins. OK; then #4 is the only thing I would choose to support, as it is the most general and easiest for tools to support, and least likely to lead to different behavior with different tools. And not to just defer to authority, but having written a half dozen tools in this area, not all of them successful, I feel strongly that including dependencies is best -- simplest for both producer and consumer, and most reliable. Thank you for reading what I wrote. -- Alice.
Re: [Web-SIG] A Python Web Application Package and Format
pre-install-hooks: [ apt-get install libxml2, # the person deploying the package assumes apt-get is available run-some-shell-script.sh, # the shell script might do the following on a list of URLs wget http://mydomain.com/canonical/repo/dependency.tar.gz tar zxf dependency.tar.gz rm dependency.tar.gz ] Does that make some sense? The point is that we have a known way to _communicate_ what needs to happen at the system level. I agree that there isn't a fool proof way. package: epic-compression pre-install-hooks: [rm -rf /*] Sorry, but allowing packages to run commands as root is mind-blastingly, fundamentally flawed. You mention an inability to roll back or upgrade? The above would be worse in that department. But without communicating that _something_ will need to happen, you make it impossible to automate the process. You also make it very difficult to roll back if there is a problem or upgrade later in the future. Really, in what way? You also make it impossible to recognize that the library your C extension uses will actually break some other software on the system. LD_PATH. Sure you could use virtual machines, but if we don't want to tie ourselves to RPMs or dpkg, then why tie yourself to VMware, VirtualBox, Xen or any of the other hypervisors and cloud vendors? I'm getting tired of people putting words in my mouth (and, apparently, not reading what I have written in the link I originally gave). Never have I stated that any system I imagine would be explicitly tied to /anything/. — Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-11 16:13:06 -0700, Ian Bicking said: (I'm confused; I just noticed there's a web-sig@python.org and python-web-...@googlegroups.com?) I only see one actual gmane group, gmane.comp.python.web... -- Alice.
Re: [Web-SIG] A Python Web Application Package and Format
Eric, Let me rephrase a few things. On 2011-04-11 17:48:14 -0700, Eric Larson said: pre-install-hooks: [ apt-get install libxml2, # the person deploying the package assumes apt-get is available Assumptions are evil. You could end up with multiple third-party applications each assuming different things. Aptitude, apt-get, brew, emerge, ports, … run-some-shell-script.sh, # the shell script might do the following on a list of URLs There is zero way of tracking what that does, so out of the gate that's a no-no, and full system chroots (not what I'm talking about in terms of chroot) require far too much organization/duplication/management. The 'hooks' idea listed in my original document is for callbacks into the application. That callback would be one of: :: A Python script to execute. (path notation) :: A Python callable to execute. (dot-colon notation) :: A URL within the application to GET. (url notation) Arbitrary system-level commands are right out: Linux, UNIX, BSD, Windows, Solaris… good luck getting even simple commands to execute identically and predictably across platforms. The goal isn't to rewrite buildout! Just b/c a command like apt-get is used it doesn't mean it is used as root. The point is not that you can install things via the package, but rather that you provide the system a way to install things as needed that the system can control. A methodology of testing for the presence and capability of specific services (resources) is far more useful than rewriting buildout. I need an SQL database of some kind. I need this C library within these version boundaries. Etc. Those are reasonable predicates for installation. You can combine this application format with buildout, puppet, or brew-likes if you want to, though. Personally, I'd rather not re-invent the wheel of a Linux distribution, thanks. I wouldn't even want an application server to touch system-wide configurations other than web server configurations for the applications hosted therein. 
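The dot-colon notation mentioned above for callable hooks could be resolved roughly as follows. This is only a sketch: the function name `resolve_callable` is hypothetical, it is not the actual resolver code, and it handles just the `package.submodule:object.attribute` form, not the path or URL notations.

```python
import importlib
from functools import reduce


def resolve_callable(reference):
    """Resolve 'package.module:object.attribute' to the object it names.

    Hypothetical helper for illustration only.
    """
    # Everything before the colon is an importable module path.
    module_name, _, attribute_path = reference.partition(':')
    target = importlib.import_module(module_name)
    if attribute_path:
        # Walk the dotted attribute chain one getattr at a time.
        target = reduce(getattr, attribute_path.split('.'), target)
    return target


# Example: look up a standard-library function by reference string.
join = resolve_callable('os.path:join')
```

Because the colon separates the module from the attribute chain, the resolver never has to guess whether a given name segment is a module or an attribute.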
If you start telling the system what is supported then as a spec you have to support too many actions: pre-install-hooks: [ ('install', ['libxml2', 'libxslt']), ('download', 'foo-library.tar.gz'), ('extract', 'foo-library.tar.gz'), ... # the idea being ($action, $args) ] I define no actions, only a callback. This is a pain in the neck as a protocol. Unfortunately for your argument this is a protocol you invented, not one that I defined. It is much simpler to have a list of pre-install-hooks and let the hosting system that is installing the package deal with those. If your system wants to run commands, you have the ability to do so. If you want to list package names that you install, go for it. If you have a tool that you want to use that the package can provide arguments, that is fine too. From the standpoint of a spec / API / package format, you don't really control the tool that acts on the package. Bing. You finally understand what I defined. This is the same problem that setuptools has. There isn't a record of what was installed. That's a tool-level problem unrelated to application packaging. For a good example of a Python application that /does/ manage packages, file tracking, etc. have a look at Gentoo's Portage system. It is safe to assume a deployed server has some software installed (nginx, postgres, wget, vim, etc.) and those requirements should usually be defined by some system administrator. No application honestly cares what front-end web server it is running on unless it makes extensive use of very specific plugins (like Nginx's push notification service). Again, most of this is outside the scope of an application container format. Do your applications honestly need access to vim? Also, assume nothing. When an application requires that you install some library, it is helpful to that sysadmin because that person has some options when something is meant to be deployed: 1. 
If the library is incompatible and will break some other piece of software, you can know and stop the deployment right there That's what the sandbox is for. I've been running Gentoo servers with 'slotting' mechanisms for 10 years, now, and having multiple installed libraries that are incompatible with one-another is not unusual, unheard of, or difficult. (Three versions of PHP, three of Python, etc.) 2. If the application is going to be moved to another server, the sysadmin can go ahead and add that app's requirements to their own config (puppet class for example) Puppet, buildout, etc. is, again, outside the scope. And if the application already defines requirements, what config file are you updating and duplicating the data needlessly within? 3. If two applications are running on the same machine, they may have inconsistent library requirements That's what the sandbox is for. 4. If an application does fail
Re: [Web-SIG] A Python Web Application Package and Format
On 2011-04-10 16:25:21 -0700, James Mills said: +1 too. I would however like to see this idea developed in a generic and useable way. ie: No zope/twisted deps or making it fit around Django :) Ideally it should be useable by the most basic (plain old WSGI). The following are the collected ideas of myself and a few other users in the WebCore chat room: https://gist.github.com/911991 Being generic (i.e. using WSGI under-the-hood) and allowing generic port assignments for other (non-web) networked applications is a design goal. The aversion to packaged zips is not entirely understandable to us; in this case, a packaged copy of the application is produced via a setup.py command, though in theory one could develop with that model and just zip everything up in the end by hand. Silver Lining seems to require too much in the way of hacking (modifying .pth files, etc) to be reasonable. -- Alice.
Re: [Web-SIG] A Python Web Application Package and Format
Howdy! On 2011-04-10 19:06:52 -0700, Ian Bicking said: There's a significant danger that you'll be creating a configuration management tool at that point, not simply a web application description. Unless you have the tooling to manage the applications, there's no point having a standard for them. Part of that tooling will be some form of configuration management allowing you to determine the requirements and configuration of an application /prior/ to installation. Better to have an application rejected up-front (Hey, this needs my social insurance number? Hells no!) than after it's already been extracted and potentially littered the landscape with its children. The escape valve in Silver Lining for these sort of things is services, which can kind of implement anything, and presumably ad hoc services could be allowed for. Generic services are useful, but not useful enough. You create a build process as part of the deployment (and development and everything else), which I think is a bad idea. Please elaborate. There is no requirement for you to use the application packaging format and associated tools (such as an application server) during development. In fact, like 2to3, that type of process would only slow things down to the point of uselessness. That's not what I'm suggesting at all. My model does not use setup.py as the basis for the process (you could build a tool that uses setup.py, but it would be more a development methodology than a part of the packaging). I know. And the end result is you may have to massage .pth files yourself. If a tool requires you to, at any point during normal operation, hand modify internal files… that tool has failed at its job. One does not go mucking about in your Git repo's .git/ folder, as an example. How do you build a release and upload it to PyPI? Upload docs to packages.python.org? setup.py commands. It's a convenient hook with access to metadata in a convenient way that would make an excellent "let's make a release!" 
type of command. Also lots of libraries don't work when zipped, and an application is typically an aggregate of many libraries, so zipping everything just adds a step that probably has to be undone later. Of course it has to be un-done later. I had thought I had made that quite clear in the gist. (Core Operation, point 1, possibly others.) If a deploy process uses zip file that's fine, but adding zipping to deployment processes that don't care for zip files is needless overhead. A directory of files is the most general case. It's also something a developer can manipulate, so you don't get a mismatch between developers of applications and people deploying applications -- they can use the exact same system and format. So, how do you push the updated application around? Using a full directory tree leaves you with Rsync and SFTP, possibly various SCM methods, but then you'd need a distinct repo (or rootless branch) just for releasing and you've already mentioned your dislike for SCM-based deployment models. Zip files are universal -- to the point that most modern operating systems treat zip files /as folders/. If you have to, consider it a transport encoding. The pattern that it implements is fairly simple, and in several models you have to lay things out somewhat manually. I think some more convention and tool support (e.g., in pip) would be helpful. +1 Though there are quite a few details, the result is more reliable, stable, and easier to audit than anything based on a build process (which any use of dependencies would require -- there are *no* dependencies in a Silver Lining package, only the files that are *part* of the package). It might be just me (and the other people who seem to enjoy WebCore and Marrow) but it is fully possible to do install-time dependencies in such a way as things won't break accidentally. Also, you missed Application Spec #4. 
Some notes from your link: - There seems to be both the description of a format, and a program based on that format, but it's not entirely clear where the boundary is. I think it's useful to think in terms of a format and a reference implementation of particular tools that use that format (development management tools, like installing into the format; deployment tools; testing tools; local serving tools; etc). Indeed; this gist was some really quickly hacked together ideas. - In Silver Lining I felt no need at all for shared libraries. Some disk space can be saved with clever management (hard links), but only when it's entirely clear that it's just an optimization. Adding a concept like server-packages adds a lot of operational complexity and room for bugs without any real advantages. ±0 - I try to avoid error conditions in the deployment, which is a big part of not having any build process involved, as build processes are a source of constant errors -- you can do a stage deployment,
Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)
On 2011-01-10 13:12:57 -0800, Guido van Rossum said: Ok, now that we've had a week of back and forth about this, let me repeat my threat. Unless more concerns are brought up in the next 24 hours, can PEP 3333 be accepted? It seems a lot of people are waiting for a decision that enables implementers to go ahead and claim PEP 333[3] compatibility. PEP 444 can take longer. With the lack of responses, can I assume this has been or will be shortly marked as accepted? I look forward to updating WebCore with compatibility. -- Alice.
Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)
On 2011-01-10 13:12:57 -0800, Guido van Rossum said: Ok, now that we've had a week of back and forth about this, let me repeat my threat. Unless more concerns are brought up in the next 24 hours, can PEP 3333 be accepted? It seems a lot of people are waiting for a decision that enables implementers to go ahead and claim PEP 333[3] compatibility. PEP 444 can take longer. Two hours to go... - Alice.
[Web-SIG] Generator-Based Applications: Marrow HTTPd Example
Howdy! Here's a rewritten (and incomplete, but GET and HEAD requests work fine) marrow.server.http branch [1] that illustrates a simple application [2] and protocol implementation [3]. Most notably, examine the 'resume' method [4]. The 'basic' example yields a future instance and uses the data as the response body. Note that this particular rewrite is not complete, nor has it been profiled and optimized; initial benchmarks (using the 'benchmark' example) show a reduction of ~600 RSecs from the 'draft' branch, which is substantial, but hasn't been traced to a particular segment of code or design decision yet. The server is now -extremely- easy to read and follow, with all code acting in a linear way. (Application worker threading has been removed from this branch as well; the server is once again purely async.) - Alice. [1] https://github.com/pulp/marrow.server.http/tree/generator [2] https://github.com/pulp/marrow.server.http/blob/generator/examples/basic.py [3] https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py [4] https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py#L177-226
Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)
On 2011-01-10 13:12:57 -0800, Guido van Rossum said: Ok, now that we've had a week of back and forth about this, let me repeat my threat. Unless more concerns are brought up in the next 24 hours, can PEP 3333 be accepted? +9001 ( 9000) It seems a lot of people are waiting for a decision that enables implementers to go ahead and claim PEP 333[3] compatibility. Django, mod_wsgi, CherryPy, etc. all have solutions that would need AFAIK minor tweaking before going live, which would make adoption of PEP 3333 the fastest of any PEP I've ever seen. ;) PEP 444 can take longer. Indeed it will! :D I have the conversion from Textile to ReST about half completed; I'll continue to poke it now that mailing list traffic seems to have died down and won't be consuming the majority of my Copious Spare Time™. ReST just doesn't jibe with my neural net. :/ - Alice.
Re: [Web-SIG] Generator-Based Applications: Marrow HTTPd Example
On 2011-01-10 04:25:40 -0800, Alice Bevan–McGregor said: Note that this particular rewrite is not complete, nor has it been profiled and optimized; initial benchmarks (using the 'benchmark' example) show a reduction of ~600 RSecs from the 'draft' branch, which is substantial, but hasn't been traced to a particular segment of code or design decision yet. Ignore that number; I had some runaway processes eating up my CPU. That's what I get for going weeks or months between reboots. ;) The drop (benchmarking current 'draft' branch and 'generator' branch) is now ~200 RSecs (down from ~3.2 KRSecs). Much more reasonable, and subject to enough stddev across runs to make the difference negligible at best. *phew* - Alice.
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 20:06:19 -0800, Alex Grönholm said: I liked the idea of having a separate async_read() method in wsgi.input, which would set the underlying socket in nonblocking mode and return a future. The event loop would watch the socket and read data into a buffer and trigger the callback when the given amount of data has been read. Conversely, .read() would set the socket in blocking mode. What kinds of problems would this cause? Manipulating the underlying socket is potentially dangerous (pipelining) and, in fact, not possible AFAIK while being PEP444-compliant. When the request body is fully consumed, additional attempts to read _must_ return empty strings. Thus raw sockets are right out at a high level; internal to the reactor this may be possible, however. It'd be interesting to adapt marrow.io to using futures in this way as an experiment. OTOH, if you utilize callbacks extensively (as m.s.http does) you run into the problem of data passing. Your application is called (wrapped in middleware), sets up some futures and callbacks, then returns. No returned data. Middleware just got shot in the foot. The server, also, got shot in the foot. How can it get a response tuple back from a callback? How can middleware be utilized? That's a weird problem to wrap my head around. Blocking the application pending the results of various socket operations is something that would have to be mandated to avoid this issue. :/ Multiple in-flight reads would also be problematic; you may end up with buffer interleaving issues. (e.g. job A reads 128 bytes at a time and has been requested to return 4KB, job B does the same... what happens to the data?) Then you begin to involve locking... Notice that my write_body method [1] writes using async, passing the iterable to the callback which is itself. This is after-the-fact (after the request has been returned) and is A-OK, though would need to be updated heavily to support the ideas of async floating around right now. 
I'm also extremely careful to never have multiple async callbacks pending (and thus never have multiple jobs for a single connection working at once). - Alice. [1] https://github.com/pulp/marrow.server.http/blob/draft/marrow/server/http/protocol.py#L313-332
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 19:34:41 -0800, P.J. Eby said: At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote: 09.01.2011 04:15, Alice Bevan–McGregor wrote: I hope that clearly identifies my idea on the subject. Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. -1 on this. Those executors are meant for executing code in a thread pool. Mandating a magical socket operation filter here would considerably complicate server implementation. Actually, the *reverse* is true. If you do it the way Alice proposes, my sketches don't get any more complex, because the filtering goes in the executor facade or submit function. Indeed; the executor is what then adds the file descriptor to the underlying server async reactor (select/epoll/kqueue/other). In the case of the Marrow server, this would utilize a reactor callback (some might say deferred) to update the Future instance with the data, setting completion status, executing callbacks, etc. One might even be able to use a threading.Event (or whatever is the opposite of a lock) to wake up blocking .result() calls, even if not multi-threaded (greenthreads, etc.). Of course, adding the file descriptor to a pure async reactor then .result() blocking on it from your application would result in a deadlock; the .result() would never complete as the reactor would never get a chance to perform the pending request. (This is why Marrow requires threading be enabled globally before adding an executor to the environment; this requires rather explicit documentation.) This problem is solved completely by yielding the future instance (pausing the application) to let the reactor do its thing. (Yielding the future becomes a replacement for the blocking behaviour of future.result().) Effectively what I propose adds emulation of threading on top of async by mutating an Executor. (The Executor would be a mixed threading+async executor.) 
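The yield-the-future idea can be sketched with the standard library's concurrent.futures. This is an illustration under stated assumptions, not the Marrow implementation: `run_application` is a hypothetical trampoline, the executor here is a plain thread pool rather than a mixed threading+async one, and the response is delivered as the generator's return value (a Python 3 convenience that was not part of the discussion).

```python
import io
from concurrent.futures import ThreadPoolExecutor, wait


def application(environ):
    # Yielding the future pauses the application; the server resumes it
    # with the same (now completed) future instead of blocking in result().
    future = yield environ['wsgi.submit'](environ['wsgi.input'].read, 4096)
    data = future.result()  # would re-raise here if the read had failed
    return '200 OK', [('Content-Length', str(len(data)))], [data]


def run_application(app, environ):
    """Hypothetical trampoline: drive a generator app that yields futures."""
    generator = app(environ)
    resume_value = None
    while True:
        try:
            yielded = generator.send(resume_value)
        except StopIteration as finished:
            return finished.value  # assumption: the response is the return value
        # A synchronous server simply waits; an async server would instead
        # hand control back to its reactor until the future completes.
        wait([yielded])
        resume_value = yielded  # bubble the future itself, not its result


with ThreadPoolExecutor(max_workers=1) as executor:
    environ = {
        'wsgi.input': io.BytesIO(b'request body'),
        'wsgi.submit': executor.submit,
    }
    status, headers, body = run_application(application, environ)
```

Sending the future back rather than its result leaves exception handling to whichever layer yielded it, so layers that do not care need no extra try blocks.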
I suggest bubbling a future back up the yield stack instead of the actual result to allow the application (or middleware, or whatever happened to yield the future) to capture exceptions generated by the future'd request. Bubbling the future instance avoids excessive exception handling cruft in each middleware layer; and I see no real issue with this. AFAIK, you can use a shorthand (possibly wrapped in a try: block) if all you care about is the result: data = (yield my_future).result() Truthfully, I don't really see the point of exposing the map() method (which is the only other executor method we'd expose), so it probably makes more sense to just offer a 'wsgi.submit' key... which can be a function as follows: [snip] True; the executor itself could easily be hidden behind the filter. In a multi-threaded environment, however, the map call poses no problem, and can be quite useful. (E.g. with one of my use cases for inclusion of an executor in the environment: image scaling.) Granted, this might be a rather long function. However, since it's essentially an optimization, a given server can decide how many functions can be shortcut in this way. The spec may wish to offer a guarantee or recommendation for specific methods of certain stdlib-provided types (sockets in particular) and wsgi.input. +1 Personally, I do think it might be *better* to offer extended operations on wsgi.input that could be used via yield, e.g. yield input.nb_read(). But of course then the trampoline code has to recognize those values instead of futures. Because wsgi.input is provided by the server, and the executor is provided by the server, is there a reason why these extended functions couldn't return... futures? :) Note, too, that this complexity also only affects servers that want to offer a truly async API. A synchronous server has no reason to pay particular attention to what's in a future, since it can't offer any performance improvement. 
I feel a sync server and async server should provide the same API for accessing the input. E.g. the application/middleware must be agnostic to the server in this regard. This is why a little bit of magic goes a long way. The following code would work on any WSGI2 stack that offers an executor (sync, async, or provided by middleware): data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result() In a sync server, the blocking read would execute in another thread. In an async one appropriate actions would be taken to request a socket read from the client. Both cases pause the application pending the result. (If you don't immediately yield the future the behaviour between servers is the same!) I do think that this sort of API discussion, though, is the most dangerous part of trying to do an async spec. That is, I don't expect that everyone will spontaneously agree on the exact same API. Alice's proposal (simply submitting object methods) has the advantage of severely limiting the scope
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 13:16:52 -0800, P.J. Eby said: In the limit case, it appears that any WSGI 1 server could provide an (emulated) async WSGI2 implementation, simply by wrapping WSGI2 apps with a finished version of the decorator in my sketch. Or, since users could do it themselves, this would mean that WSGI2 deployment wouldn't be dependent on all server implementers immediately turning out their own WSGI2 implementations. This, if you'll pardon my language, is bloody awesome. :D That would strongly drive adoption of WSGI2. Note that adapting a WSGI1 application to WSGI2 server would likewise be very handy, and I suspect, even easier to implement. - Alice.
Re: [Web-SIG] [PEP 444] Future- and Generator-Based Async Idea
Here's what I've mutated Alex Grönholm's minimal middleware example into: (see the change history for the evolution of this) https://gist.github.com/771398 A complete functional (as in function, not working ;) async-capable middleware layer (that does nothing) is 12 lines. That, I think, is a reasonable amount of boilerplate. Also, no decorators needed. It's quite readable, even the way I've compressed it. The class-based version is basically identical, but with added comments explaining the assumptions this example makes and demonstrating where the actual middleware code can be implemented for simple middleware. - Alice.
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 07:04:49 -0800, exar...@twistedmatrix.com said: I think this effort would benefit from more thought on how exactly accessing this external library support will work. If async wsgi is limited to performing a single read asynchronously, then it hardly seems compelling. Apologies if the last e-mail was too harsh; I'm about to go to bed, and it's been a long night/morning. ;) Here's a proposed solution: a generator API on top of futures. If the async server implementing the executor can detect a generator being submitted, then: :: The executor accepts the generator and begins iteration (passing the executor and the arguments supplied to submit). :: The generator is expected to be /fast/. :: The generator does work until it needs an operation over a file descriptor, at which point it yields the fd and the operation (say, 'r', or 'w'). :: The executor schedules with the async reactor the generator to be re-called when the operation is possible. :: The Future is considered complete when the generator raises GeneratorExit and the first argument is used as the return value of the Future. Yielding a 2-tuple of readers/writers would work, too, and allow for more concurrent utilization of sockets, though I'm not sure of the use cases for this. If so, the generator would be woken up when any of the readers or writers are available and sent() a 2-tuple of available_readers, available_writers. The executor is passed along for any operations the generator cannot accomplish safely without threads, and the executor, as it's running through the generator, will accomplish the same semantics as iterating the WSGI application: if a future instance is yielded, the generator is suspended until the future is complete, allowing heavy processing to be mixed with async calls in a fully async server. The wsgi.input operations can be implemented this way, as can database operations and pretty much anything that uses sockets, pipes, or on-disk files. 
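The steps listed above can be approximated with the standard selectors module. Treat this as a rough sketch under assumptions: `submit_generator` and `pump` are hypothetical names, the generator is not passed the executor, and completion is signalled with an ordinary `return` (StopIteration) rather than the GeneratorExit convention described above.

```python
import selectors
import socket
from concurrent.futures import Future

EVENTS = {'r': selectors.EVENT_READ, 'w': selectors.EVENT_WRITE}


def submit_generator(generator, selector):
    """Hypothetical: run a generator that yields (fd, operation) pairs."""
    future = Future()

    def step():
        try:
            fd, operation = next(generator)
        except StopIteration as finished:
            future.set_result(finished.value)
            return
        except Exception as error:
            future.set_exception(error)
            return
        # Ask the reactor to re-call us once the operation is possible.
        selector.register(fd, EVENTS[operation], step)

    step()
    return future


def pump(selector):
    """Minimal reactor loop: run until nothing is registered any more."""
    while selector.get_map():
        for key, _ in selector.select():
            selector.unregister(key.fileobj)
            key.data()  # resume the generator that was waiting on this fd


def read_some(sock, size):
    yield sock.fileno(), 'r'  # suspend until the socket is readable
    return sock.recv(size)


left, right = socket.socketpair()
selector = selectors.DefaultSelector()
right.sendall(b'hello')
future = submit_generator(read_some(left, 5), selector)
pump(selector)
result = future.result()
selector.close()
left.close()
right.close()
```

Each descriptor is unregistered before its generator is resumed, so a generator has at most one pending operation at a time.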
In fact, the WSGI application -itself- could be called in this way (with the omission of the executor or a simple wrapper that saves the executor into the environ). Just a quick thought before running off to bed. - Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
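A minimal sketch of the generator-on-futures idea above, for illustration only. It assumes Python 3 and the standard `selectors` module; `run_generator` and `echo_once` are hypothetical names, not part of any proposal. The post signals completion via GeneratorExit; this sketch substitutes a plain `return`, which surfaces as `StopIteration.value`, and it drives a single generator where a real reactor would interleave many.

```python
import selectors
import socket
from concurrent.futures import Future


def run_generator(gen, selector):
    """Drive `gen`, resuming it only when the fd it yielded is ready.

    Hypothetical helper: blocks on one generator at a time, which a real
    async server would not do.
    """
    future = Future()
    future.set_running_or_notify_cancel()
    try:
        fileobj, operation = next(gen)
        while True:
            event = selectors.EVENT_READ if operation == 'r' else selectors.EVENT_WRITE
            selector.register(fileobj, event)
            selector.select()               # wait until the operation is possible
            selector.unregister(fileobj)
            fileobj, operation = gen.send(None)
    except StopIteration as stop:
        future.set_result(stop.value)       # the generator's return value
    except Exception as exc:
        future.set_exception(exc)
    return future


def echo_once(sock):
    yield sock, 'r'                         # suspend until readable
    data = sock.recv(4096)
    yield sock, 'w'                         # suspend until writable
    sock.sendall(data.upper())
    return len(data)


left, right = socket.socketpair()
right.sendall(b"ping")
with selectors.DefaultSelector() as selector:
    result = run_generator(echo_once(left), selector)
reply = right.recv(4096)
left.close()
right.close()
```

The generator never blocks on its own: every wait is expressed as a yielded (fd, operation) pair, so the decision of how to wait stays with the server.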
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 09:26:19 -0800, P.J. Eby said: By the way, I don't really see the point of the new sketches you're doing... I'm sorry. ...as they aren't nearly as general as the one I've already done, but still have the same fundamental limitation: wsgi.input. You missed the point entirely, then. If wsgi.input offers any synchronous methods... Regardless of whether or not wsgi.input is implemented in an async way, wrap it in a future and eventually get around to yielding it. Problem /solved/. Identical APIs for both sync and async, and if you have an async server but haven't gotten around to implementing your own executor yet, wrapping the blocking read call in a future also solves the problem (albeit not in the most efficient way). I.e. wrap every call to a wsgi.input method by passing it to wsgi.submit. ...then they must be used from a future and must some how raise an error when called from within the application -- otherwise it would block, nullifying the point ofhaving a generator-based API. See above. No extra errors, nothing really that insane. If it offers only asynchronous methods, OTOH, then you can't pass wsgi.input to any existing libraries (e.g. the cgi module). Describe to me how a function can be suspended (other than magical greenthreads) if it does not yield; if I knew this, maybe I wouldn't be so confused. The latter problem is the worse one, because it means that the translation of an app between my original WSGI2 API and the current sketch is no longer just replace 'return' with 'yield'. I've deviated from your sketch, obviously, and any semblance of yielding a 3-tuple. Stop thinking of my example code as conforming to your ideas; it's a new idea, or, worst case, a narrowing of an idea into its simplest form. The only way this would work is if WSGI applications are still allowed to be written in a blocking style. 
Greenlet-based frameworks would have no problem with this, of course, but servers like Twisted would still have to run WSGI apps in a worker thread pool, just because they *might* block. Then that is not acceptable and would not work. The mechanics of yielding futures instances allows you to (in your server) implement the necessary async code however you wish while providing a uniform interface to both sync and async applications running on sync and async servers. In fact, you would be able to safely run a sync application on an async server and vice-versa. You can, on an async server: :: Add a callback to the yielded future to re-schedule the application generator. :: If using greenthreads, just block on future.result() then immediately wake up the application generator. :: Do other things I can't think of because I'm still waking up. The first solution is how Marrow HTTPd would operate. If we're okay with this as a limitation, then adding _async method variants that return futures might work, and we can proceed from there. That is not optimum, because now you have an optional API that applications who want to be compatible will need to detect and choose between. Mostly, though, it seems to me that the need to be able to write blocking code does away with most of the benefit of trying to have a single API in the first place. You have artificially created this need, ignoring the semantics of using the server-specific executor to detect async-capable requests and the yield mechanics I suggested; which happens to be a single, coherent API across sync and async servers and applications. - Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
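The first option in the list above (a completion callback that re-schedules the application generator) might look roughly like the following. This is a sketch under assumptions: the `queue.Queue` stands in for a reactor's callback queue, `step` is a hypothetical helper, and nothing here is taken from Marrow HTTPd's actual code.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from functools import partial

ready = queue.Queue()   # stand-in for the reactor's queue of pending callbacks
chunks = []             # body chunks "sent" to the client


def step(app_gen, value=None):
    """Resume the application generator once and handle what it yields."""
    try:
        yielded = app_gen.send(value)
    except StopIteration:
        ready.put(None)                         # sentinel: the response is complete
        return
    if hasattr(yielded, 'add_done_callback'):   # a future: suspend the application
        # The callback may fire in a worker thread, so it only enqueues the
        # resumption; the generator itself always runs on the reactor thread.
        yielded.add_done_callback(
            lambda done: ready.put(partial(step, app_gen, done)))
    else:                                       # a body chunk: keep iterating
        chunks.append(yielded)
        ready.put(partial(step, app_gen))


def application(environ):
    future = yield environ['wsgi.executor'].submit(sum, range(10))
    yield str(future.result()).encode('ascii')


with ThreadPoolExecutor(max_workers=2) as executor:
    ready.put(partial(step, application({'wsgi.executor': executor})))
    while True:
        callback = ready.get()
        if callback is None:
            break
        callback()
```

While the future is pending, the loop is free to run callbacks for other requests; the application is only touched again once its future completes.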
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 09:03:38 -0800, P.J. Eby said: Hm. I'm not sure if I like that. The typical app developer really shouldn't be yielding multiple body strings in the first place. Wait; what? So you want the app developer to load a 40MB talkcast MP3 into memory before sending it? You want to completely eliminate the ability to stream an HTML page to the client in chunks (e.g. head block, headers + search box, search results, advertisements, footer -- the exact thing Google does with every search result)? That sounds like artificially restricting application developers, to me. I much prefer that the canonical example of a WSGI app just return a list with a single bytestring... Why is it wrapped in a list, then? IOW, I want it to look like the normal way to do thing is to just return the whole request at once, and use the additional difficulty of creating a second iterator to discourage people writing iterated bodies when they should just write everything to a BytesIO and be done with it. It sounds to me like your should doesn't cover an extremely large range of common use cases. In your approach, the above samples have to be rewritten as: return app(environ) [snip] My code does not use return. At all. Only yield. Try actually making some code that runs on this protocol and yields to futures during the body iteration. Sure. I'll also implement my actual proposal of not having a separate body iterable. The above middleware pattern works with the sketches I gaveon the PEAK wiki, and I've now updated the wiki to include an exampleapp and middleware for clarity. I'll need to re-read the code on your wiki; I find it incredibly difficult to grok, however, you can help me out a bit by answering a few questions about it: How does middleware trap exceptions raised by the application. (Specifically how does the server pass the buck with exceptions? And how does the exception get to the application to bubble out towards the server, through middleware, as it does now?) 
Really, the only hole in this approach is dealing with applications that block. That's what the executor in the environ is for. If you have image scaling or something else that will block, you submit it. All networking calls? You submit them. The elephant in the room here is that while it's easy to write these example applications so they don't block, in practice people read files and do database queries and whatnot in their requests, and those APIs are generally synchronous. So, unless they somehow fold their entire application into a future, it doesn't work. Actually, that's how multithreading support in marrow.server[.http] was implemented. Overhead? 40-60 RSecs. The option is provided for those who can do nothing about their application blocking, while still maintaining the internally async nature of the server. That you could never *call* the .read() method outside of a future, or else you would block the server, thereby obliterating the point of having the async API in the first place. See above re: your confusion over the calling semantics of wsgi.input with regard to my (and Alex's) proposal. Specifically: data = (yield submit(wsgi_input.read, 4096)).result() This would work on sync and async servers, and with sync and async applications, with no difference in the code. - Alice.
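To make the claim about sync servers concrete, here is a rough sketch of how a purely synchronous server could drive an application written with that one-liner. `run_blocking` is a hypothetical driver and `io.BytesIO` stands in for wsgi.input; the application body follows the line quoted in the post.

```python
import io
from concurrent.futures import Future, ThreadPoolExecutor, wait


def application(environ):
    submit = environ['wsgi.executor'].submit
    wsgi_input = environ['wsgi.input']
    data = (yield submit(wsgi_input.read, 4096)).result()
    yield b"received " + str(len(data)).encode('ascii') + b" bytes"


def run_blocking(app, environ):
    """Hypothetical synchronous driver: wait in place on each yielded future."""
    body = []
    gen = app(environ)
    value = None
    while True:
        try:
            yielded = gen.send(value)
        except StopIteration:
            return body
        if isinstance(yielded, Future):
            wait([yielded])      # block this worker until the future completes
            value = yielded      # hand the completed future back to the app
        else:
            body.append(yielded)
            value = None


with ThreadPoolExecutor(max_workers=1) as executor:
    environ = {'wsgi.executor': executor,
               'wsgi.input': io.BytesIO(b"x" * 10)}
    body = run_blocking(application, environ)
```

An async server would replace the `wait()` call with its own scheduling; the application code stays the same either way.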
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-09 17:06:28 -0800, Alice Bevan-McGregor said: On 2011-01-09 09:03:38 -0800, P.J. Eby said: The elephant in the room here is that while it's easy to write these example applications so they don't block, in practice people read files and do database queries and whatnot in their requests, and those APIs are generally synchronous. So, unless they somehow fold their entire application into a future, it doesn't work. Actually, that's how multithreading support in marrow.server[.http] was implemented. Overhead? 40-60 RSecs. Clarification here: that's less than 2% of total RSecs. - Alice.
[Web-SIG] [PEP 444] Future- and Generator-Based Async Idea
Warning: this assumes we're running on bizzaro-world PEP 444 that mandates applications are generators. Please do not dismiss this idea out of hand but give it a good look and maybe some feedback. ;) -- Howdy! I've finished touching up the p-code illustrating my idea of using generators to implement async functionality within a WSGI application and middleware, including the idea of a wsgi2ref-supplied decorator to simplify middleware. https://gist.github.com/770743 There may be a few typos in there; I switched from the idea of passing back the returned value of the future to passing back the future itself in order to better handle exception handling (i.e. not requiring utter insanity in the middleware to determine the true source of an exception and the need to pass it along). The second middleware demonstration (using a decorator) makes middleware look a lot more like an application: yielding futures, or a response, with the addition of yielding an application callable not explored in the first (long, but trivial) example. I believe this should cover 99% of middleware use cases, including interactive debugging, request routing, etc. and the syntax isn't too bad, if you don't mind standardized decorators. This should be implementable within the context of Marrow HTTPd (http://bit.ly/fLfamO) without too much difficulty. As a side note, I'll be adding threading support to the server (actually, marrow.server, the underlying server/protocol abstraction m.s.http utilizes) using futures some time over the week-end by wrapping the async callback that calls the application with a call to an executor, making it immune to blocking, but I suspect the overhead will outweigh the benefit for speedy applications. Testing multi-process vs. multi-threaded using 2 workers each and the prime calculation example, threading is 1.5x slower for CPU-intensive tasks under Python 2.7. That's terrible. It should be 2x; I have 2 cores. :/ - Alice. 
Re: [Web-SIG] [PEP 444] Future- and Generator-Based Async Idea
As a quick note, this proposal would significantly benefit from the simplified syntax offered by PEP 380 (Syntax for Delegating to a Subgenerator) [1] and possibly PEP 3152 (Cofunctions) [2]. The former simplifies delegation and exception passing, and the latter simplifies the async side of this. Unfortunately, AFAIK, both are affected by PEP 3003 (Python Language Moratorium) [3], which kinda sucks. - Alice. [1] http://www.python.org/dev/peps/pep-0380/ [2] http://www.python.org/dev/peps/pep-3152/ [3] http://www.python.org/dev/peps/pep-3003/
Re: [Web-SIG] PEP 444 feature request - Futures executor
On 2011-01-07 09:47:12 -0800, Timothy Farrell said: However, I'm requesting that servers _optionally_ provide environ['wsgi.executor'] as a futures executor that applications can use for the purpose of doing something after the response is fully sent to the client. This is feature request is designed to be concurrency methodology agnostic. Done. (In terms of implementation, not updating PEP 444.) :3 The Marrow server now implements a thread pool executor using the concurrent.futures module (or equiv. futures PyPi package). The following are the commits; the changes will look bigger than they are due to cutting and pasting of several previously nested blocks of code into separate functions for use as callbacks. 100% unit test coverage is maintained (without errors), an example application is added, and the benchmark suite updated to support the definition of thread count. http://bit.ly/gUL33v http://bit.ly/gyVlgQ Testing this yourself requires Git checkouts of the marrow.server/threading branch and marrow.server.http/threading branch, and likely the latest marrow.io from Git as well: https://github.com/pulp/marrow.io https://github.com/pulp/marrow.server/tree/threaded https://github.com/pulp/marrow.server.http/tree/threaded This update has not been tested under Python 3.x yet; I'll do that shortly and push any fixes; I doubt there will be any. On 2011-01-08 03:26:28 -0800, Alice Bevan–McGregor said in the [PEP 444] Future- and Generator-Based Async Idea thread: As a side note, I'll be adding threading support to the server... but I suspect the overhead will outweigh the benefit for speedy applications. I was surprisingly quite wrong in this prediction. The following is the output of a C25 pair of benchmarks, the first not threaded, the other with 30 threads (enough so there would be no waiting). https://gist.github.com/770893 The difference is the loss of 60 RSecs out of 3280. 
Note that the implementation I've devised can pass the concurrent.futures executor to the WSGI application (and, in fact, does), fulfilling the requirements of this discussion. :D The use of callbacks internally to the HTTP protocol makes a huge difference in overhead, I guess. - Alice.
Re: [Web-SIG] WSGI-1 Warts
On 2011-01-08 07:22:44 -0800, David Stanek said: I'm going to take some time this weekend to create a consolidated list. I was hoping to find something like: Issue: Discussion: http:// Summary of resolution: ... I agree; that would be very, very nice to have, though it might be helpful (esp. considering the length some of these discussions go to and the mixing of ideas within single threads) to mirror the message nesting as a series of nested lists (if doing this in HTML) to more concisely collect posts vs. pointing to the head of a thread and having to go through literally everything. Of course, that's more work, and should be restricted to threads that are, in fact, scattered or unfocused. And that first sentence was waaay too long, indicating that I've now been up all night. :( - Alice.
Re: [Web-SIG] Server-side async API implementation sketches
On 2011-01-08 17:22:44 -0800, Alex Grönholm said: On 2011-01-08 13:16:52 -0800, P.J. Eby said: I've written the sketches dealing only with PEP 3148 futures, but sockets were also proposed, and IMO there should be simple support for obtaining data from wsgi.input. I'm a bit unclear as to how this will work with async. How do you propose that an asynchronous application receives the request body? In my example https://gist.github.com/770743 (which has been simplified greatly by P.J. Eby in the Future- and Generator-Based Async Idea thread) for dealing with wsgi.input, I have: future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096) yield future While ugly, if you were doing this, you'd likely: submit = environ['wsgi.executor'].submit input_ = environ['wsgi.input'] future = yield submit(input_.read, 4096) data = future. That's a bit nicer to read, and simplifies things if you need to make a number of async calls. The idea here is that: :: Your async server subclasses ThreadPoolExecutor. :: The subclass overloads the submit method. :: Your submit method detects bound methods on wsgi.input, sockets, and files. :: If one of the above is detected, create a mock future that defines 'fd' and 'operation' attributes or similar. :: When yielding the mock future, your async reactor can detect 'fd' and do the appropriate thing for your async framework. (Generally adding the fd to the appropriate select/epoll/kqueue readers/writers lists.) :: When the condition is met, set_running_or_notify_cancel (when internally reading or writing data), set_result, saving the value, and return the future (filled with its data) back up to the application. :: The application accepts the future instance as the return value of yield, and calls result across it to get the data. (Obviously writes, if allowed, won't have data, but reads will.) I hope that clearly identifies my idea on the subject. 
Since async servers will /already/ be implementing their own executors, I don't see this as too crazy. - Alice.
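The steps listed in this message could be sketched roughly as follows. Everything here is an assumption made for illustration: `ReactorExecutor`, `complete_when_ready`, and the `fd` / `operation` / `call` attribute names are hypothetical, only reads are detected, and a socket's file wrapper stands in for wsgi.input.

```python
import selectors
import socket
from concurrent.futures import Future, ThreadPoolExecutor


class ReactorExecutor(ThreadPoolExecutor):
    """Hypothetical executor: fd-backed reads become 'mock' futures."""

    def submit(self, fn, *args, **kwargs):
        target = getattr(fn, '__self__', None)   # bound methods expose their instance
        if (target is not None and getattr(fn, '__name__', None) == 'read'
                and hasattr(target, 'fileno')):
            try:
                fd = target.fileno()
            except OSError:                      # e.g. io.BytesIO has no descriptor
                fd = None
            if fd is not None:
                future = Future()
                future.fd = fd                   # attributes the reactor looks for
                future.operation = 'r'
                future.call = (fn, args, kwargs) # what to run once fd is readable
                return future
        return super().submit(fn, *args, **kwargs)


def complete_when_ready(future):
    """Hypothetical reactor step: wait for the fd, do the read, fill the future."""
    with selectors.DefaultSelector() as selector:
        selector.register(future.fd, selectors.EVENT_READ)
        selector.select()
    fn, args, kwargs = future.call
    if future.set_running_or_notify_cancel():
        future.set_result(fn(*args, **kwargs))
    return future


left, right = socket.socketpair()
right.sendall(b"request body")
wsgi_input = left.makefile('rb', buffering=0)

with ReactorExecutor(max_workers=1) as executor:
    mock = executor.submit(wsgi_input.read, 4096)   # detected: no thread is used
    real = executor.submit(pow, 2, 10)              # anything else: the thread pool
    data = complete_when_ready(mock).result()
    power = real.result()

wsgi_input.close()
left.close()
right.close()
```

From the application's side both calls look identical: it yields whatever `submit` returned and calls `.result()` on what comes back.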
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-06 20:49:57 -0800, P.J. Eby said: It would be helpful if you addressed the issue of scope, i.e., whatfeatures are you proposing to offer to the application developer. Conformity, predictability, and portability. That's a lot of y's. (Pardon the pun!) Alex Grönholm's post describes the goal quite clearly. So far, I believe you're the second major proponent (i.e. ones with concrete proposals and/or implementations to discuss) of an async protocol... and what you have in common with the other proponent is that you happen to have written an async server that would benefit from having apps operating asynchronously. ;-) Well, the Marrow HTTPd does operate in multi-process mode, and, one day, multi-threaded or a combination. Integration of a futures executor to the WSGI environment would alleviate the major need for a multi-threaded implementation in the server core; intensive tasks can be deferred to a thread pool vs. everything being deferred to a thread pool. (E.g. template generation, PDF/other text extraction for indexing of file uploads, image scaling, etc. all of which are real use cases I have which would benefit from futures.) I find it hard to imagine an app developer wanting to do something asynchronously for which they would not want to use one of the big-dog asynchronous frameworks. (Especially if their app involves database access, or other communications protocols.) Admittedly, a truly async server needs some way to allow file descriptors to be registered with the reactor core, with the WSGI application being resumed upon some event (e.g. socket is readable or writeable for DB access, or even pipe operations for use cases I can't think of at the moment). Futures integration is a Good Idea, IMHO, and being optional and easily added to the environ by middleware for servers that don't implement it natively is even better. 
As for how to provide a generic interface to an async core, I have two ideas, but one is magical and the other is more so; I'll describe these in a discrete post. This doesn't mean I think having a futures API is a bad thing, but ISTM that a futures extension to WSGI 1 could be defined right now using an x-wsgi-org extension in that case... and you could then find out how many people are actually interested in using it. I'll add writing up a WSGI middleware layer that configures and adds a future.executor to the environ to my already overweight to-do list. It actually is something I have a use for right now on at least one commercial project. :) Mainly, though, what I see is people using the futures thing to shuffle off compute-intensive tasks... That's what it's for. ;) ...but if they do that, then they're basically trying to make the server's life easier... but under the existing spec, any truly async server implementing WSGI is going to run the *app* in a future of some sort already... Running the application in a future is actually not a half-bad way for me to add threading to marrow.server... thanks! Which means that the net result is that putting in async is like saying to the app developer: hey, you know this thing that you just could do in WSGI 1 and the server would take care of it for you? Well, now you can manage that complexity by yourself! Isn't that wonderful? ;-) That's a bit extreme; PEP 444 servers may still implement threading, multi-processing, etc. at the reactor level (a la CherryPy or Paste). Giving WSGI applications access to a futures executor (possibly the one powering the main processing threads) simply gives applications the ability to utilize it, not the requirement to do so. I could be wrong of course, but I'd like to see what concrete use cases people have for async. Earlier in this post I illustrated a few that directly apply to a commercial application I am currently writing. 
I'll elaborate: :: Image scaling would benefit from multi-processing (spreading the load across cores). Also, only one sacle is immediately required before returning the post-upload page: the thumbnail. The other scales can be executed without halting the WSGI application's return. :: Asset content extraction and indexing would benefit from threading, and would also not require pausing the WSGI application. :: Since most templating engines aren't streaming (see my unanswered thread in the general mailing list re: this), pausing the application pending a particularly difficult render is a boon to single-threaded async servers, though true streaming templating (with flush semantics) would be the holy grail. ;) :: Long-duration calls to non-async-aware libraries such as DB access. The WSGI application could queue up a number of long DB queries, pass the futures instances to the template, and the template could then .result() (block) across them or yield them to be suspended and resumed when the result is available. :: True async is useful for WebSockets,
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-06 10:15:19 -0800, Antoine Pitrou said: Alice Bevan–McGregor al...@... writes: Er, for the record, in Python 3 non-blocking file objects return None when read() would block. -1 I'm aware, however that's not practically useful. How would you detect from within the WSGI 2 application that the file object has become readable? Implement your own async reactor / select / epoll loop? That's crazy talk! ;) I was just pointing out that if you need to choose a convention for signaling blocking reads on a non-blocking object, it's already there. I don't. I need a way to suspend execution of a WSGI application pending some operation, often waiting for socket or file read or write availability. (Just as often something entirely unrelated to file descriptors, see my previous post from a few moments ago.) By the way, an event loop is the canonical implementation of asynchronous programming, so I'm not sure what you're complaining about. Or perhaps you're using async in a different meaning? (which one?) If you use non-blocking sockets, and the WSGI server provides a way to directly access the client socket (ack!), utilizing the none response on reads would require you to utilize a tight loop within your application to wait for actual data. That's really, really bad, and in a single-threaded server, deadly. I don't understand why you want a yield at this level. IMHO, WSGI needn't involve generators. A higher-level wrapper (framework, middleware, whatever) can wrap fd-waiting in fancy generator stuff if so desired. Or, in some other environments, delegate it to a reactor with callbacks and deferreds. Or whatever else, such as futures. WSGI already involves generators: the response body. In fact, the templating engine I wrote (and extended to support flush semantics) utilizes a generator to return the response body. Works like a hot damn, too. Yield is the Python language's native way to suspend execution of a callable in a re-entrant way. 
A trivial example of this is an async ping-pong reactor. I wrote one (you aren't a real Python programmer unless...) as an experiment and utilize it for server monitoring with tasks being generally scheduled against time, vs. edge-triggered or level-triggered fd operation availability. Everyone has their own idea of what a deferred is, and there is only one definition of a future, which (in a broad sense) is the same as the general idea of a deferred. Deferreds just happen to be implementation-specific and often require rewriting large portions of external libraries to make them compatible with that specific deferred implementation. That's not a good thing. Hell; an extension to the futures spec to handle file descriptor events might not be a half-bad idea. :/ By the way, the concurrent.futures module is new. Though it will be there in 3.2, it's not guaranteed that its API and semantics will be 100% stable while people start to really flesh it out. Ratification of PEP 444 is a long way off itself. Also, Alex Grönholm maintains a pypi backport of the futures module compatible with 2.x+ (not sure of the specific minimum version) and 3.2. I'm fairly certain deprecation warnings wouldn't kill the usefulness of that implementation. Worrying about instability, at this point, may be premature. +1 for pure futures which (in theory) eliminate the need for dedicated async versions of absolutely everything at the possible cost of slightly higher overhead. I don't understand why futures would solve the need for a low-level async facility. You mis-interpreted; I didn't mean to infer that futures would replace an async core reactor, just that long-running external library calls could be trivially deferred using futures. You still need to define a way for the server and the app to wake each other (and for the server to wake multiple apps). 
Futures is a pretty convienent way to have a server wake an app; using a future completion callback wrapped (using partial) with the paused application generator would do it. (The reactor Marrow uses, a modified Tornado IOLoop, would require calling reactor.add_callback(partial(worker, app_gen)) followed by reactor._wake() in the future callback.) Waking up the server would be accomplished by yielding a futures instance (or fd magical value, etc). This isn't done naturally in Python (except perhaps with stackless or greenlets). Using fds give you well-known flexible possibilities. Yield is the natural way for one side of that, re-entering the generator on future completion covers the other side. Stackless and greenlets are alternate ideas, but yield is built-in (and soon, so will futures). If you want to put the futures API in WSGI, think of the poor authors of a WSGI server written in C who will have to write their own executor and future implementation. I'm sure they have better things to do. If they embed a Python interpreter via C
Re: [Web-SIG] PEP 444 Goals
On 2011-01-06 20:18:12 -0800, P.J. Eby said: :: Reduction of re-implementation / NIH syndrome by incorporatingthe most common (1%) of features most often relegated to middlewareor functional helpers. Note that nearly every application-friendly feature you add will increase the burden on both server developers and middleware developers, which ironically means that application developers actually end up with fewer options. Some things shouldn't have multiple options in the first place. ;) I definitely consider implementation overhead on server, middleware, and application authors to be important. As an example, if yield syntax is allowable for application objects (as it is for response bodies) middleware will need to iterate over the application, yielding up-stream anything that isn't a 3-tuple. When it encounters a 3-tuple, the middleware can do its thing. If the app yield semantics are required (which may be a good idea for consistency and simplicity sake if we head down this path) then async-aware middleware can be implemented as a generator regardless of the downstream (wrapped) application's implementation. That's not too much overhead, IMHO. Unicode decoding of a small handful of values (CGI values that pull from the request URI) is the biggest example. [2, 3] Does that mean you plan to make the other values bytes, then? Or will they be unicode-y-bytes as well? Specific CGI values are bytes (one, I believe), specific ones are true unicode (URI-related values) and decoded using a configurable encoding with a fallback to bytes in unicode (iso-8859-1/latin1), are kept internally consistent (if any one fails, treat as if they all failed), have the encoding used recorded in the environ, and all others are native strings (bytes in unicode where native strings are unicode). What happens for additional server-provided variables? That is the domain of the server to document, though native strings would be nice. (The PEP only covers CGI variables.) 
The PEP choice was for uniformity. At one point, I advocated simply using surrogateescape coding, but this couldn't be made uniform across Python versions and maintain compatibility. As an open question to anyone: is surrogateescape available in Python 2.6? Mandating that as a minimum version for PEP 444 has yielded benefits in terms of back-ported features and syntax, like b''. :: Cross-compatibility considerations. The definition and use of native strings vs. byte strings is the biggest example of this in the rewrite. I'm not sure what you mean here. Do you mean portability of WSGI 2 code samples across Python versions (esp. 2.x vs. 3.x)? It should be possible (and currently is, as demonstrated by marrow.server.http) to create a polyglot server, polyglot middleware/filters (demonstrated by marrow.wsgi.egress.compression), and polyglot applications, though obviously polyglot code demands the lowest common denominator in terms of feature use. Application / framework authors would likely create Python 3 specific WSGI applications to make use of the full Python 3 feature set, with cross-compatibility relegated to server and middleware authors. - Alice.
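The decoding rule described in this message (URI-derived values decoded with a configurable encoding, falling back to latin1 for all of them if any one fails, with the encoding used recorded in the environ) might be sketched like this. The key list and the `wsgi.uri_encoding` key name are assumptions made for the example, and `decode_uri_values` is a hypothetical helper.

```python
# Assumed set of URI-derived CGI keys; illustrative only.
URI_KEYS = ('SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING')


def decode_uri_values(raw, encoding='utf-8'):
    """Decode the URI-derived values together: all with `encoding`, or all as latin1."""
    try:
        decoded = {key: raw[key].decode(encoding) for key in URI_KEYS}
        used = encoding
    except UnicodeDecodeError:
        # If any one value fails, treat them as if they all failed, so the
        # values stay internally consistent ("bytes in unicode").
        decoded = {key: raw[key].decode('iso-8859-1') for key in URI_KEYS}
        used = 'iso-8859-1'
    decoded['wsgi.uri_encoding'] = used     # hypothetical key name
    return decoded


good = decode_uri_values({
    'SCRIPT_NAME': b'',
    'PATH_INFO': '/caf\u00e9'.encode('utf-8'),
    'QUERY_STRING': b'q=1',
})
bad = decode_uri_values({
    'SCRIPT_NAME': b'',
    'PATH_INFO': b'/caf\xe9',               # latin1 bytes, not valid UTF-8
    'QUERY_STRING': b'q=1',
})
```

Because latin1 maps every byte to a code point, the fallback never fails and the original bytes can always be recovered with `.encode('iso-8859-1')`.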
Re: [Web-SIG] PEP 444 Goals
On 2011-01-07 01:08:42 -0800, chris.dent said: ... this particular goal [reduction of reimplementation / NIH] could cover a large number of things from standardized query string processing (maybe a good idea) to filters (which I've already expressed reservations about). So this goal seems like it ought to be several separate goals. +1 This definitely needs to be broken out to be explicit over the things that can be abstracted away from middleware and applications. Input from framework authors would be valuable here to see what they disliked re-implementing the most. ;) Query string processing is a difficult task at the best of times, and is one area that is reimplemented absolutely everywhere. (At some point I should add up the amount of code + unit testing code that covers this topic alone from the top 10 frameworks.) The other option (than non-optional) for optional things is to remove them. True; though optional things already exist as if they were not there. Implementors rarely, it seems, expend the effort to implement optional components; thus every HTTP server I came across has comments in the code saying "up to the application to implement chunked responses", indicating -some- thought, despite chunked /request/ support being mandated by HTTP/1.1. (And other ignored requirements.) - Alice.
Re: [Web-SIG] PEP 444 feature request - Futures executor
On Fri, Jan 7, 2011 at 9:47 AM, Timothy Farrell wrote: There has been much discussion about how to handle async in PEP 444 and that discussion centers around the use of futures. However, I'm requesting that servers _optionally_ provide environ['wsgi.executor'] as a futures executor that applications can use for the purpose of doing something after the response is fully sent to the client. This is feature request is designed to be concurrency methodology agnostic. +1 On 2011-01-07 11:07:36 -0800, Timothy Farrell said: On 2011-01-07 09:59:10 -0800, Guido van Rossum said: If it's optional, what's the benefit for the app of getting it through WSGI instead of through importing some other standard module? Aside from that, servers currently specify if they are multi-threaded and/or multi-process. Having the server provide the executor allows it to provide an executor that most matches its own concurrency model... I think that's the bigger point; WSGI servers do implement their own concurrency model for request processing and utilizing a server-provided executor which interfaces with whatever the internal representation of concurrency is would be highly beneficial. (Vs. an application utilizing a more generic executor implementation that adds a second thread pool...) Taking futures to be separate and distinct from the rest of async discussion, I still think it's an extremely useful feature. I outlined my own personal use cases in my slew of e-mails last night, and many of them are also not time sensitive. (E.g. image scaling, full text indexing, etc.) Maybe this should be a server option instead of a spec option. It would definitely fall under the Server API spec, not the application one. Being optional, and with simple (wsgi.executor) access via the environ would also allow middleware developers to create executor implementations (or just reference the concurrent.futures implementation). 
I worry that this weighs down the WSGI standard with the responsibility of coming up with the perfect executor API, and if it's not quite perfect after all, servers are additionally required to support the standard but suboptimal API effectively forever. I'm not following you here. What's wrong with executor.submit() that might need changing? Granted, it would not be ideal if an application called executor.shutdown(). This doesn't seem difficult to my tiny brain. The perfect executor API is already well defined in PEP 3148 AFAIK. Specific methods with specific semantics implemented in a duck-typed way. The underlying implementation is up to the server, or the server can utilize an external (or built-in in 3.2) futures implementation. If WSGI 2 were to incorporate futures as a feature there would have to be some mandate as to which methods applications and middleware are allowed to call; similar to how we do not allow .close() across wsgi.input or wsgi.errors. - Alice.
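A minimal sketch of the request above, assuming a server that chooses to expose an executor under environ['wsgi.executor']. That key is only proposed in this thread and is not part of any ratified specification; the fallback pool, the reindex function and the PEP 333-style callable are illustrative names, not anyone's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Fallback pool for servers that do not offer an executor (assumption: the
# proposed 'wsgi.executor' key is optional and may simply be absent).
_fallback = ThreadPoolExecutor(max_workers=2)

indexed = []

def reindex(document_id):
    # Stand-in for slow, non-time-critical work such as full text indexing.
    indexed.append(document_id)
    return document_id

def application(environ, start_response):
    executor = environ.get('wsgi.executor', _fallback)
    # Only submit() is used. The application never calls shutdown(),
    # because the pool belongs to the server.
    executor.submit(reindex, 42)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Accepted.\n']
```

The application only touches the PEP 3148 submit() method, so any duck-typed executor the server hands over would work the same way.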
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 09:04:07 -0800, Antoine Pitrou said: Alice Bevan–McGregor al...@... writes: I don't understand why you want a yield at this level. IMHO, WSGI needn't involve generators. A higher-level wrapper (framework, middleware, whatever) can wrap fd-waiting in fancy generator stuff if so desired. Or, in some other environments, delegate it to a reactor with callbacks and deferreds. Or whatever else, such as futures. WSGI already involves generators: the response body. Wrong. I'm aware that it can be any form of iterable, from a list-wrapped string all the way up to generators or other nifty things. I mistakenly omitted these assuming that the other iterables were universally understood and implied. However, using a generator is a known, valid use case that I do see in the wild. (And also rely upon in some of my own applications.) Right, that's why I was suggesting you drop your concern for Python 2 compatibility. -1 There is practically no reason for doing so; esp. considering that I've managed to write a 2k/3k polyglot server that is more performant out of the box than any other WSGI HTTP server I've come across and is far simpler in implementation than most of the ones I've come across with roughly equivalent feature sets. Cross compatibility really isn't that hard, and arguing that 2.x support should be dropped for the sole reason that it might be dead by the time this is ratified is a bit off. Python 2.x will be around for a long time. - Alice.
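To illustrate the point about response bodies, here is a small sketch in the PEP 3333 calling convention. The two applications and the collect helper are made up for this example; the server side only ever iterates, so it cannot tell a list from a generator.

```python
def list_app(environ, start_response):
    # Body returned as a plain list of byte strings.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, ', b'world!\n']

def generator_app(environ, start_response):
    # Same response, but the application itself is a generator function.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'Hello, '
    yield b'world!\n'

def collect(app):
    # A toy "server": call the app and join whatever iterable comes back.
    body = app({}, lambda status, headers: None)
    return b''.join(body)
```

Both applications produce identical output through collect, which is the sense in which the specification already admits generators without requiring them.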
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 08:10:43 -0800, P.J. Eby said: At 12:39 AM 1/7/2011 -0800, Alice Bevan–McGregor wrote: :: Image scaling would benefit from multi-processing (spreading the load across cores). Also, only one scale is immediately required before returning the post-upload page: the thumbnail. The other scales can be executed without halting the WSGI application's return. :: Asset content extraction and indexing would benefit from threading, and would also not require pausing the WSGI application. In all these cases, ISTM the benefit is the same if you future the WSGI apps themselves (which is essentially what most current async WSGI servers do, AFAIK). Image scaling and asset content extraction should not block the response to a HTTP request; these need to be 'forked' from the main request. Only template generation (where the app needs to effectively block pending completion) is solved easily by threading the whole application call. :: Long-duration calls to non-async-aware libraries such as DB access. The WSGI application could queue up a number of long DB queries, pass the futures instances to the template, and the template could then .result() (block) across them or yield them to be suspended and resumed when the result is available. :: True async is useful for WebSockets, which seem a far superior solution to JSON/AJAX polling in addition to allowing real web-based socket access, of course. The point as it relates to WSGI, though, is that there are plenty of mature async APIs that offer these benefits, and some of them (e.g. Eventlet and Gevent) do so while allowing blocking-style code to be written. That is, you just make what looks like a blocking call, but the underlying framework silently suspends your code, without tying up the thread. Or, if you can't use a greenlet-based framework, you can use a yield-based framework. Or, if for some reason you really wanted to write continuation-passing style code, you could just use the raw Twisted API. 
But is there really any problem with providing a unified method for indicating a suspend point? What the server does when it gets the yielded value is entirely up to the implementation of the server; if it (the server) wants to use greenlets, it can. If it has other methodologies, it can go nuts. Even if you've already written a bunch of code using raw sockets and want to make it asynchronous, Eventlet and Gevent actually let you load a compatibility module that makes it all work, by replacing the socket API with an exact duplicate that secretly suspends your code whenever a socket operation would block. I generally frown upon magic, and each of these implementations is completely specific. :/ - Alice.
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 12:42:24 -0800, Paul Davis said: Is the code for this server online? I'd be interested in reading through it. https://github.com/pulp/marrow.server.http There are two branches: master will always refer to the version published on Python.org, and draft refers to my rewrite. (When published, draft will be merged.) - Alice.
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 13:21:36 -0800, Antoine Pitrou said: Ok, so, WSGI doesn't already involve generators. QED. This can go around in circles; by allowing all forms of iterable, it involves generators. Generators are a type of iterable. QED right back. ;) Right, that's why I was suggesting you drop your concern for Python 2 compatibility. -1 There is practically no reason for doing so; Of course, there is one: a less complex PEP without any superfluous compatibility language sprinkled all over. There isn't any compatibility language sprinkled within the PEP. In fact, the only mention of it is in the introduction (stating that 2.6 support may be possible but is undefined) and the title of a section Python Cross-Version Compatibility. Using native strings where possible encourages compatibility, though for the environ variables previously mentioned (URI, etc.) explicit exceptional behaviour is clearly defined. (Byte strings and true unicode.) Just because you managed to write some piece of code for a *particular* use case doesn't mean that cross-compatibility is a solved problem. The particular use case happens to be PEP 444 as implemented using an async and multi-process (some day multi-threaded) HTTP server, so I'm not quite sure what you're getting at, here. I think that use case is sufficiently broad to be able to make claims about the ease of implementing PEP 444 in a compatible way. If you think it's easy, then I'm sure the authors of various 3rd-party libs would welcome your help achieving it. I helped proof a book about Python 3 compatibility and am giving a presentation in March that contains information on Python 3 compatibility from the viewpoint of implementing the Marrow suite. Python 2.x will be around for a long time. And so will PEP and even PEP 333. People who value legacy compatibility will favour these old PEPs over your new one anyway. People who don't will progressively jump to 3.x. Yup. Not sure how this is really an issue. 
PEP 444 is the /future/, 333[3] is /now/ [-ish]. - Alice.
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 09:04:07 -0800, Antoine Pitrou said: WSGI doesn't mandate any specific feature of generators, such as coroutine-like semantics, and the server doesn't have to know about them. The joy of writing a new specification is that we are not (potentially) shackled by old ways of doing things. Case in point: dropping start_response and changing the return value. PEP 444 isn't WSGI 1, and can change things, including additional changes to the allowable return value. - Alice.
Re: [Web-SIG] PEP 444 Goals
On 2011-01-07 20:34:09 -0800, P.J. Eby said: That it [handling generators] is difficult at all removes degree-of-difficulty as a strong motivation to switch. Agreed. I will be following up with a more concrete idea (including p-code) to better describe what is currently in my brain. (One half of which will be just as objectionable, the other half, with Alex Grönholm's input, far more reasonable.) IOW, there are six specific facts someone needs to remember in order to know the type of a given CGI variable, over and above the mere fact that it's a CGI variable. Hence, reference. No, practically there is one. If you are implementing a Python 3 solution, a single value (original URI) is an instance of bytes, the rest are str. If you are implementing a Python 2 solution, there's a single rule you need to remember: values derived from the URI (QUERY_STRING, PATH_INFO, etc.) are unicode, the rest are str. Polyglot implementors are already accepting that they will need to include more in their headspace before writing a single line of code; knowing that native string differs between the two languages is a fundamental concept necessary for the act of writing polyglot code. - Alice.
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 22:13:17 -0800, Alex Grönholm said: 08.01.2011 07:09, P.J. Eby wrote: On the plus side, the run this in a future after the request concept has some legs... [snip] What exactly does run this in a future after the request mean? There seems to be some terminology confusion here. I suspect he's referring to some of the notes on the PEP 444 feature request - Futures executor thread and several of my illustrated use cases, notably: :: Image scaling (e.g. to multiple sizes) after uploading of an image to be scaled where the response (Congratulations, image uploaded!) does not require the result of the scaling. :: Content indexing which can also be performed after returning the success page. The former would executor.submit() a number of scaling jobs, attach completion callbacks to perform some cleanup / database updating / etc., and return a response immediately. The latter is a single executor submission that is entirely non-time-critical. And likely other use cases as well. This (inclusion of an executor tuned to the underlying server in the environment) is one thing I think we can (almost) all agree is a good idea. :D Discussion on that particular idea should be relegated to the feature request thread, though. - Alice.
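The image scaling case described above might look roughly like the following, using only concurrent.futures calls from PEP 3148. The scale, record and handle_upload names, and the list of widths, are placeholders invented for this sketch; real scaling and database code would replace them.

```python
from concurrent.futures import ThreadPoolExecutor

completed = []

def scale(image_id, width):
    # Placeholder for real image scaling; returns a label for the new size.
    return '%s@%dpx' % (image_id, width)

def record(future):
    # Completion callback: the place for cleanup or database updates.
    completed.append(future.result())

def handle_upload(executor, image_id, widths=(64, 256, 1024)):
    # Queue one job per size and return without waiting for any of them.
    for width in widths:
        executor.submit(scale, image_id, width).add_done_callback(record)
    return 'Congratulations, image uploaded!'
```

The response text is returned as soon as the jobs are queued; the callbacks fire later, whenever each job finishes.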
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-07 13:21:36 -0800, Antoine Pitrou said: Ok, so, WSGI doesn't already involve generators. QED. Let me try this again. With the understanding that: :: PEP 333[3] and 444 define a response body as an iterable. :: Thus WSGI involves iterables through definition. :: A generator is a type of iterable. :: Thus WSGI involves generators through the use of iterables. The hypothetical redefinition of an application as a generator is not too far out to lunch, considering that WSGI _already involves generators_. (And that the simple case, an application that does not utilize async, will require a single word be changed: s/return/yield) Is that clearer? The idea referred to below (and posted separately) involves this redefinition, which I understand fully will have a number of strong opponents. Considering PEP 444 is a new spec (already breaking direct compatibility via the /already/ redefined return value) I hope people do not reject this out of hand but instead help explore the idea further. On 2011-01-07 19:36:52 -0800, Antoine Pitrou said: Alice Bevan–McGregor al...@... writes: The particular use case happens to be PEP 444 as implemented using an async and multi-process (some day multi-threaded) HTTP server, so I'm not quite sure what you're getting at, here. It's becoming too difficult to parse. You aren't sure yet what the async part of PEP 444 should look like but you have already implemented it? Marrow HTTPd (marrow.server.http) [1] is, internally, an asynchronous server. It does not currently expose the reactor to the WSGI application via any interface whatsoever. I am, however, working on some p-code examples (that I will post for discussion as mentioned above) which I can base a fork of m.s.http off of to experiment. This means that, yes, I'm not sure how async will work in PEP 444 /in the end/, but I am at least attempting to explore the practical implications of the ideas thus far in a real codebase. 
I'm getting it done, even if it has to change or be scrapped. I helped proof a book about Python 3 compatibility and am giving a presentation in March that contains information on Python 3 compatibility from the viewpoint of implementing the Marrow suite. Well, I hope not too many people will waste time trying to write cross-compatible code rather than solely target Python 3. The whole point of Python 3 is to make developers' life better, not worse. I agree, with one correction to your first point. Application and framework developers should whole-heartedly embrace Python 3 and make full use of its many features, simplifications and clarifications. However, it is demonstrably not Insanely Difficult™ to have compatible server and middleware implementations with the draft's definition of native string. If server and middleware developers are willing to create polyglot code, I'm not going to stop them. Note that this type of compatibility is not mandated, and the use of native strings (with one well defined byte string exception) means that pure Python 3 programmers can be blissfully ignorant of the compatibility implications -- everything else is unicode (str), even if it's just bytes-in-unicode (latin1/iso-8859-1). Pure Python 2 programmers have only a small difference (for them) of the URI values being unicode; the remaining values are byte strings (str). I would like to hear a technical reason why this (native strings) is a bad idea instead of vague this will make things harder -- it won't, at least, not measurably, and I have the proof as a working, 100% unit tested, performant, cross-compatible polyglot HTTP/1.1-compliant server. Written in several days worth of full-time work spread across weeks because this is a spare-time project; i.e. not a lot of literal work, nor hard. 
Hell, it has transformed from a crappy hack to experiment with HTTP into a complete (or very nearly so) implementation of PEP 444 in both of its current forms (published and draft) that is almost usable, ignoring the fact that PEP 444 is mutable, of course. - Alice. [1] http://bit.ly/fLfamO
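For readers unfamiliar with the "bytes-in-unicode" phrase used above, a short Python 3 illustration follows. The sample value is invented; the technique (decoding wire bytes as latin1 so each byte maps to exactly one code point, then re-encoding to recover them) is the one PEP 3333 describes for native strings.

```python
raw = 'caf\u00e9'.encode('utf-8')    # bytes as they arrive on the wire
native = raw.decode('latin1')        # bytes-in-unicode: one code point per byte

# Nothing is lost: re-encoding as latin1 returns the original bytes, which
# the application can then decode with whatever encoding it actually expects.
recovered = native.encode('latin1').decode('utf-8')
```

The round trip is lossless because latin1 assigns a code point to every byte value from 0 to 255.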
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-06 03:53:14 -0800, Antoine Pitrou said: Alice Bevan-McGregor al...@... writes: GothAlice: ... native string usage, the definition of byte string as the format returned by socket read (which, on Java, is unicode!) ... Just so no-one feels the need to correct me; agronholm made sure I didn't drink the kool-aid of one article I was reading and basing some ideas on. Java socket objects use byte-based buffers, not unicode. My bad! Regardless of the rest, I think the latter would be a large step backwards. Clear distinction between bytes and unicode is a *feature* of Python 3. Unicode-ignorant programmers should use frameworks which do the encoding work for them. +0.5 I'm beginning to agree; with the advent of b'' syntax in 2.6, the only compelling reason to include this feature (examples that work without modification across major versions of Python) goes up in smoke. The examples should use the b'' syntax and have done with it. (by the way, why are you targeting both Python 2 and 3?) For the same reason that Python 3 features are introduced to 2.x; migration. Users are more likely to adopt something that doesn't require them to change production environments, and 3.x is far away from being deployed in production anywhere but on Gentoo, it seems. ;) Broad development and deployment options are a Good Thing™, and with b'', there is no reason -not- to target 2.6+. (There is no requirement that a PEP 444 / WSGI 2 server even try to be a cross-compatible polyglot; there is room for 2.x-specific and 3.x-specific solutions, and, in theory, it should be possible to support Python 2.6, I just don't feel it's worthwhile to lock your application into Very Old™ interpreters.) 
agronholm: I'm not very comfortable with the idea of wsgi.input in async apps \ I'm just thinking what would happen when you do environ['wsgi.input'].read() GothAlice: One of two things: in a sync environment, it blocks until it can read, in an async environment [combined with yield] it pauses/shelves your application until the data is available. Er, for the record, in Python 3 non-blocking file objects return None when read() would block. -1 I'm aware, however that's not practically useful. How would you detect from within the WSGI 2 application that the file object has become readable? Implement your own async reactor / select / epoll loop? That's crazy talk! ;) agronholm: the requirements of async apps are a big problem agronholm: returning magic values from the app sounds like a bad idea agronholm: the best solution I can come up with is to have wsgi.async_input or something, which returns an async token for any given read operation The idiomatic abstraction for non-blockingness under POSIX is file descriptors. So, at the low level (the WSGI level), exchanging fds between server and app could be enough to allow both to wake up each other (perhaps two fds: one the server can wait on, one the app can wait on). Similarly to what signalfd() does. Then higher-level tools can wrap inside Futures or whatever else. -0 Hmm; I'll have to mull that over. Initial thoughts: having a magic yield value that combines a fd and operation (read/write) is too magical. However, this also means Windows compatibility becomes more complicated, unless the fds are sockets. +1 for pure futures which (in theory) eliminate the need for dedicated async versions of absolutely everything at the possible cost of slightly higher overhead. - Alice.
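The remark about non-blocking file objects returning None can be seen with a pipe. This is a sketch for POSIX systems on Python 3.5 or later (os.set_blocking is not available for pipes on older Windows builds); it demonstrates only the raw file behaviour, not anything a WSGI server is required to do.

```python
import os

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)               # switch the read end to non-blocking
raw = os.fdopen(read_fd, 'rb', buffering=0)   # unbuffered, so this is a raw FileIO

first = raw.read(1024)      # nothing written yet: None rather than blocking
os.write(write_fd, b'chunk')
second = raw.read(1024)     # data is now available and comes back as bytes

os.close(write_fd)
raw.close()
```

As the reply points out, None only says "not yet"; the application still needs something like a select or epoll loop, or a server-provided mechanism, to learn when a retry is worthwhile.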
Re: [Web-SIG] PEP 444 / WSGI 2 Async
Chris, On 2011-01-06 05:03:15 -0800, Chris Dent said: On Wed, 5 Jan 2011, Alice Bevan–McGregor wrote: This should give a fairly comprehensive explanation of the rationale behind some decisions in the rewrite; a version of these conversations (in narrative style vs. discussion) will be added to the rewrite Real Soon Now™ under the Rationale section. Thanks for this. I've been trying to follow along with this conversation as an interested WSGI app developer and admit that much of the thrust of things is getting lost in the details and people's tendency to overquote. Yeah; I knew the IRC log dump was only so useful. It's a lot of material to go through, and much of it was discussed at strange hours with little sleep. ;) One thing that would be useful is if, when you post, Alice, you could give the URL of whatever and wherever your current draft is. Tomorrow (ack, today!) I'll finish converting over the PEP from Textile to ReStructuredText and get it re-submitted to the Python website. https://github.com/GothAlice/wsgi2/blob/master/pep444.textile http://www.python.org/dev/peps/pep-0444/ I don't use frameworks, or webob or any of that stuff. I just cook up callables that take environ and start_response. I don't want my awareness of the basics of HTTP abstracted away, because I want to make sure that my apps behave well. Kudos! That approach is heavily frowned upon in the #python IRC channel, but I fully agree that working solutions can be reasonably made using that methodology. There are some details that are made easier by frameworks, though. Testing benefits from MVC: you can test the dict return value of the controller, the templates, and the model all separately. Plain WSGI is a good thing, for me, because it means that my applications are a) very webby (in the stateless HTTP sense) and b) very testable. c) And very portable. You need not depend on some pre-arranged stack (including web server). 
I agree with some others who have suggested that maybe async should be its own thing, rather than integrated into a WSGI2. A server could choose to be WSGI2 compliant or AWSGI compliant, or both. -1 That is already the case with filters, and will be when I ratify the async idea (after further discussion here). My current thought process is that async will be optional for server implementors and will be easily detectable by applications and middleware and have zero impact on middleware/applications if disabled (by configuration) or missing. That said I can understand why an app author might like to be able to read or write in an async way, and being able to shelf an app to wait around for the next cycle would be a good thing. Using futures, async covers any callable at all; you can queue up a dozen DB calls at the top of your application, then (within a body generator) yield those futures to be paused pending the data. That would, as an example, allow complex pages to be generated and streamed to the end-user in an efficient way -- the user would see a page begin to appear, and the browser downloading static resources, while intensive tasks complete. I just don't want efforts to make that possible to make writing a boring wsgi thing more annoying. +9001 See above. I can't get my head around filters yet. They sound like a different way to do middleware, with a justification of something along the lines of I don't like middleware for filtering. I'd like to be (directly) pointed at a more robust justification. I suspect you have already pointed at such a thing, but it is lost in the sands of time... Filters offer several benefits, some of which are mild: :: Simplified application / middleware debugging via smaller stack. :: Clearly defined tasks; ingress = altering the environ / input, egress = altering the output. :: Egress filters are not executed if an unhandled exception is raised. 
The latter point is important; you do not want badly written middleware to absorb exceptions that should bubble, etc. (I'll need to elaborate on this and add a few more points when I get some sleep.) Filters seem like something that could be added via a standardized piece of middleware, rather than being part of the spec. I like minimal specs. Filters are optional, and an example is/will be provided for utilizing ingress/egress filter stacks as middleware. The problem with /not/ including the filtering API (which, by itself is stupidly simple and would barely warrant its own PEP, IMHO) is that a separate standard would not be seen and taken into consideration when developers are writing what they will think /must/ be middleware. Seeing as a middleware version of a filter is trivial to create (just execute the filter in a thin middleware wrapper), it should be a consideration up front. Latin1 = \u0000 → \u00FF [snip] There's a rule of thumb about constraints. If you must constrain, do none, one or all, never
Re: [Web-SIG] PEP 444 / WSGI 2 Async
On 2011-01-06 09:06:10 -0800, chris.d...@gmail.com said: I wasn't actually talking about the log dump. That was useful. What I was talking about were earlier messages in the thread where people were making responses, quoting vast swaths of text for no clear reason. Ah. :) I do make an effort to trim quoted text to only the relevant parts. On Thu, 6 Jan 2011, Alice Bevan–McGregor wrote: https://github.com/GothAlice/wsgi2/blob/master/pep444.textile Thanks, watching that now. The textile document will no longer be updated; the pep-444.rst document is where it'll be at. I should have been more explicit here as I now feel I must defend myself from frowns. I'm not talking about single methods that do the entire app. I nest a series of middleware that bottom out at Selector which then does url based dispatch to applications, which themselves are defined as handlers (simple wsgi functions) and access StorageInterfaces and Serializations. The middleware, handlers, stores and serializers are all independently testable (and usable). *nods* My framework (WebCore) is basically a packaged up version of a custom middleware stack so I can easily re-use it from project to project. I assumed (in my head) you were rolling your own framework/stack. That is already the case with filters, and will be when I ratify the async idea (after further discussion here). My current thought process is that async will be optional for server implementors and will be easily detectable by applications and middleware and have zero impact on middleware/applications if disabled (by configuration) or missing. This notion of being detectable seems weird to me. Are we actually expecting an application to query the server, find out it is not async capable, and choose a different code path as a result? Seems much more likely that the installer will choose a server or app that meets their needs. That is: you don't need to detect, you need to know (presumably at install/config time). 
Or maybe I am imagining the use cases incorrectly here. I think of app being async as an explicit choice made by the builder to achieve some goal. More to the point it needs to be detectable by middleware without explicitly configuring every layer of middleware, potentially with differing configuration mechanics and semantics. (I.e. arguments like enable_async, async_enable, iLoveAsync, ...) I can't get my head around filters yet.[snip] Filters offer several benefits, some of which are mild: :: Simplified application / middleware debugging via smaller stack. :: Clearly defined tasks; ingress = altering the environ / input, egress = altering the output. :: Egress filters are not executed if an unhandled exception is raised. Taken individually none of these seem super critical to me. Or to put it another way: Yeah, so? (This is the aforementioned resistance showing through. The above sounds perfectly nice, reasonable and desirable, but not _necessary_.) It isn't necessary; it is, however, an often re-implemented feature of a framework on top of WSGI. CherryPy, Paste, Django, etc. all implement some form of non-WSGI (or, hell, Paste uses WSGI middleware) thing they call a 'filter'. Filters are optional, and an example is/will be provided for utilizing ingress/egress filter stacks as middleware. In a conversation with some people about the Atom Publishing Protocol I tried to convince them that the terms SHOULD and MAY had no place in a spec. WSGI* is not really the same kind of spec, but optionality still grates in the same way. I fully agree; that's why a lot of the PEP 333 optionally or may features have become must. Optionally and may simply never get implemented. Filters are optional because a number of people have raised valid arguments that it might not be entirely needed. Thus, it's not required. 
But I strongly feel that some defined API should be present in (or /at least/ referred to by) the PEP, otherwise the future will hold the same server-specific incompatible implementations. - Alice.
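The "thin middleware wrapper" idea for filters can be sketched as below. The filter signatures here are assumptions made for illustration: an ingress filter is taken to be a callable that mutates the environ, an egress filter one that receives and returns (status, headers, body), and the application is taken to use the PEP 444 style of returning that triple. The draft's actual filter API may differ, and with_filters is a hypothetical helper.

```python
def with_filters(app, ingress=(), egress=()):
    """Wrap a PEP 444-style app (returns status, headers, body) with filters."""
    def middleware(environ):
        for f in ingress:
            f(environ)  # ingress: alter the environ / input in place
        status, headers, body = app(environ)
        # An unhandled exception from the app propagates from the line above,
        # so the egress filters below are never executed for it.
        for f in egress:
            status, headers, body = f(environ, status, headers, body)
        return status, headers, body
    return middleware

def mark_seen(environ):
    # Example ingress filter (hypothetical key name).
    environ['example.seen'] = True

def add_header(environ, status, headers, body):
    # Example egress filter: append a header, leave status and body alone.
    return status, headers + [(b'X-Filtered', b'yes')], body

def hello(environ):
    return b'200 OK', [(b'Content-Type', b'text/plain')], [b'Hello\n']

wrapped = with_filters(hello, ingress=[mark_seen], egress=[add_header])
```

Because the wrapper contains no try/except, a failing application bypasses every egress filter, which matches the third benefit listed in the thread.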
Re: [Web-SIG] PEP 444 Goals
On 2011-01-06 13:06:36 -0800, James Y Knight said: On Jan 6, 2011, at 3:52 PM, Alice Bevan–McGregor wrote: :: Making optional (and thus rarely-implemented) features non-optional. E.g. server support for HTTP/1.1 with clarifications for interfacing applications to 1.1 servers. Thus pipelining, chunked encoding, et al. as per the HTTP 1.1 RFC. Requirements on the HTTP compliance of the server don't really have any place in the WSGI spec. You should be able to be WSGI compliant even if you don't use the HTTP transport at all (e.g. maybe you just send around requests via SCGI). The original spec got this right: chunking etc are something which is not relevant to the wsgi application code -- it is up to the server to implement the HTTP transport according to the HTTP spec, if it's purporting to be an HTTP server. Chunking is actually quite relevant to the specification, as WSGI and PEP 444 / WSGI 2 (damn, that's getting tedious to keep dual-typing ;) allow for chunked bodies regardless of higher-level support for chunking. The body iterator. Previously you /had/ to define a length; with chunked encoding at the server level, you don't. I agree, however, that not all gateways will be able to implement the relevant HTTP/1.1 features. FastCGI does; SCGI, after a quick Google search, seems to support it as well. I should re-word it as: For those servers capable of HTTP/1.1 features the implementation of such features is required. +1 - Alice.
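To make the link between the body iterator and chunked encoding concrete, here is a sketch of how a server might frame an application's body when no Content-Length was given. The chunked function is illustrative only; the framing itself (hexadecimal size, CRLF, data, CRLF, then a zero-size chunk) is the HTTP/1.1 chunked transfer coding. It needs Python 3.5 or later for bytes formatting.

```python
def chunked(body):
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    for chunk in body:
        if chunk:  # skip empty strings: a zero-length chunk would end the stream
            yield b'%x\r\n%s\r\n' % (len(chunk), chunk)
    yield b'0\r\n\r\n'  # last-chunk marker followed by an empty trailer
```

The application never sees any of this; it simply yields byte strings and the server decides whether to chunk, buffer, or close the connection.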
[Web-SIG] WSGI Middleware Dependancy Graphing (was: PEP 444 / WSGI 2 Async)
On 2011-01-06 13:08:04 -0800, Robert Brewer said: Or, if you had actually read what I wrote weeks ago... I did. Apologies for forgetting the detail of the implementation being deprecated. We don't need Yet Another Way of hooking in processing components; if anything, we need a standard mechanism to compose existing middleware graphs so that invariant orderings are explicit and guaranteed. For example, encode, then gzip, then cache. By introducing egress filters as described in PEP 444 (which mentions gzip as a candidate for an egress filter), you're then stuck in a tug-of-war as to whether to build a new caching component as middleware, as an egress filter, or (most likely, in order to compete) both. I do, in fact, have a proposal for declaring dependencies, however such declaration is utterly useless unless differing middleware-based implementations (e.g. sessions) can agree on a common API for their feature sets. I feel strongly that this idea does not belong in PEP 444; it's one of the few things I think should be its own PEP. My mechanism (for which I do have a working implementation against WSGI 1; my web framework uses it) involves middleware layers declaring several attributes on themselves:
provides - abstract API names
uses - ordering hint, no dependency
needs - die if dependency is not met
before - explicit ordering, including *
after - explicit ordering, including *
For this to really work, however, it'd also need either an entrypoint-based way of looking up components (making the graph truly dynamic), or it needs to be combined with explicit packages a la setuptools.require. In that instance, you've already done the ordering yourself, so dependency graphing is moot. - Alice.
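A rough sketch of how the attribute scheme above could drive ordering. This is not the implementation mentioned in the message, which is not shown in the thread; order_middleware and the example layer classes are invented here, and only provides, needs and uses are handled (before and after are left out to keep the example short).

```python
def order_middleware(layers):
    """Order layers so every provider precedes the layers that need or use it."""
    providers = {}
    for layer in layers:
        for name in getattr(layer, 'provides', ()):
            providers[name] = layer

    ordered, visiting = [], set()

    def visit(layer):
        if layer in ordered:
            return
        if layer in visiting:
            raise ValueError('circular dependency at %r' % (layer,))
        visiting.add(layer)
        for name in getattr(layer, 'needs', ()):
            if name not in providers:  # hard requirement: fail loudly
                raise LookupError('%r needs missing feature %r' % (layer, name))
            visit(providers[name])
        for name in getattr(layer, 'uses', ()):
            if name in providers:      # soft hint: ignored when absent
                visit(providers[name])
        visiting.discard(layer)
        ordered.append(layer)

    for layer in layers:
        visit(layer)
    return ordered

class Database:
    provides = ['db']

class Sessions:
    provides = ['sessions']
    needs = ['db']

class Auth:
    needs = ['sessions']
    uses = ['cache']   # no cache layer is installed, so this hint is ignored
```

Calling order_middleware([Auth, Sessions, Database]) places Database first, then Sessions, then Auth, regardless of the order in which the layers were listed.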
Re: [Web-SIG] PEP 444 Goals
On 2011-01-06 14:14:32 -0800, Alice Bevan–McGregor said: There was something, somewhere I was reading related to WSGI about requiring content-length... but no matter. Right, I remember now: the HTTP 1.0 specification. (Honestly not trying to sound sarcastic!) See: http://www.w3.org/Protocols/HTTP/1.0/draft-ietf-http-spec.html#Entity-Body However, after testing every browser on my system (from Links and ELinks, through Firefox, Chrome, Safari, Konqueror, and Dillo) across the following test code, I find that they all handle a missing content-length in the same way: reading the socket until it closes. http://pastie.textmate.org/1435415 - Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 Goals
On 2011-01-06 21:26:32 -0800, James Y Knight said: You've misread that section. In HTTP/1.0, *requests* were required to have a Content-Length if they had a body (HTTP 1.1 fixed that with chunked request support). Responses have never had that restriction: they have always (even since before HTTP 1.0) been allowed to omit Content-Length and terminate by closing the socket. Ah ha, that explains my confusion, then! Thank you. - Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)
On 2011-01-06 22:00:17 -0800, Graham Dumpleton said:

-environ = {k: wsgi_string(v) for k,v in os.environ.items()}
+environ = {k: wsgi_string(v) for k,v in list(os.environ.items())}

2to3 takes the conservative route of assuming your application treats dict.items() as a list in all cases; this is not necessarily true (of course), but it is safe, and interestingly, backwards compatible.

-raise exc_info[0], exc_info[1], exc_info[2]
+raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])

The exception raising syntax has changed; you cannot re-raise an exception using tuple notation any more. The new syntax is far clearer, but I'm unsure of back-compatibility or even if it is possible to emulate it completely as a polyglot (2.x and 3.x w/ same code).

- Alice.
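For reference, a minimal sketch of the Python 3 side of that re-raise, assuming exc_info is a tuple as returned by sys.exc_info(). The reraise helper name is hypothetical. Because exc_info[1] is normally already an exception instance, it can be raised directly instead of being wrapped in a new instance as the 2to3 output does. This does not address the 2.x/3.x polyglot question raised above.

```python
import sys


def reraise(exc_info):
    """Re-raise a captured exception with its original traceback (Python 3 only)."""
    exc_type, exc_value, exc_tb = exc_info
    if exc_value is None:
        # Mirror the old three-argument raise: instantiate the type if needed.
        exc_value = exc_type()
    raise exc_value.with_traceback(exc_tb)


def failing():
    raise ValueError("original failure")


try:
    failing()
except ValueError:
    captured = sys.exc_info()

try:
    reraise(captured)
except ValueError as exc:
    # Walk the traceback to collect the function names it passes through.
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frames.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
```

The innermost traceback entry still points into failing(), where the error originally occurred.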
Re: [Web-SIG] Python 3 / PEP 3333 (was: PEP 444 / WSGI 2 Async)
On 2011-01-06 23:40:53 -0800, Graham Dumpleton said: There is also uWSGI and CherryPy WSGI server. I recollect that Benoit may have started looking it over for gunicorn. Ah, right, I recall seeing CherryPy mentioned in archived discussions. So there's hope, then, for relatively quick adoption once ratified. :) - Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444 / WSGI 2 Async
[Apologies if this is a double- or triple-post; I seem to be having a stupid number of connectivity problems today.] Howdy! Apologies for the delay in responding, it’s been a hectic start to the new year. :) On 2011-01-03, at 6:22 AM, Timothy Farrell wrote: You don't know me but I'm the author of the Rocket Web Server (http://pypi.python.org/pypi/rocket) and have, in the past, been involved in the web2py community. Like you, I'm interested in seeing web development come to Python3. I'm glad you're taking up WSGI2. I have a feature-request for it that perhaps we could work in. Of course; in fact, I hope you don’t mind that I’ve re-posted this response to the web-sig mailing list. Async needs significantly broader discussion. I would appreciate it if you could reply to the mailing list thread. I would like to see futures added as a server option. This way, controllers could dispatch emails (or run some other blocking or long-running task) that would not block the web-response. WSGI2 Servers could provide a futures executor as environ['wsgi.executor'] that the app could use to offload processes that need not complete before the web-request is served to the client. E-mail dispatch is one of the things I solved a long time ago with TurboMail; it uses a dedicated thread pool and can deliver 100 unique messages per second (more if you use BCC) in the default configuration, so I don’t really see that one use case as one that can benefit from the futures module. Updating TurboMail to use futures would be an interesting exercise. ;) I was thinking of exposing the executor as environ[‘wsgi.async.executor’], with ‘wsgi.async’ being a boolean value indicating support. What should the server do with the future instances? The executor returns future instances when running executor.submit/map; the application never generates its own Future instances. 
The application may, however, use whatever executor it sees fit; it can, for example, have one thread pool executor and one process pool, used for different tasks. The server itself can utilize any combination of single-threaded IO-based async (see further on in this message), and multi-threaded or multi-process management of WSGI requests. Resuming suspended applications (ones pending future results) is an implementation detail of the server.

Should future.add_done_callback() be allowed? I'm not sure how practical/reliable this would be. (By the time the callback is called, the calling environment could be gone. Is this undefined behavior?)

If you wrap your callback in a partial(my_callback, environ) the environ will survive the end of the request/response cycle (due to the incremented reference count), and should be allowed to enable intelligent behaviour in the callbacks. (Obviously the callbacks will not be able to deliver a response to the client at the time they are called; the body iterator can, however, wait for the future instance to complete and/or timeout.) A little bit later in this message I describe a better solution than the application registering its own callbacks.

Do we need to also specify what type of executor is provided (threaded vs. separate process)?

I think that’s an application-specific configuration issue, not really the concern of the PEP.

Do you have any thoughts about this?

I believe that intelligent servers need some way to ‘pause’ a WSGI worker rather than relying on the worker executing in a thread and blocking while waiting for the return value of a future. Using generator syntax (yield) with the following rules is my initial idea:

* The application may yield None. This is a polite way to have the async reactor (in the WSGI server/gateway) reschedule the worker for the next reactor cycle. Useful as a hint that “I’m about to do something that may take a moment”, allowing other workers to get a chance to perform work.
(Cooperative multi-tasking on single-threaded async servers.)

* The application must yield one 3-tuple WSGI response, and must not yield additional data afterwards. This is usually the last thing the WSGI application would do, with possible cleanup code afterwards (before falling off the bottom / raising StopIteration / returning None).

* The application may yield Future instances returned by environ[‘wsgi.executor’].submit/map; the worker will then be paused pending execution of the future; the return value of the future will be returned from the yield statement. Exceptions raised by the future will be re-raised from the yield statement and can thus be captured in a natural way. E.g.:

    try:
        complex_value = yield environ['wsgi.executor'].submit(long_running)
    except:
        pass  # handle exceptions generated from within long_running

Similar rules apply to the response body iterator: it yields bytestrings, may yield unicode strings where native strings are unicode strings, and
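The yield rules described in this message can be sketched with a toy driver loop. This is only an illustration under stated assumptions: the run function is a hypothetical stand-in for a server, it blocks on each future where a real async server would suspend the worker and move on, and the application and long_running names are invented for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def long_running():
    return "expensive result"


def application(environ):
    """Generator-style application following the yield rules above."""
    yield None  # politely let other workers take a turn
    try:
        value = yield environ["wsgi.executor"].submit(long_running)
    except Exception as exc:
        value = "failed: %r" % (exc,)
    yield b"200 OK", [(b"Content-Type", b"text/plain")], [value.encode("latin1")]


def run(app, environ):
    """Drive one application generator until it yields its response tuple."""
    worker = app(environ)
    to_send, to_throw = None, None
    while True:
        if to_throw is not None:
            item = worker.throw(to_throw)  # re-raise inside the application
        else:
            item = worker.send(to_send)  # resume with the future's result
        to_send, to_throw = None, None
        if item is None:
            continue  # a real reactor would reschedule the worker here
        if isinstance(item, Future):
            try:
                to_send = item.result()  # a real server would not block
            except Exception as exc:
                to_throw = exc
            continue
        worker.close()  # run any cleanup code after the response
        return item


with ThreadPoolExecutor(max_workers=1) as executor:
    response = run(application, {"wsgi.executor": executor})
```

The future's return value comes back out of the yield expression, and an exception raised inside long_running would surface at the same yield, which is the "natural" try/except behaviour the message describes.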
Re: [Web-SIG] PEP 444 / WSGI 2 Async
Alex Grönholm and I have been discussing async implementation details (and other areas of PEP 444) for some time on IRC. Below is the cleaned up log transcriptions with additional notes where needed. Note: The logs are in mixed chronological order — discussion of one topic is chronological, potentially spread across days, but separate topics may jump around a bit in time. Because of this I have eliminated the timestamps as they add nothing to the discussion. Dialogue in square brackets indicates text added after-the-fact for clarity. Topics are separated by three hyphens. Backslashes indicate joined lines. This should give a fairly comprehensive explanation of the rationale behind some decisions in the rewrite; a version of these conversations (in narrative style vs. discussion) will be added to the rewrite Real Soon Now™ under the Rationale section. — Alice. --- General agronholm: my greatest fear is that a standard is adopted that does not solve existing problems GothAlice: [Are] there any guarantees as to which thread / process a callback [from the future instance] will be executed in? --- 444 vs. agronholm: what new features does pep 444 propose to add to pep ? \ async, filters, no buffering? GothAlice: Async, filters, no server-level buffering, native string usage, the definition of byte string as the format returned by socket read (which, on Java, is unicode!), and the allowance for returned data to be Latin1 Unicode. \ All of this together will allow a '''def hello(environ): return 200 OK, [], [Hello world!]''' example application to work across Python versions without modification (or use of b prefix) agronholm: why the special casing for latin1 btw? is that an http thing? GothAlice: Latin1 = \u → \u00FF — it's one of the only formats that can be decoded while preserving raw bytes, and if another encoding is needed, transcode safely. \ Effectively requiring Latin1 for unicode output ensures single byte conformance on the data. 
\ If an application needs to return UTF-8, for example, it can return an encoded UTF-8 bytestream, which will be passed right through, --- Filters agronholm: regarding middleware, you did have a point there -- exception handling would be pretty difficult with ingress/egress filters GothAlice: Yup. It's pretty much a do or die scenario in filter-land. agronholm: but if we're not ditching middleware, I wonder about the overall benefits of filtering \ it surely complicates the scenario so it'd better be worth it \ I don't so much agree with your reasoning that [middleware] complicates debugging \ I don't see any obvious performance improvements either (over middleware) GothAlice: Simplified debugging of your application w/ reduced stack to sort through, reduced nested stack overhead (memory allocation improvement), clearer separation of tasks (egress compression is a good example). This follows several of the Zen of Python guidelines: \ Simple is better than complex. \ Flat is better than nested. \ There should be one-- and preferably only one --obvious way to do it. \ If the implementation is hard to explain, it's a bad idea. \ If the implementation is easy to explain, it may be a good idea. agronholm: I would think that whatever memory the stack elements consume is peanuts compared to the rest of the application \ ingress/egress isn't exactly simpler than middleware GothAlice: The implementation for ingress/egress filters is two lines each: a for loop and a call to the elements iterated over. Can't get much simpler or easier to explain. ;) \ Middleware is pretty complex… \ The majority of ingress filters won't have to examine wsgi.input, and supporting async on egress would be relatively easy for the filters (pass-through non-bytes data in body_iter). \ If you look at a system that offers input filtering, output filtering, and decorators (middleware), modifying input should obviously be an input filter, and vice-versa. 
agronholm: how does a server invoke the ingress filters \ in my opinion, both ingress and egress filters should essentially be pipes \ compression filters are a good example of this \ once a block of request data (body) comes through from the client, it should be sent through the filter chain agronholm: consider an application that receives a huge gzip encoded upload \ the decompression filter decompresses as much as it can using the incoming data \ the application only gets the next block once the decompression filter has enough raw data to decompress GothAlice: Ingress decompression, for example, would accept the environ argument, detect gzip content-encoding, then decompress the wsgi.input into its own buffer, and finally replace wsgi.input in the environ with its decompressed version. \ Alternatively, it could decompress chunks and have a more intelligent replacement for wsgi.input (to delay decompression until it is needed).
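A minimal sketch of the second approach GothAlice describes, where wsgi.input is replaced by an object that delays decompression until the application reads. The filter name is hypothetical, environ values are plain native strings for simplicity, and setting CONTENT_LENGTH to None for an unknown length is an assumption of this example, not something the log specifies.

```python
import gzip
import io


def gzip_ingress_filter(environ):
    """Ingress filter: transparently decompress a gzip-encoded request body."""
    if environ.get("HTTP_CONTENT_ENCODING", "").lower() != "gzip":
        return
    # GzipFile pulls from the wrapped stream on demand, so nothing is
    # decompressed until the application actually reads wsgi.input.
    environ["wsgi.input"] = gzip.GzipFile(fileobj=environ["wsgi.input"], mode="rb")
    # The compressed length no longer describes what the application sees.
    environ["CONTENT_LENGTH"] = None
    del environ["HTTP_CONTENT_ENCODING"]


body = gzip.compress(b"hello " * 1000)
environ = {
    "HTTP_CONTENT_ENCODING": "gzip",
    "CONTENT_LENGTH": str(len(body)),
    "wsgi.input": io.BytesIO(body),
}
gzip_ingress_filter(environ)
data = environ["wsgi.input"].read()
```

The filter only mutates the environ, so it fits the single-argument ingress signature discussed in the log.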
[Web-SIG] PEP 444 Draft Rewrite
Howdy!

I've mostly finished a draft rewrite of PEP 444 (WSGI 2), incorporating some additional ideas covering things like py2k/py3k interoperability and switching from a more narrative style to a substantially RFC-inspired language.

http://bit.ly/e7rtI6

I'm using Textile as my intermediary format, and will obviously need to convert this to ReStructuredText when I'm done. Missing are:

* The majority of the examples.
* Narrative rationale, which I'll be writing shortly.
* Narrative Python compatibility documentation.
* Asynchronous documentation. This will likely rely on the abstract API defined in PEP 3148 (futures) as implemented in Python 3.2 and the futures package available on PyPI.
* Additional and complete references. The Rationale chapter will add many references to community discussion.

I would appreciate it greatly if this rewrite could be read through and questions, corrections, or even references to possible ambiguity mentioned in discussion.

Have a happy holidays and a merry new-year, everybody! :)

- Alice.

P.s. I'll be updating my PEP 444 reference implementation HTTP 1.1 server (marrow.server.http) over the holidays to incorporate the changes in this rewrite; most notably the separation of byte strings, unicode strings, and native strings.
Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement middleware.
That looks amazingly like the code for CherryPy Filters circa 2005. In version 2 of CherryPy, Filters were the canonical extension method (for the framework, not WSGI, but the same lessons apply). It was still expensive in terms of stack allocation overhead, because you had to call () each filter to see if it was on. It would be much better to find a way to write something like: for f in ingress_filters: if f.on: f(environ) .on will need to be an @property in most cases, still not avoiding stack allocation and, in fact, doubling the overhead per filter. Statically disabled filters should not be added to the filter list. It was also fiendishly difficult to get executed in the right order: if you had a filter that was both ingress and egress, the natural tendency for core developers and users alike was to append each to each list, but this is almost never the correct order. If something is both an ingress and egress filter, it should be implemented as middleware instead. Nothing can prevent developers from doing bad things if they really try. Appending to ingress and prepending to egress would be the right thing to simulate middleware behaviour with filters, but again, don't do that. ;) But even if you solve the issue of static composition, there's still a demand for programmatic composition (if X then add Y after it), and even decomposition (find the caching filter my framework added automatically and turn it off), and list.insert()/remove() isn't stellar at that. I have plans (and partial implementation) of a init.d-style needs/uses/provides declaration and automatic dependency graphing. WebCore, for example, adds the declarations to existing middleware layers to sort the middleware. Calling the filter to ask it whether it is on also leads filter developers down the wrong path; you really don't want to have Filter A trying to figure out if some other, conflicting Filter B has already run (or will run soon) that demands Filter A return without executing anything. 
You really, really want the set of filters to be both statically defined and statically analyzable. Unfortunately, most, if not all filters need to check for request headers and response headers to determine the capability to run. E.g. compression checks environ.get('HTTP_ACCEPT_ENCODING', '').lower() for 'gzip', and checks the response to determine if a 'Content-Encoding' header has already been specified. Finally, you want the execution of filters to be configurable per URI and also configurable per controller. So the above should be rewritten again to something like: for f in ingress_filters(controller): if f.on(environ['path_info']): f(environ) It was for these reasons that CherryPy 3 ditched its version 2 filters and replaced them with hooks and tools in version 3. This is possible by wrapping multiple applications, say, in the filter middleware adapter with differing filter setups, then using the separate wrapped applications with some form of dispatch. You could also utilize filters as decorators. This is an implementation detail left up to the framework utilizing WSGI2, however. WSGI2 itself has no concept of controllers. None of this prevents the simplified stack from being useful during exception handling, though. ;) What I was really trying to do is reduce the level of nesting on each request and make what used to be middleware more explicit in its purpose. You might find more insight by studying the latest cherrypy/_cptools.py I'll give it a gander, though I firmly believe filter management (as middleware stack management) is the domain of a framework on top of WSGI2, not the domain of the protocol. — Alice. ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
[Web-SIG] PEP 444 / WSGI2 Proposal: Filters to supplement middleware.
Howdy!

There's one issue I've seen repeated a lot in working with WSGI1 and that is the use of middleware to process incoming data, but not outgoing, and vice-versa; middleware which filters the output in some way, but cares not about the input. Wrapping middleware around an application is simple and effective, but costly in terms of stack allocation overhead; it also makes debugging a bit more of a nightmare as the stack trace can be quite deep.

My updated draft PEP 444[1] includes a section describing Filters, both ingress (input filtering) and egress (output filtering). The API is trivially simple, optional (as filters can be easily adapted as middleware if the host server doesn't support filters) and easy to implement in a server. (The Marrow HTTP/1.1 server implements them as two for loops.)

Basically an input filter accepts the environment dictionary and can mutate it. Ingress filters take a single positional argument that is the environ. The return value is ignored. (This is questionable; it may sometimes be good to have ingress filters return responses. Not sure about that, though.)

An egress filter accepts the status, headers, body tuple from the application and returns a status, headers, and body tuple of its own which then replaces the response.

An example implementation is:

    for filter_ in ingress_filters:
        filter_(environ)

    response = application(environ)

    for filter_ in egress_filters:
        response = filter_(*response)

I'd love to get some input on this. Questions, comments, criticisms, or better ideas are welcome!

- Alice.

[1] https://github.com/GothAlice/wsgi2/blob/master/pep-0444.rst
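The remark that filters can be adapted as middleware could look roughly like the sketch below. The filter_middleware name and the two sample filters are hypothetical, written for this illustration rather than taken from the draft PEP; the filter signatures follow the description in this message (ingress takes the environ, egress takes and returns status, headers, body).

```python
def filter_middleware(application, ingress_filters=(), egress_filters=()):
    """Wrap an application so filters run on servers without filter support."""

    def wrapped(environ):
        for filter_ in ingress_filters:
            filter_(environ)  # may mutate environ; return value is ignored

        response = application(environ)

        for filter_ in egress_filters:
            response = filter_(*response)  # replaces (status, headers, body)

        return response

    return wrapped


def add_request_id(environ):
    """Sample ingress filter: annotate the environ."""
    environ["example.request_id"] = "abc123"


def add_server_header(status, headers, body):
    """Sample egress filter: append a header to the response."""
    return status, headers + [(b"X-Served-By", b"sketch")], body


def hello(environ):
    return b"200 OK", [(b"Content-Type", b"text/plain")], [b"Hello world!"]


app = filter_middleware(hello, [add_request_id], [add_server_header])
environ = {}
status, headers, body = app(environ)
```

Only one extra stack frame (wrapped) is added no matter how many filters are configured, which is the debugging benefit the proposal is after.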
Re: [Web-SIG] PEP 444
I’ve updated my copy of the PEP, re-naming non-commentary and non-revision text to reference WSGI2, wsgi2, or wsgi (environment variables) as appropriate. I’ve also added the first draft of the text describing filters and some sample code, including a middleware adapter for filters. Here are some additional notes:

https://gist.github.com/719763 - filter vs. middleware
http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html

It might be worth another PEP to describe interfaces to common data to encourage interoperability between filters/middleware, such as GET/POST data, cookies, session data (likely using Beaker’s API as a base), etc.

Also something I’ve been exploring is automatic resolution of middleware/filter dependencies by utilizing “uses”, “needs”, and “provides” properties on the callables and a middleware stack factory which can graph the dependency tree.

On a side note, I do not appear to be receiving posts to this mailing list, only the out-of-list CC/BCCs. :/ And here I’ve been getting used to reading and posting to comp.lang.python[.announce] on Usenet. ;)

- Alice.
Re: [Web-SIG] PEP 444
Would you prefer to give me collaboration permissions on your repo, or should I fork it? This message was sent from a mobile device. Please excuse any terseness and spelling or grammatical errors. If additional information is indicated it will be sent from a desktop computer as soon as possible. Thank you. On 2010-11-21, at 11:40 PM, Chris McDonough chr...@plope.com wrote: Georg Brandl has thus far been updating the canonical PEP on python.org. I don't know how you get access to that. My working copy is at https://github.com/mcdonc/web3 . ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444
I’ve forked it, now available at: https://github.com/GothAlice/wsgi2 Re-naming it to wsgi2 will be my first order of business during the week, altering your association the second. I’ll post change descriptions for discussion as I go. — Alice. On 2010-11-22, at 12:12 AM, Chris McDonough wrote: Would you prefer to give me collaboration permissions on your repo, or should I fork it? Please fork it or create another repository entirely. I have no plans to do more work on it personally, so I don't think it should really be associated with me. To that end, I think I'd prefer my name to either be off the PEP entirely or just listed as a helper or typist or something. ;-) ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] PEP 444
On 2010-11-22, at 3:05 PM, Mark Ramm mark.mchristen...@gmail.com wrote:

I would very much prefer it if we could keep the current name or choose a new unrelated name, not wsgi2 as I think the API changes warrant a new name to prevent confusion.

Web3, as mentioned in previous mailing list traffic, is a registered trademark. Python Web and WSGI are closely linked in the public mind-space. (Sleep deprived and can't think of a better way to phrase that.) Finally, I, and seemingly Python core, interpret major version number changes as breaking; py3k having backwards-incompatible syntax changes.

At a high level PEP 444 is /similar/ to WSGI insofar as the environ is a dict, and the returned values are a bytestring status, list of tuples for headers, and an iterable body. The inner implementation details seem a progressive enhancement and clarification of details which just happen to be backwards-incompatible. Preserving the WSGI name has marketing benefits, refines existing understanding of the server/middleware/application semantics rather than implying something /completely/ new, and increasing the version to 2.0 clearly declares the backwards-incompatibility.

I think that Python 2 vs. 3 is a good comparison here; Python 3 has a different syntax and grammar, making it a fundamentally different language and is incompatible because of this. Why is it called Python and not Xyzzy? #python wouldn’t have to have ;)

Web frameworks have been encountering this problem for some time; TurboGears developers, e.g., have been mulling over migrating to Pyramid or another top-level metaframework and debating strategies for migration: point everyone at something else, create something new, or keep the name and associated recognition?

Technically PEP 444 is incompatible, and wsgi.version = (2, 0) (and clear documentation) should indicate that.

- Alice.
[Web-SIG] PEP 444
(A version of this is available at http://web-core.org/2.0/pep-0444/; links are links, code may be easier to read.)

PEP 444 is quite exciting to me. So much so that I’ve been spending a few days writing a high-performance (C10K, 10K r/sec) Py2.6+/3.1+ HTTP/1.1 server which implements much of the proposed standard. The server is functional (less web3.input at the time of this writing), but differs from PEP 444 in several ways. It also adds several features I feel should be part of the spec. Source for the server is available on GitHub:

https://github.com/pulp/marrow.server.http

I have made several notes about the PEP 444 specification during implementation of the above, and have concerns over some implementation details:

First, async is poorly defined:

If the origin server advertises that it has the web3.async capability, a Web3 application callable used by the server is permitted to return a callable that accepts no arguments. When it does so, this callable is to be called periodically by the origin server until it returns a non-None response, which must be a normal Web3 response tuple.

Polling is not true async. I believe that it should be up to the server to define how async is utilized, and that the specification should be clarified on this point. (“Called periodically” is too vague.) “Callable” should likely be redefined as “generator” (a callable that yields) as most applications require holding on to state and wrapping everything in functools.partial() is somewhat ugly. Utilizing generators would improve support for existing Python async frameworks, and allow four modes of operation: yield None (no response, keep waiting), yield response_tuple (standard response), return / raise StopIteration (close the async connection) and allow for data to be passed back to the async callable by the higher-level async framework.

Second, WSGI middleware, while impressive in capability, are somewhat… heavy-weight.
Heavily nesting function calls is wasteful of CPU and RAM, especially if the middleware decides it can’t operate, for example, GZip compression disabling itself for non-text/* mimetypes. The majority of WSGI middleware can, and probably should be, implemented as linear ingress or egress filters. For example, on-disk static file serving could be an ingress filter, and GZip compression an egress filter. m.s.http supports this filtering and demonstrates one API for such. Also, I am in the process of writing an example egress CompressionFilter.

An example API and filter use implementation (paraphrased from marrow.server.http):

    # No filters, near 0 overhead.
    for filter_ in ingress_filters:
        # Can mutate the environment.
        result = filter_(env)

        # Allow the filter to return a response rather than continuing.
        if result:
            # result is a status, headers, body_iter tuple
            return result[0], result[1], result[2]

    status, headers, body = application(env)

    for filter_ in egress_filters:
        # Can mutate the environment, status, headers, body, or
        # return completely new status, headers, and body.
        status, headers, body = filter_(env, status, headers, body)

    return status, headers, body

The environment has some minor issues. I’ll write up my changes in RFC-style:

SERVER_NAME is REQUIRED and MUST contain the DNS name of the server OR virtual server name for the web server if available OR an empty bytestring if DNS resolution is unavailable.

SERVER_ADDR is REQUIRED and MUST contain the web server’s bound IP address.

URL reconstruction SHOULD use HTTP_HOST if available, SERVER_NAME if there is no HTTP_HOST, and fall back on SERVER_ADDR if SERVER_NAME is an empty bytestring.

CONTENT_LENGTH is REQUIRED and MUST be None if not defined by the client. Testing explicitly for None is more efficient than armoring against missing values; also, explicit is better than implicit.
(Paste’s WSGI1 server defines CONTENT_LENGTH as 0, but this implies the client explicitly declared it as zero, which is not the case.) FRAGMENT and PARAMETERS are REQUIRED and are parsed out of the URL in the same way as the QUERY_STRING. FRAGMENT is the text after a hash mark (a.k.a. “anchor” to browsers, e.g. /foo#bar). PARAMETERS come before QUERY_STRING, and after PATH_INFO separated by a semicolon, e.g. /foo;bar?baz. Both values MUST be empty bytestrings if not present in the URL. (Rarely used — I’ve only seen it in Java and ColdFusion applications — but still useful.) Points of contention: Changing the namespace seems needless. Using the wsgi.* namespace with a wsgi.version of (2, 0) will allow applications to easily armor themselves against incompatible use. That’s what wsgi.version is for! I’d add this as a strong “point of contention”. m.s.http keeps the wsgi namespace and uses a version of (2, 0). That’s it so far. I may occasionally write in with additional ideas as I continue with my HTTP server
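The URL reconstruction rule proposed in this message (HTTP_HOST, then SERVER_NAME, then SERVER_ADDR) might look like the following sketch. The reconstruct_url name is hypothetical, values are native strings rather than the bytestrings the proposal calls for, and the use of wsgi.url_scheme, SERVER_PORT, and SCRIPT_NAME is carried over from WSGI 1 as an assumption.

```python
def reconstruct_url(environ):
    """Rebuild the request URL using the proposed host fallback order."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST")
    if not host:
        # Fall back on SERVER_NAME, then SERVER_ADDR if the name is empty.
        host = environ["SERVER_NAME"] or environ["SERVER_ADDR"]
        port = str(environ.get("SERVER_PORT", ""))
        default_port = {"http": "80", "https": "443"}.get(scheme)
        if port and port != default_port:
            host = "%s:%s" % (host, port)

    url = "%s://%s%s%s" % (
        scheme, host, environ.get("SCRIPT_NAME", ""), environ.get("PATH_INFO", ""))
    if environ.get("PARAMETERS"):
        url += ";" + environ["PARAMETERS"]  # e.g. /foo;bar
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]  # e.g. /foo;bar?baz
    return url


with_host = reconstruct_url({
    "HTTP_HOST": "example.com",
    "SERVER_NAME": "ignored.example",
    "SERVER_ADDR": "10.0.0.1",
    "PATH_INFO": "/foo",
    "PARAMETERS": "bar",
    "QUERY_STRING": "baz",
})
without_name = reconstruct_url({
    "SERVER_NAME": "",
    "SERVER_ADDR": "10.0.0.1",
    "SERVER_PORT": "8080",
    "PATH_INFO": "/",
})
```

Because SERVER_NAME is required but may be empty, a plain truthiness test is enough to trigger the SERVER_ADDR fallback.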
Re: [Web-SIG] PEP 444
PEP 444 has no champion currently. Both Armin and I have basically left it behind. It would be great if you wanted to be its champion. Done. As I already have a functional, performant HTTP server[1] and example filter[2] (compression) utilizing a slightly modified version of PEP 444, and hope to be giving a presentation on its design and related utilities[3] early next year, I’d love to have the opportunity to directly shape its future. My server may be a bit large to be a reference implementation, but until it has its first user I have the benefit of being able to experiment whole-heartedly with features and proposals. Since Python 3 was released I haven’t heard of much forward-progress in getting web frameworks compatible. The largest complaint I’ve heard is that there are too few things already ported, which is a chicken and the egg problem. This is one scenario where re-inventing the wheel may be the only way to see forward movement. So far, I seem to be buckling down and Getting Things Done™ in this regard. How would I go about getting access to the PEP in order to fix the issues I’ve been catching up on? (I’ve been reading through quite a bit of old mailing list traffic these last few hours in-between writing docs and unit tests for the compression egress filter.) Now I’m even more excited. I’ll make a separate post to confirm and get some input on the issues I’ve encountered thus far. — Alice. [1] https://github.com/pulp/marrow.server.http [2] https://github.com/pulp/marrow.wsgi.egress.compression — full documentation included [3] http://web-core.org/marrow/confoo/ — input welcome; the deadline for modification is the 26th ___ Web-SIG mailing list Web-SIG@python.org Web SIG: http://www.python.org/sigs/web-sig Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com