Re: [Web-SIG] ANN: General availability of the WebCore WSGI nanoframework v2.0.

2016-04-19 Thread Alice BevanMcGregor

On 2016-04-19 12:56:26 +, Ian Cordasco said:


* Annotated Source Documentation: http://s.webcore.io/fjVc
(pythonhosted docs, also linked on the pypi page)


For what it's worth, the PyPI/Warehouse/PyPA developers are planning
on deprecating pythonhosted for PyPI packages. The suggestion is that
you use something like ReadTheDocs.org for documentation hosting.


Indeed, I tried desperately not to use it, but the WebCore 1 documentation 
was formerly hosted there, and after two hours of searching prior to release 
I was unable to find any way to _remove_ the already uploaded 
documentation or the reference to it on the PyPI page.


A very sub-optimal packaging experience, there, so the sooner it's 
actually gone the happier I'll be.  (It's quite slow, for example. ;)


-- Alice.


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
https://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] OT: dotted names

2011-07-06 Thread Alice BevanMcGregor

On 2011-04-15 22:33:08 +, P.J. Eby said:


At 04:11 PM 4/15/2011 -0400, Fred Drake wrote:

These end users don't really care if the object identified is a class or
function in module, a nested attribute on a class, or anything else, so
long as it does what it's advertised to do.  By not pushing implementation
details into the identifier, the package maintainer is free to change the
implementation in more ways, without creating backward incompatibility.


That would be one advantage of using entry points
instead.  ;-)  (i.e., the user doesn't specify the object location,
the package author does.)

Note, however, that one must perform considerably more work to
resolve a name, when you don't know whether each part of the name is
a module or an attribute.


Not if, as you mention, you use an explicit format.  The format my 
resolver code uses (and this code is utilized in marrow.mailer for 
manager/transport lookup, marrow.server.http's command-line script to 
resolve WSGI applications, and marrow.templating to resolve templates) 
covers the following:


:: object
:: entrypoint_name
:: ../relative/path/to/something
:: ./relative/path/to/something
:: /absolute/path/to/something
:: package.relative/path/to/something
:: package.absolute.path
:: package.submodule:object
:: package.submodule:object.attribute

What is allowed on any given resolution depends on if the resolver 
request is looking for an on-disk path or object.


Using the above as an example, you can define the use of the SMTP 
transport within marrow.mailer in three ways:


from marrow.mailer.transport.smtp import SMTPTransport
config = dict(transport=SMTPTransport) # direct reference
config = dict(transport='smtp') # entry point
config = dict( # object lookup
    transport = 'marrow.mailer.transport.smtp:SMTPTransport'
)

When configuring m.s.http to load an app, you can:

# p-code
HTTPServer.serve('project.application:WSGIApp.factory')

When choosing templates, OTOH, you can do the following:

return './templates/foo.html', dict()
return '/var/www/foo.html', dict()
return 'myapp.templates.foo', dict()
return 'myapp/templates/foo.html', dict()
return 'myapp.stemplates:email.welcome', dict()

Either you have to get an AttributeError first, and then fall back to 
importing, or get an ImportError first, and fall back to getattr.


If you examine the above closely, the differing formats are easily 
identifiable using a few == and 'in' conditionals:


if not isinstance(ref, basestring):
    return ref  # already an object; nothing to resolve

if ref[0] == '.': pass # relative
if ref[0] == '/': pass # absolute
if '/' not in ref and '.' not in ref and ':' not in ref:
    pass # entrypoint
if ':' in ref:
    import_, _, attrs = ref.partition(':')
    # A non-empty fromlist makes __import__ return the named submodule
    # instead of the top-level package.
    base = __import__(import_, fromlist=['__name__'])
    for attr in attrs.split('.'):
        base = getattr(base, attr)
    return base  # the resolved object, not the last attribute name
if '/' in ref:
    import_, _, path = ref.partition('/')
    pass # use pkg_resources + path to pull file from package

If the syntax is explicit, OTOH, then you don't have to guess, thereby 
saving lots of work and wasteful exceptions.
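
As a rough illustration of that point, here is a minimal Python 3 sketch that labels a reference by its explicit format. The function name `classify` and the label strings are hypothetical, chosen only for this example; it is not the marrow resolver itself.

```python
def classify(ref):
    """Label a reference by its explicit format (labels are illustrative)."""
    if not isinstance(ref, str):
        return 'object'            # already resolved, pass it through
    if ref.startswith('.'):
        return 'relative path'     # ./foo or ../foo
    if ref.startswith('/'):
        return 'absolute path'
    if ':' in ref:
        return 'module:object'     # import the left side, getattr the right
    if '/' in ref:
        return 'package path'      # a file inside a package
    if '.' in ref:
        return 'dotted module'     # package.absolute.path
    return 'entry point'           # bare name


for example in ('smtp', './templates/foo.html', '/var/www/foo.html',
                'myapp/templates/foo.html', 'myapp.templates.foo',
                'marrow.mailer.transport.smtp:SMTPTransport'):
    print(example, '->', classify(example))
```

No exception is raised and caught along the way: each format is decided by a string test alone.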


:)

— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-18 Thread Alice BevanMcGregor

On 2011-04-18 14:11:21 -0700, Daniel Holth said:


The file format discussion seems utterly pointless.


That's a pity.

If you want the format to specify cron jobs and services and non-wsgi 
servers, why not go the whole way and use the Linux filesystem 
hierarchy standard. The entry point is an executable called `init`, 
configuration goes in /etc/, cron jobs go in /etc/cron.d etc. This 
should be flexible enough.


Because that would be… less than good.  Let me illustrate:

a) The LFS is intended for complete operating system installations.

b) You sure as hell wouldn't want the init process to be Python.

c) Operating-system specific features are a no-go for portability.

d) We don't want developers to have to suddenly become sysadmins, too.

e) /etc is terrible for configuration organization.

There are other, lower-level reasons not to do that.

One big point is that the application server / container writes a 
single configuration file which is then read in by the application.  
One file, not a tree of them.


I hope most applications won't need to look at the contents of app.yaml 
(the application container config) at all.


No-one has said that an application /would/ have to look at the 
application metadata, or that after installation the file was anywhere 
app accessible, even.


Paste Deploy configures logging by passing the .ini to logging before 
invoking the app's entry point. This is the application container 
configuring the logging.


I've already defined that.  RTFM or many ML messages about logging.

For example a cool application container feature would be to have a 
little web application that manipulated logging configuration in a 
database, or reconfigured logging between requests without restarting 
the application.


The former is already defined.  That's what the application server 
does, database or no.  The latter is broadly unnecessary, but easily 
implementable within the application you are deploying.


One way to pass 'services' information would be to specify a support 
package with abstract base classes and have a procedure for proposing 
new standard services to the web-sig. The container would have to 
populate a registry of named implementations of those services it is 
able to support:


That seems… excessive and ugly.  You would also have code mixing 
between the application server level and application level which will 
encourage nothing but madness.  Simple, named services with optional 
configurations are more than enough.


I would really like to see a basic specification with no support for 
services or 'spending an hour running apt-get to reconfigure the server 
before eventually getting around to running the application', and a 
procedure for extending the format.


apt-get has already been thrown out, and was, in fact, never part of 
the quick summary I made, either.


— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-18 Thread Alice BevanMcGregor

On 2011-04-18 16:36:28 -0700, Daniel Holth said:

On Apr 18, 2011, at 6:09 PM, Alice Bevan–McGregor wrote:

I've already defined that.  RTFM or many ML messages about logging.


Please remain friendly and patient. 


That depends on how you define the F in RTFM.  In this instance, I 
meant read the fine manual.  ;)


You can understand my frustration, however, that  10% of the posts in 
this thread demonstrate a lack of understanding of (or lack of even a 
cursory glance at) a) my initial post and associated document, and b) 
the rest of the mailing list posts.


Asking for things already agreed upon or questions already resolved 
wastes everyone's time.


On 2011-04-18 16:46:12 -0700, Eric Larson said:
Instead of assuming /etc always means the root of the filesystem we 
should consider it the root of the sandbox where the system providing 
the sandbox defines what that is.


While /etc certainly wouldn't be the root of anything (insert sarcastic 
smiley here ;), it was already agreed upon that / would refer to the 
application container root, not system root.  I share Ian's sentiment, 
see: (search for 'root' on that page)


http://mail.python.org/pipermail/web-sig/2011-April/005041.html

It is _a_ filesystem in that there is a place that an application will 
be run. For argument's sake, we'll say it is a directory on some 
server. Now, within that directory we choose to take some known bits 
from the LFS standard such as /etc, /bin, /var, etc for the placement 
of our application.


Again, not such a great idea.

With that in mind, I think using things like LFS makes a ton of sense. 
We can piggy back or copy (since previous discussions for .debs or rpms 
seem not to sit well... even though they would fit this model very 
well...) systems like RPM rather directly and hopefully allow our 
Python web apps to play very nicely with applications in other 
languages.


I can't fully grok this paragraph.  FHS (my bad calling it LFS 
earlier!) = good because we won't confuse systems administrators and it 
matches other binary packaging models?


I doubt an isolated web application will have a need for more than 6% 
(3) of these:


http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard#Directory_structure

While I personally have a FHS-like application deployment model using 
Git, I would rather not see that level of complexity as a requirement 
for deploying basic applications.


Please do not get hung up on the fact that I've said RPMs here. The 
fact is distros have been doing package management for quite a long 
while. It is insanely convenient to say apt-get install couchdb and 
when it is done, having a couchdb server running.


It may be convenient, but it's also quite the risk.  You're letting 
someone else configure your server.  Also, do binary installation 
systems automatically start the service post-installation before you 
can configure them?  I have difficulty believing that, which means a 
whole whack-ton of effort under a systems administrator hat has been 
glossed over.


Copying the model seems like a good option in that we get to learn from 
the mistakes of others while inheriting a wild variety of tools and 
concepts.


The on-disk structure which the application lives within (the 
application container) is up to the application server in use.  The 
underlying application should, and, IMHO, -must- be agnostic to it.  
Passing paths to configuration files, TMPDIR, etc. in the environment 
is a fairly trivial way to do that, at which point the FHS discussion 
is nearly moot.
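
A minimal sketch of that idea, assuming the application server exports locations as environment variables. `APP_CONFIG` is a hypothetical variable name invented for this example; `TMPDIR` is the conventional one mentioned above.

```python
import os
import tempfile


def container_paths(environ=os.environ):
    """Read container-provided locations from the environment.

    APP_CONFIG is a hypothetical name used only in this sketch.
    """
    return {
        'config': environ.get('APP_CONFIG'),  # None if the server set nothing
        'tmp': environ.get('TMPDIR', tempfile.gettempdir()),
    }


print(container_paths({'APP_CONFIG': '/srv/app/conf/app.yaml',
                       'TMPDIR': '/srv/app/tmp'}))
```

The application never needs to know where the container root lives; it only reads what it was handed.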


If you want a complete (complete enough for a simple web application) 
FHS structure within the redistributable, I don't see the point of 
having that many empty directories.  ;)


As an aside, I -do- have an application in production using a FHS-like 
file structure:


https://gist.github.com/926617

But again, I'm not suggesting something like that for the 
redistributable application!


— Alice.




Re: [Web-SIG] OT: dotted names (Was: Re: A Python Web Application Package and Format)

2011-04-15 Thread Alice BevanMcGregor

On 2011-04-15 11:02:17 -0700, Jim Fulton said:

On Fri, Apr 15, 2011 at 1:32 PM, Éric Araujo 
mer...@netwok.org wrote:
As an aside, I wonder why people use dot+colon notation instead of just 
dots to reference callables.  In distutils2 for example we resolve 
dotted names to find command classes, command hooks and compilers.  So 
what’s the benefit, marginally easier parsing?


An opportunity of using a colon is that it allows::

   dotted.module.name:expression

where expression may be more than just a name::

  foo.bar:Bar()


Or foo.bar:Baz.factory.

I wouldn't go so far as to eval() what's after the colon.  The real 
difference is this:


[foo.bar]:[Baz.factory]
    |          ^- Attribute lookup.
    ^- Module lookup.

You can't do this:

import foo.bar.Baz.factory

Thus the difference.  However, the syntax is actually more flexible than that:

[foo.bar]/[subfolder/file]
    |          ^- Sub-path.
    ^- Module.

/[foo/bar]
    ^- Just path.
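
A short Python 3 sketch of the module/attribute split, using only the standard library. The function name `load` is hypothetical; the stdlib targets in the usage lines are stand-ins for something like foo.bar:Baz.factory.

```python
import importlib


def load(ref):
    """Resolve 'package.module:object.attribute' without guessing."""
    module_name, _, attrs = ref.partition(':')
    target = importlib.import_module(module_name)   # module lookup
    for name in filter(None, attrs.split('.')):     # attribute lookup
        target = getattr(target, name)
    return target


print(load('os.path:join'))
print(load('collections:OrderedDict.fromkeys'))
```

Everything left of the colon is imported, everything right of it is walked with getattr, so no fallback between ImportError and AttributeError is needed.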

— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-15 Thread Alice BevanMcGregor

On 2011-04-14 10:34:59 -0700, Ian Bicking said:

I think there's a general concept we should have, which I'll call a 
script -- but basically it's a script to run (__main__-style), a 
callable to call (module:name), or a URL to fetch internally.


Agreed.  The reference notation I mentioned in my reply to Graham, with 
the addition of URI syntax, covers all of those options.


I want to keep this distinct from anything long-running, which is a 
much more complex deal.


The primary application is only potentially long-running.  (You could, 
in theory, deploy an app as CGI, but that way lies madness.)  However, 
the reference syntax mentioned (excepting URL) works well for 
identifying this.


I think given the three options, and for general simplicity, the script 
can be successful or have an error (for Python code: exception or no; 
for __main__: zero exit code or no; for a URL: 2xx code or no), and can 
return some text (which may only be informational, not structured?)


For the simple cases (script / callable), it's pretty easy to trap 
STDOUT and STDERR, deliver INFO log messages to STDOUT, everything else 
to STDERR, then display that to the administrator in some form.  Same 
for HTTP, except that it can include full HTML formatting information.
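
One way the callable case could look in Python 3, as a sketch only. The function name `run_script` and the logger name 'script' are assumptions made for this example.

```python
import contextlib
import io
import logging


def run_script(func, logger_name='script'):
    """Call func() and return (ok, stdout_text, stderr_text)."""
    out, err = io.StringIO(), io.StringIO()

    info = logging.StreamHandler(out)       # INFO records go to "stdout"
    info.addFilter(lambda record: record.levelno == logging.INFO)
    other = logging.StreamHandler(err)      # every other level goes to "stderr"
    other.addFilter(lambda record: record.levelno != logging.INFO)

    log = logging.getLogger(logger_name)
    log.setLevel(logging.DEBUG)
    log.addHandler(info)
    log.addHandler(other)
    ok = True
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            func()
    except Exception:
        ok = False
        log.exception('script failed')      # traceback lands in the stderr text
    finally:
        log.removeHandler(info)
        log.removeHandler(other)
    return ok, out.getvalue(), err.getvalue()


def post_install():
    print('creating tables')
    logging.getLogger('script').info('tables created')
    logging.getLogger('script').warning('no admin user yet')


print(run_script(post_install))
```

The returned pair of text blobs is what would be stored and shown to the administrator; success or failure is simply whether an exception escaped.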


An application configuration could refer to scripts under different 
names, to be invoked at different stages.


A la the already mentioned post-install, pre-upgrade, post-upgrade, 
pre-removal, and cron-like.  Any others?


There could be an optional self-test script, where the application 
could do a last self-check -- import whatever it wanted, check db 
settings, etc.  Of course we'd want to know what it needed *before* the 
self-check to try to provide it, but double-checking is of course good 
too.


Unit and functional tests are the most obvious.  In which case we'll 
need to be able to provide a localhost-only 'mounted' location for the 
application even though it hasn't been installed yet.


One advantage to a separate script instead of just one 
script-on-install is that you can more easily indicate *why* the 
installation failed.  For instance, script-on-install might fail 
because it can't create the database tables it needs, which is a 
different kind of error than a library not being installed, or being 
fundamentally incompatible with the container it is in.  In some sense 
maybe that's because we aren't proposing a rich error system -- but 
realistically a lot of these errors will be TypeError, ImportError, 
etc., and trying to normalize those errors to some richer meaning is 
unlikely to be done effectively (especially since error cases are hard 
to test, since they are the things you weren't expecting).


Humans are potentially better at reading tracebacks than machines are, 
so my previous logging idea (script output stored and displayed to the 
administrator in a readable form) combined with a modicum of reasonable 
exception handling within the script should lead to fairly clear errors.



Categorizing services seems unnecessary.


The description of the different database options were for 
illustration, not actual separation and categorization.


I'd like to see maybe an | operator, and a distinction between required 
and optional services.  E.g.:


No need for some new operator, YAML already supports lists.

services:
- [mysql, postgresql, dburl]

Or:

services:
  required:
    - files

  optional:
    - [mysql, postgresql]

And then there's a lot more you could do... which one do you prefer, 
for instance.


The order of services within one of these lists would indicate 
preference, thus MySQL is preferred over PostgreSQL in the second 
example, above.
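
A small Python sketch of how a container might honour that ordering. The dictionary stands in for what a YAML parser would return for the second example; the `available` set and the function name `choose` are hypothetical.

```python
def choose(alternatives, available):
    """Return the first alternative the container supports, else None."""
    for name in alternatives:
        if name in available:
            return name
    return None


# What a YAML parser would produce for the second example above.
services = {'required': ['files'], 'optional': [['mysql', 'postgresql']]}
available = {'files', 'postgresql'}   # hypothetical container capabilities

for entry in services['optional']:
    alternatives = entry if isinstance(entry, list) else [entry]
    print(alternatives, '->', choose(alternatives, available))
```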



Tricky things:
- You need something funny like multiple databases.  This is very 
service-specific anyway, and there might sometimes need to be a way to 
configure the service.  It's also a fairly obscure need.


I'm not convinced that connecting to a legacy database /and/ current 
database is that obscure.  It's also not as hard as Django makes it 
look (with a 1M SLoC change to add support)… WebCore added support in 
three lines.


- You need multiple applications to share data.  This is hard, not sure 
how to handle it.  Maybe punt for now.


That's what higher-level APIs are for. ;)

You mean, the application provides its own HTTP server?  I certainly 
wouldn't expect that...?


Nor would I; running an HTTP server would be daft.  Running mod_wsgi, 
FastCGI on-disk sockets, or other persistent connector makes far more 
sense, and is what I plan.


Unless you have a very, very specific need (i.e. Tornado), running a 
Python HTTP server in production then HTTP proxying to it is 
inefficient and a terrible idea.  (Easy deployment model, terrible 
overhead/performance.)


Anyway, in terms of aggregate, I mean something like a site that is 
made up of many applications, and maybe those applications are 
interdependent in some fashion.  

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-14 Thread Alice BevanMcGregor

On 2011-04-13 18:16:36 -0700, Ian Bicking said:

While initially reluctant to use zip files, after further discussion 
and thought they seem fine to me, so long as any tool that takes a zip 
file can also take a directory.  The reverse might not be true -- for 
instance, I'd like a way to install or update a library for (and 
inside) an application, but I doubt I would make pip rewrite zip files 
to do this ;)  But it could certainly work on directories.  Supporting 
both isn't a big deal except that you can't do symlinks in a zip file.


I'm not talking about using zip files as per eggs, where the code is 
maintained within the zip file during execution.  It is merely a 
packaging format with the software itself extracted from the zip during 
installation / upgrade.  A transitory container format.  (Folders in 
the end.)


Symlinks are an OS-specific feature, so those are out as a core 
requirement.  ;)


I don't think we're talking about something like a buildout recipe. 
 Well, Eric kind of brought something like that up... but otherwise I 
think the consensus is in that direction.


Ambiguous statements FTW, but I think I know what you meant.  ;)

So specifically if you need something like lxml the application 
specifies that somehow, but doesn't specify *how* that library is 
acquired.  There is some disagreement on whether this is generally 
true, or only true for libraries that are not portable.  


+1

I think something along the lines of autoconf (those lovely ./configure 
scripts you run when building GNU-style software from source) with 
published base 'checkers' (predicates as I referred to them previously) 
would be great.  A clear way for an application to declare a 
dependency, have the application server check those dependencies, then 
notify the administrator installing the package.
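
A minimal sketch of what such checkers could look like in Python 3, using only the standard library. The checker names and the (kind, name) declaration format are assumptions made for illustration, not part of any agreed specification.

```python
import importlib.util
import shutil


def has_module(name):
    """Checker: is a top-level importable module present?"""
    return importlib.util.find_spec(name) is not None


def has_executable(name):
    """Checker: is a program available on PATH?"""
    return shutil.which(name) is not None


def unmet(requirements):
    """Return the declared requirements whose checker fails."""
    checkers = {'module': has_module, 'executable': has_executable}
    return [(kind, name) for kind, name in requirements
            if not checkers[kind](name)]


# Hypothetical declaration an application might ship.
declared = [('module', 'json'), ('module', 'no_such_module_xyz')]
print(unmet(declared))
```

The application server would run `unmet` before installation and report the resulting list to the administrator instead of failing halfway through.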


I've seen several Python libraries that include the C library code that 
they expose; while not so terribly efficient (i.e. you can't install 
the C library once, then share it amongst venvs), it is effective for 
small packages.


Larger libraries (i.e. installed globally or application-locally) would 
require the intervention of a systems administrator.


Something like a database takes this a bit further.  We haven't really 
discussed it, but I think this is where it gets interesting.  Silver 
Lining has one model for this.  The general rule in Silver Lining is 
that you can't have anything with persistence without asking for it as 
a service, including an area to write files (except temporary files?)


+1

Databases are slightly more difficult; an application could ask for:

:: (Very Generic) A PEP-249 database connection.

:: (Generic) A relational database connection string.

:: (Specific) A connection string to a specific vendor of database.

:: (Odd) A NoSQL database connection string.

I've been making heavy use of MongoDB over the last year and a half, 
but AFAIK each NoSQL database engine does its own thing API-wise.  (Then 
there are ORMs on top of that, but passing a connection string like 
mysql://user:pass@host/db or mongo://host/db is pretty universal.)
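
To show why the connection string form travels well, here is a Python 3 sketch that splits one with the standard library. The function name `parse_connection` and the dictionary keys are hypothetical.

```python
from urllib.parse import urlsplit


def parse_connection(url):
    """Split a connection string into the parts an engine driver needs."""
    parts = urlsplit(url)
    return {
        'engine': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/'),
    }


print(parse_connection('mysql://user:pass@host/db'))
print(parse_connection('mongo://host/db'))
```

The same function handles both the relational and the NoSQL example, which is the appeal of handing the application a single string.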


It is my intention to write an application server that is capable of 
creating and securing databases on-the-fly.  This would require fairly 
high-level privileges in the database engine, but would result in far 
more plug-and-play configuration.  Obviously when deleting an 
application you will have the opportunity to delete the database and 
associated user.


I assume everyone agrees that an application can't write to its own 
files (but of course it could execfile something in another location).


+1; that _almost_ goes without saying.  :)  At the same time, an 
application server /must not/ require root access to do its work, thus 
no mandating of (real) chroots, on-the-fly user creation, etc.


There are ways around almost all security policies, but where possible 
setting the read-only flag (Windows) or removing write (chmod -w on 
POSIX systems) should be enough to prevent casual abuse.
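
A sketch of the portable version in Python 3. The function names are hypothetical. On Windows, os.chmod can only toggle the read-only flag, which is the effect wanted here.

```python
import os
import stat


def make_read_only(path):
    """Clear the write bits on one file (like chmod a-w)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def lock_tree(root):
    """Make every file below root read-only; directories are left alone."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            make_read_only(os.path.join(folder, name))
```

This needs no root access, and as noted it deters casual abuse rather than a determined attacker.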


I suspect there's some disagreement about how the Python environment 
gets setup, specifically sys.path and any other application-specific 
customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in 
silvercustomize.py, and find it helpful).


Similar to Paste's `here` variable for INI files, the application would 
need some way to define environment variables that reference the base 
path.


I've tossed out my idea of sharing dependencies, BTW, so a simple 
extraction of the zipped application into one package folder (linked in 
using a .pth file) with the dependencies installed into an app-packages 
folder in the path (like site-packages) would be ideal.  At least, for 
me.  ;)


Describing the scope of this, it seems kind of boring.  In, for 
example, App Engine you do all your setup in your runner -- I find this 
deeply annoying because it makes the runner the only entry point, and 
thus makes testing, scripts, etc. hard.


I agree; that's a 

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-14 Thread Alice BevanMcGregor

Howdy!

I suspect you're thinking a little too low-level.

On 2011-04-14 00:53:09 -0700, Graham Dumpleton said:

On 14 April 2011 16:57, Alice Bevan–McGregor 
al...@gothcandy.com wrote:

3. Define how to get the WSGI app.  This is WSGI specific, but (1) is
*not* WSGI specific (it's only Python specific, and would apply well to
other platforms)


I could imagine there would be multiple application types:

:: WSGI application.  Define a package dot-notation entry point to a 
WSGI application factory.


Why can't it be a path to a WSGI script file?


No reason it couldn't be.

app.type = wsgi
app.target = /myapp.wsgi:application

(Paths relative to the folder the application is installed into, and 
dots after a slash are filename parts, not module separators.)


But then, how do you configure it?  Using a factory (which is passed 
the from-appserver configuration) makes a lot of sense.
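
A minimal sketch of such a factory in Python 3. The 'greeting' setting is hypothetical, standing in for whatever the application server passes down.

```python
def factory(config):
    """Build a WSGI application from server-supplied configuration."""
    greeting = config.get('greeting', 'Hello')   # hypothetical setting

    def application(environ, start_response):
        text = '%s from %s\n' % (greeting, environ.get('PATH_INFO', '/'))
        body = text.encode('utf-8')
        start_response('200 OK', [
            ('Content-Type', 'text/plain; charset=utf-8'),
            ('Content-Length', str(len(body))),
        ])
        return [body]

    return application


application = factory({'greeting': 'Howdy'})
```

A bare module-level `application` has nowhere to receive configuration; the factory does, and still yields an ordinary WSGI callable.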


This actually works more universally as it works for servers which map 
URLs to file based resources as well.


First, .wsgi files (after a few quick Google searches) are only used by 
mod_wsgi.  I wouldn't call that universal, unless you can point out 
the other major web servers that support that format.


You'll have to describe the map URLs to file based resources issue, 
since every web server I've ever encountered (Apache, Nginx, Lighttpd, 
etc.) works that way.  Only if someone is willing to get really hokey 
with the system described thus far would any application-scope web 
servers be running.


Also allows alternate extensions than .py and also allows basename of 
file name to be arbitrarily named, both of which help with those same 
servers which map URLs to file base resources.


Again, you'll have to elaborate or at least point to some existing 
documentation on this.


I've never encountered a problem with that, nor do any of my scripts 
end in .py.


It also allows same name WSGI script file to exist in multiple 
locations managed by same server without having to create an 
overarching package structure with __init__.py files everywhere.


Packages aren't a bad thing.  In fact, as described so far, a top level 
package is required.



For WSGI servers which currently require a dotted path, eg gunicorn:


See my note above; choice of Python-level HTTP interface is not up to 
the application, though by all means there should be some simple way to 
launch a development server.


The WSGI script file then can itself even be responsible for further 
setup of sys.path as appropriate and so be more self contained and not 
dependent on an external launch system.


The -point- (AFAIK/IMHO) is to be dependent on an external launch system.


and at the end of myapp.py add boilerplate like:

  from wsgiref.simple_server import make_server

  httpd = make_server('', 8000, application)
  print("Serving on port 8000...")
  httpd.serve_forever()


Again, I've never described anything that would require that nonsense.  
WSGI callable, preferably a factory callable, that's it.


Use a different server which required such boilerplate and you had to 
change it.


Not the problem of the application.

Using a WSGI script file as the lowest common denominator, it would 
also be nice to be able to do something like:


  python -m gunicorn.server myapp.wsgi
  python -m wsgiref.server myapp.wsgi


Not a half bad idea, but again, no reason to restrict it to .wsgi 
files.  (That's also a completely different problem than the 
application format currently under discussion.)


I've written and rewritten my dot-colon-notation system enough that it 
supports:


:: /path[/sub[...]][:object[.property]] (even if it has to execfile it)
:: package[.module[...]][/folder[...]][:object[.property]]

I think that syntax pretty much covers everything, including .wsgi 
files (/path/to/foo.wsgi:application).  The implementation of the above 
is fully unit tested, and I really don't mind people stealing it.  ;)
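
For the path form, a Python 3 sketch of the "execfile it" case. This is an illustration under assumptions, not the tested implementation mentioned above; the function name `load_from_path` and the `__wsgi__` module name are hypothetical.

```python
import os
import tempfile


def load_from_path(ref):
    """Resolve '/path/to/file:object.attribute' by executing the file."""
    path, sep, attrs = ref.rpartition(':')
    if not sep:
        raise ValueError('expected path:object, got %r' % (ref,))
    namespace = {'__file__': path, '__name__': '__wsgi__'}
    with open(path, 'rb') as handle:       # Python 3 stand-in for execfile()
        exec(compile(handle.read(), path, 'exec'), namespace)
    first, *rest = attrs.split('.')
    target = namespace[first]              # top-level name defined by the file
    for name in rest:
        target = getattr(target, name)
    return target


with tempfile.TemporaryDirectory() as folder:
    script = os.path.join(folder, 'foo.wsgi')
    with open(script, 'w') as handle:
        handle.write("def application(environ, start_response):\n"
                     "    return [b'ok']\n")
    app = load_from_path(script + ':application')
    print(app({}, None))
```

The file extension plays no part, which is why .wsgi files need no special treatment.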


— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-14 Thread Alice BevanMcGregor

On 2011-04-14 08:53:55 -0700, Randy Syring said:

Just wondering if Windows/IIS is being kept in mind as this discussion 
is going on.  I am having a hard time conceptualizing the things being 
discussed, so can't really tell myself.


I'm trying pretty hard to ensure that non-compatible OS features don't 
make it in here.  Things like symlinks, chroots, etc.


— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Alice BevanMcGregor

On 2011-04-11 00:53:02 -0700, Eric Larson said:


Hi,
On Apr 10, 2011, at 10:29 PM, Alice Bevan–McGregor wrote:

However, the package format I describe in that gist does include the 
source for the dependencies as snapshotted during bundling.  If your 
application is working in development, after snapshotting it /will/ 
work on sandbox or production deployments.


I wanted to chime in on this one aspect b/c I think the concept is 
somewhat flawed. If your application is working in development and 
you snapshot the dependencies, that is no guarantee that things will work 
in production. The only way to say that snapshot or bundle is 
guaranteed to work is if you snapshot the entire system and make it 
available as a production system.


`pwaf bundle` bundles the source tarballs, effectively, of your 
application and dependencies into a single file.  Not unlike a certain 
feature of pip.


And… wait, am I the only one who uses built-from-snapshot virtual 
servers for sandbox and production deployment?  I can't be the only one 
who likes things to work as expected.


Using a real world example, say you develop your application on OS X 
and you deploy on Ubuntu 8.04 LTS. Right away you are dealing with two 
different operating systems with entirely different system calls. If 
you use something like lxml and simplejson, you have no choice but to 
repackage or install from source on the production server.


Installing from source is what I was suggesting.  Also, Ubuntu on a 
server?  All your `linux single` (root) are belong to me.  ;^P


While it is fair to say that generally you could avoid packages that 
don't use C, both lxml and simplejson are rather obvious choices for 
web development.


Except that json is built-in in 2.6 (admittedly with fewer features, 
but I've never needed the extras) and there are alternate xml parsers, 
too.


It sounds like Ian doesn't want to have any build steps which I think 
is a bad mantra. A build step lets you prepare things for deployment. A 
deployment package is different than a development package and mixing 
the two by forcing builds on the server seems like asking for 
trouble.


I'm having difficulty following this statement: build steps good, 
building on server bad?  So I take it you know the exact target 
architecture and have cross-compilers installed in your development 
environment?  That's not practical (or simple) at all!


I'm not saying this is what you (Alice) are suggesting, but rather 
pointing out that as a model, depending on virtualenv + pip's bundling 
capabilities seems slightly flawed.


Virtualenv (or something utilizing a similar Python path 'chrooting' 
capability) and pip using the extracted deps as the source for 
offline installation actually seems quite reasonable to me.  You get the 
benefit of a known set of working packages (i.e. specific version 
numbers, tested in development) and the ability to compile C extensions 
in-place.  (Because sure as hell you can't reliably compile them 
before-hand if they have any form of system library dependency!)
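
A sketch of what that offline step could look like when driven from Python 3. The folder and file names are hypothetical; `--no-index` and `--find-links` are standard pip install options.

```python
import sys


def offline_install_command(deps_dir, requirements):
    """Build a pip command that installs only from a local folder of sources."""
    return [sys.executable, '-m', 'pip', 'install',
            '--no-index',               # never contact a package index
            '--find-links', deps_dir,   # look for source archives here instead
            '-r', requirements]


print(' '.join(offline_install_command('deps', 'requirements.txt')))
```

Passing the list to subprocess.check_call from inside the target virtualenv would then build any C extensions on the machine that will actually run them.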


I think it should offer hooks for running tests, learning basic status 
and allow simple configuration for typical sysadmin needs (logging via 
syslog, process management, nagios checks, etc.). Instead of focusing 
on what format that should take in terms of packages, it seems more 
effective to spend time defining a standard means of managing WSGI apps 
and piggyback or plain old copy some format like RPMs or dpkg.


RPMs are terrible, dpkg is terrible.  Binary package distribution, in 
general, is terrible.  I got the distinct impression at PyCon that 
binary distributable .eggs were thought of as terrible and should be 
phased out.


Also, nobody so far seems to have noticed the centralized logging 
management or daemon management lines from my notes.


Just my .02. Again, I haven't offered code, so feel free to ignore me. 
But I do hope that if there are others that suspect this model of 
putting source on the server is a problem will pipe up. If I were to add a 
requirement it would be that Python web applications help system 
administrators become more effective. That means finding consistent 
ways of deploying apps that play well with other languages / 
platforms. After all, keeping a C compiler on a public server is rarely 
a good idea.


If you could demonstrate a fool-proof way to install packages with 
system library dependencies using cross-compilation from a remote 
machine, I'm all ears.  ;)


— Alice.


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Alice BevanMcGregor

On 2011-04-11 13:49:20 -0700, Alex Grönholm said:


I use Ubuntu on all my servers, and linux single does not work with
it, I can tell you ;P


The number of poorly configured Ubuntu servers I have seen (and 
replaced) is staggering.  Any time the barrier to entry is lowered, 
quality suffers: having a compiler on the server is nothing compared to 
having a complete X graphical environment running as root, with root 
and a single user sharing the same password.  ;^D


— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Alice BevanMcGregor

Howdy!

On 2011-04-11 15:22:11 -0700, Ian Bicking said:


I... think we are misunderstanding each other or something.


Something.  ;)

A nice tool that could use this format, for instance, would be a tool 
that takes an app and creates a puppet recipe to set up a server to host 
the application.  A different tool (maybe better, maybe not?) would be 
a puppet plugin (if that's the terminology) that uses this format to 
tell puppet about all the requirements an application has, perhaps 
translating some notions to puppet-native concepts, or adding 
high-level recipes that setup an appropriate container (which can be as 
simple as a properly configured Nginx or Apache server).


Minuteman (loved the hat from the PyCon lightning talk), buildout, 
puppet, make, bash, custom XML-RPC APIs, … there are quite a number of 
ways to push something into production.  Standardizing on one would 
marginalize the idea, and being agnostic means there is a whole /lot/ 
of work to be done to add support to every tool.  :/


What I mean when I say there's a danger of becoming a configuration 
management tool, is that if you include hooks for the application to 
configure its environment you are probably stepping on the toes of 
whatever other tool you might use.  And once you start down that path 
things tend to cascade.


Have a gander at the Application Spec section; what, specifically, are 
you at odds with as coming from the application?  I work with 
specifics, not vague "don't do that!" comments.


The configuration of environment extends to:

:: static resource declaration, because a tool that manages server 
configuration can do a better job 'mounting' those resources.


:: services (in your parlance, 'resources' in mine) such as "give me an 
SQL database".


:: recurrent tasks (a la cron) because having that centralized across 
multiple applications Isn't Just a Good Idea™ -- treat this as a 
'service' if you must.


If you include something in the packaging format that indicates the 
libraries to be installed, then you are encouraging and perhaps 
requiring that the server install libraries during a deployment.


Libraries that are __bundled with the application__.  I fail to see the 
'badness' of this, or, really, how this differs from Silver Lining.


I'd double-check this, but cloudsilverlining.org is inaccessible from 
my current location for some reason.  :/


Realistically this can't be entirely avoided, but I think it is a 
pretty workable separation to declare only those dependencies that 
can't reasonably be included directly in the application itself (e.g., 
lxml, MySQLdb, git, and so on).  In Silver Lining those dependencies 
were expressed as Debian package names, installed via dpkg, but for a 
more general system it would need to be somewhat more abstract.


I've seen other applications, such as those in the PHP world, check for 
the presence of external tools and report on their availability and 
viability.  Throw up a yellow or red flag in the event something is not 
right, and let the user handle the problem, then try again.


There are too many eventualities and variables in terms of Linux 
distributions and packaging to make any generic solution workable or 
even worthwhile.  At least, until we have high-order AI replacing 
sysadmins.


OK; then #4 is the only thing I would choose to support, as it is 
the most general and easiest for tools to support, and least likely to 
lead to different behavior with different tools.  And not to just defer 
to authority, but having written a half dozen tools in this area, not 
all of them successful, I feel strongly that including dependencies is 
best -- simplest for both producer and consumer, and most reliable.


Thank you for reading what I wrote.

— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Alice BevanMcGregor

pre-install-hooks: [
  apt-get install libxml2,  # the person deploying the package 
assumes apt-get is available
  run-some-shell-script.sh, # the shell script might do the following 
on a list of URLs
  wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf 
dependency.tar.gz && rm dependency.tar.gz

]

Does that make some sense? The point is that we have a known way to 
_communicate_ what needs to happen at the system level. I agree that 
there isn't a fool proof way.


package: epic-compression
pre-install-hooks: [rm -rf /*]

Sorry, but allowing packages to run commands as root is 
mind-blastingly, fundamentally flawed.  You mention an inability to 
roll back or upgrade?  The above would be worse in that department.


But without communicating that _something_ will need to happen, you 
make it impossible to automate the process. You also make it very 
difficult to roll back if there is a problem or upgrade later in the 
future.


Really, in what way?

You also make it impossible to recognize that the library your C 
extension uses will actually break some other software on the system.


LD_LIBRARY_PATH.

Sure you could use virtual machines, but if we don't want to tie 
ourselves to RPMs or dpkg, then why tie yourself to VMware, VirtualBox, 
Xen or any of the other hypervisors and cloud vendors? 


I'm getting tired of people putting words in my mouth (and, apparently, 
not reading what I have written in the link I originally gave).  Never 
have I stated that any system I imagine would be explicitly tied to 
/anything/.


— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Alice BevanMcGregor

On 2011-04-11 16:13:06 -0700, Ian Bicking said:

(I'm confused; I just noticed there's a 
web-sig@python.org and 
python-web-...@googlegroups.com?)


I only see one actual gmane group, gmane.comp.python.web...

— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-11 Thread Alice BevanMcGregor

Eric,

Let me rephrase a few things.

On 2011-04-11 17:48:14 -0700, Eric Larson said:


pre-install-hooks: [
  apt-get install libxml2,  # the person deploying the package 
assumes apt-get is available


Assumptions are evil.  You could end up with multiple third-party 
applications each assuming different things.  Aptitude, apt-get, brew, 
emerge, ports, …


  run-some-shell-script.sh, # the shell script might do the following 
on a list of URLs


There is zero way of tracking what that does, so out of the gate that's 
a no-no, and full system chroots (not what I'm talking about in terms 
of chroot) require far too much organization/duplication/management.


The 'hooks' idea listed in my original document is for callbacks into 
the application.  That callback would be one of:


:: A Python script to execute.  (path notation)

:: A Python callable to execute.  (dot-colon notation)

:: A URL within the application to GET.  (url notation)

Arbitrary system-level commands are right out: Linux, UNIX, BSD, 
Windows, Solaris… good luck getting even simple commands to execute 
identically and predictably across platforms.  The goal isn't to 
rewrite buildout!
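As a rough sketch of how a deployment tool might tell the three callback notations apart: the names `classify_hook` and `load_callable`, and the rules used to distinguish the notations (a leading `/` for URLs, a `.py` suffix for scripts, a colon for callables), are assumptions for illustration and are not taken from the gist.

```python
import importlib


def load_callable(spec):
    """Resolve 'package.module:attribute.path' to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


def classify_hook(spec):
    """Return (kind, value) for a hook written in one of three notations.

    Assumed rules (hypothetical, not from the gist):
      '/some/url'         -> a URL within the application to GET
      'scripts/thing.py'  -> a Python script to execute (relative path)
      'pkg.mod:callable'  -> a Python callable to execute
    """
    if spec.startswith("/"):
        return ("url", spec)
    if spec.endswith(".py"):
        return ("script", spec)
    if ":" in spec:
        return ("callable", load_callable(spec))
    raise ValueError("unrecognised hook notation: %r" % (spec,))
```

Nothing here runs a shell command: the tool only ever calls back into the application, which keeps the hook portable across platforms.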


Just b/c a command like apt-get is used it doesn't mean it is used as 
root. The point is not that you can install things via the package, but 
rather that you provide the system a way to install things as needed 
that the system can control.


A methodology of testing for the presence and capability of specific 
services (resources) is far more useful than rewriting buildout.  "I 
need an SQL database of some kind."  "I need this C library within 
these version boundaries."  Etc.  Those are reasonable predicates for 
installation.  You can combine this application format with buildout, 
puppet, or brew-likes if you want to, though.
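A minimal sketch of such predicates, assuming simple dotted numeric version strings and a half-open version range. The helper names are hypothetical; `ctypes.util.find_library` is the only library call involved, and it only reports whether a shared library can be located, not which version it is.

```python
from ctypes.util import find_library


def parse_version(text):
    """Turn '2.7.8' into (2, 7, 8); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def within_bounds(version, minimum, maximum):
    """True when minimum <= version < maximum."""
    low, high = parse_version(minimum), parse_version(maximum)
    return low <= parse_version(version) < high


def has_c_library(name):
    """True when the platform's linker search can locate lib<name>."""
    return find_library(name) is not None
```

A tool could evaluate a list of such checks before extraction and report every unmet requirement at once, leaving the fix to the administrator.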


Personally, I'd rather not re-invent the wheel of a Linux distribution, 
thanks.  I wouldn't even want an application server to touch 
system-wide configurations other than web server configurations for the 
applications hosted therein.


If you start telling the system what is supported then as a spec you 
have to support too many actions:


  pre-install-hooks: [
    ('install', ['libxml2', 'libxslt']),
    ('download', 'foo-library.tar.gz'),
    ('extract', 'foo-library.tar.gz'),
    ...
    # the idea being
    ($action, $args)
  ]


I define no actions, only a callback.


This is a pain in the neck as a protocol.


Unfortunately for your argument this is a protocol you invented, not 
one that I defined.


It is much simpler to have a list of pre-install-hooks and let the 
hosting system that is installing the package deal with those. If your 
system wants to run commands, you have the ability to do so. If you 
want to list package names that you install, go for it. If you have a 
tool that you want to use that the package can provide arguments, that 
is fine too. From the standpoint of a spec / API / package format, you 
don't really control the tool that acts on the package. 


Bing.  You finally understand what I defined.

This is the same problem that setuptools has. There isn't a record of 
what was installed.


That's a tool-level problem unrelated to application packaging.  For a 
good example of a Python application that /does/ manage packages, file 
tracking, etc. have a look at Gentoo's Portage system.


It is safe to assume a deployed server has some software installed 
(nginx, postgres, wget, vim, etc.) and those requirements should 
usually be defined by some system administrator.


No application honestly cares what front-end web server it is running 
on unless it makes extensive use of very specific plugins (like Nginx's 
push notification service).  Again, most of this is outside the scope 
of an application container format.  Do your applications honestly need 
access to vim?


Also, assume nothing.

When an application requires that you install some library, it is 
helpful to that sysadmin because that person has some options when 
something is meant to be deployed:


 1. If the library is incompatible and will break some other piece of 
software, you can know and stop the deployment right there


That's what the sandbox is for.  I've been running Gentoo servers 
with 'slotting' mechanisms for over 10 years, now, and having multiple 
installed libraries that are incompatible with one-another is not 
unusual, unheard of, or difficult.  (Three versions of PHP, three of 
Python, etc.)


 2. If the application is going to be moved to another server, the 
sysadmin can go ahead and add that app's requirements to their own 
config (puppet class for example)


Puppet, buildout, etc. is, again, outside the scope.  And if the 
application already defines requirements, what config file are you 
updating and duplicating the data needlessly within?


 3. If two applications are running on the same machine, they may have 
inconsistent library requirements


That's what the sandbox is for.

 4. If an application does fail 

Re: [Web-SIG] A Python Web Application Package and Format

2011-04-10 Thread Alice BevanMcGregor

On 2011-04-10 16:25:21 -0700, James Mills said:


+1 too. I would however like to see this idea developed in a generic
and useable way. ie: No zope/twisted deps or making it fit around
Django :)
Ideally it should be useable by the most basic (plain old WSGI).


The following are the collected ideas of myself and a few other users 
in the WebCore chat room:


https://gist.github.com/911991

Being generic (i.e. using WSGI under-the-hood) and allowing generic 
port assignments for other (non-web) networked applications is a design 
goal.


The aversion to packaged zips is not entirely understandable to us; in 
this case, a packaged copy of the application is produced via a 
setup.py command, though in theory one could develop with that model 
and just zip everything up in the end by hand.


Silver Lining seems to require too much in the way of hacking 
(modifying .pth files, etc) to be reasonable.


— Alice.




Re: [Web-SIG] A Python Web Application Package and Format

2011-04-10 Thread Alice BevanMcGregor

Howdy!

On 2011-04-10 19:06:52 -0700, Ian Bicking said:

There's a significant danger that you'll be creating a configuration 
management tool at that point, not simply a web application description.


Unless you have the tooling to manage the applications, there's no 
point having a standard for them.  Part of that tooling will be some 
form of configuration management allowing you to determine the 
requirements and configuration of an application /prior/ to 
installation.  Better to have an application rejected up-front ("Hey, 
this needs my social insurance number? Hells no!") than after it's 
already been extracted and potentially littered the landscape with its 
children.


The escape valve in Silver Lining for these sort of things is services, 
which can kind of implement anything, and presumably ad hoc services 
could be allowed for.


Generic services are useful, but not useful enough.

You create a build process as part of the deployment (and development 
and everything else), which I think is a bad idea.


Please elaborate.  There is no requirement for you to use the 
application packaging format and associated tools (such as an 
application server) during development.  In fact, like 2to3, that type 
of process would only slow things down to the point of uselessness.  
That's not what I'm suggesting at all.


My model does not use setup.py as the basis for the process (you could 
build a tool that uses setup.py, but it would be more a development 
methodology than a part of the packaging).


I know.  And the end result is you may have to massage .pth files 
yourself.  If a tool requires you to, at any point during normal 
operation, hand modify internal files… that tool has failed at its 
job.  One does not go mucking about in your Git repo's .git/ folder, as 
an example.


How do you build a release and upload it to PyPi?  Upload docs to 
packages.python.org?  setup.py commands.  It's a convenient hook with 
access to metadata in a convenient way that would make an excellent 
"let's make a release!" type of command.


Also lots of libraries don't work when zipped, and an application is 
typically an aggregate of many libraries, so zipping everything just 
adds a step that probably has to be undone later.


Of course it has to be un-done later.  I had thought I had made that 
quite clear in the gist.  (Core Operation, point 1, possibly others.)


If a deploy process uses zip file that's fine, but adding zipping to 
deployment processes that don't care for zip files is needless 
overhead.  A directory of files is the most general case.  It's also 
something a developer can manipulate, so you don't get a mismatch 
between developers of applications and people deploying applications -- 
they can use the exact same system and format.


So, how do you push the updated application around?  Using a full 
directory tree leaves you with Rsync and SFTP, possibly various SCM 
methods, but then you'd need a distinct repo (or rootless branch) just 
for releasing and you've already mentioned your dislike for SCM-based 
deployment models.


Zip files are universal -- to the point that most modern operating 
systems treat zip files /as folders/.  If you have to, consider it a 
transport encoding.
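To illustrate the "transport encoding" view, here is a small sketch using only the standard library: zip a directory for transfer, then unpack it into a plain directory on the other side. The function names and the single-file demo application are made up for the example.

```python
import os
import shutil
import tempfile
import zipfile


def pack(app_dir, archive_base):
    """Zip the directory tree at app_dir; return the archive's path."""
    return shutil.make_archive(archive_base, "zip", root_dir=app_dir)


def unpack(archive_path, target_dir):
    """Extract the archive back into a plain directory of files."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)


def round_trip(filename, content):
    """Pack a one-file 'application', unpack it elsewhere, read it back."""
    with tempfile.TemporaryDirectory() as work:
        source = os.path.join(work, "source")
        target = os.path.join(work, "target")
        os.mkdir(source)
        with open(os.path.join(source, filename), "w") as handle:
            handle.write(content)
        archive = pack(source, os.path.join(work, "release"))
        unpack(archive, target)
        with open(os.path.join(target, filename)) as handle:
            return handle.read()
```

The deployed application still runs from an ordinary directory; the zip exists only between the two machines.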


The pattern that it implements is fairly simple, and in several models 
you have to lay things out somewhat manually.  I think some more 
convention and tool support (e.g., in pip) would be helpful.


+1

Though there are quite a few details, the result is more reliable, 
stable, and easier to audit than anything based on a build process 
(which any use of dependencies would require -- there are *no* 
dependencies in a Silver Lining package, only the files that are *part* 
of the package).


It might be just me (and the other people who seem to enjoy WebCore and 
Marrow) but it is fully possible to do install-time dependencies in 
such a way as things won't break accidentally.  Also, you missed 
Application Spec #4.



Some notes from your link:

- There seems to be both the description of a format, and a program 
based on that format, but it's not entirely clear where the boundary 
is.  I think it's useful to think in terms of a format and a reference 
implementation of particular tools that use that format (development 
management tools, like installing into the format; deployment tools; 
testing tools; local serving tools; etc).


Indeed; this gist was some really quickly hacked together ideas.

- In Silver Lining I felt no need at all for shared libraries.  Some 
disk space can be saved with clever management (hard links), but only 
when it's entirely clear that it's just an optimization.  Adding a 
concept like server-packages adds a lot of operational complexity and 
room for bugs without any real advantages.


±0

- I try to avoid error conditions in the deployment, which is a big 
part of not having any build process involved, as build processes are a 
source of constant errors -- you can do a stage deployment, 

Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)

2011-01-12 Thread Alice BevanMcGregor

On 2011-01-10 13:12:57 -0800, Guido van Rossum said:
Ok, now that we've had a week of back and forth about this, let me 
repeat my threat. Unless more concerns are brought up in the next 24 
hours, can PEP 3333 be accepted? It seems a lot of people are waiting 
for a decision that enables implementers to go ahead and claim PEP 
333[3] compatibility. PEP 444 can take longer.


With the lack of responses, can I assume this has been or will be 
shortly marked as accepted?


I look forward to updating WebCore with compatibility.

— Alice.




Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)

2011-01-11 Thread Alice BevanMcGregor

On 2011-01-10 13:12:57 -0800, Guido van Rossum said:

Ok, now that we've had a week of back and forth about this, let me 
repeat my threat. Unless more concerns are brought up in the next 24 
hours, can PEP 3333 be accepted? It seems a lot of people are waiting 
for a decision that enables implementers to go ahead and claim PEP 
333[3] compatibility. PEP 444 can take longer.


Two hours to go...

- Alice.




[Web-SIG] Generator-Based Applications: Marrow HTTPd Example

2011-01-10 Thread Alice BevanMcGregor

Howdy!

Here's a rewritten (and incomplete, but GET and HEAD requests work 
fine) marrow.server.http branch [1] that illustrates a simple 
application [2] and protocol implementation [3].  Most notably, examine 
the 'resume' method [4].


The 'basic' example yields a future instance and uses the data as the 
response body.


Note that this particular rewrite is not complete, nor has it been 
profiled and optimized; initial benchmarks (using the 'benchmark' 
example) show a reduction of ~600 RSecs from the 'draft' branch, which 
is substantial, but hasn't been traced to a particular segment of code 
or design decision yet.


The server is now -extremely- easy to read and follow, with all code 
acting in a linear way.  (Application worker threading has been removed 
from this branch as well; the server is once again purely async.)


- Alice.

[1] https://github.com/pulp/marrow.server.http/tree/generator

[2] https://github.com/pulp/marrow.server.http/blob/generator/examples/basic.py

[3] 
https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py


[4] 

https://github.com/pulp/marrow.server.http/blob/generator/marrow/server/http/protocol.py#L177-226 






Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)

2011-01-10 Thread Alice BevanMcGregor

On 2011-01-10 13:12:57 -0800, Guido van Rossum said:

Ok, now that we've had a week of back and forth about this, let me 
repeat my threat. Unless more concerns are brought up in the next 24 
hours, can PEP 3333 be accepted?


+9001 (> 9000)

It seems a lot of people are waiting for a decision that enables 
implementers to go ahead and claim PEP

333[3] compatibility.


Django, mod_wsgi, CherryPy, etc. all have solutions that would need 
AFAIK minor tweaking before going live, which would make adoption of 
PEP 3333 the fastest of any PEP I've ever seen. ;)



PEP 444 can take longer.


Indeed it will!  :D

I have the conversion from Textile to ReST about half completed; I'll 
continue to poke it now that mailing list traffic seems to have died 
down and won't be consuming the majority of my Copious Spare Time™.  
ReST just doesn't jibe with my neural net.  :/


- Alice.




Re: [Web-SIG] Generator-Based Applications: Marrow HTTPd Example

2011-01-10 Thread Alice BevanMcGregor

On 2011-01-10 04:25:40 -0800, Alice Bevan–McGregor said:

Note that this particular rewrite is not complete, nor has it been 
profiled and optimized; initial benchmarks (using the 'benchmark' 
example) show a reduction of ~600 RSecs from the 'draft' branch, which 
is substantial, but hasn't been traced to a particular segment of code 
or design decision yet.


Ignore that number; I had some runaway processes eating up my CPU.  
That's what I get for going weeks or months between reboots.  ;)


The drop (benchmarking current 'draft' branch and 'generator' branch) 
is now ~200 RSecs (down from ~3.2 KRsecs).  Much more reasonable, and 
subject to enough stddev across runs to make the difference negligible 
at best.


*phew*

- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor

On 2011-01-08 20:06:19 -0800, Alex Grönholm said:

I liked the idea of having a separate async_read() method in 
wsgi.input, which would set the underlying socket in nonblocking mode 
and return a future. The event loop would watch the socket and read 
data into a buffer and trigger the callback when the given amount of 
data has been read. Conversely, .read() would set the socket in 
blocking mode. What kinds of problems would this cause?


Manipulating the underlying socket is potentially dangerous 
(pipelining) and, in fact, not possible AFAIK while being 
PEP444-compliant.  When the request body is fully consumed, additional 
attempts to read _must_ return empty strings.  Thus raw sockets are 
right out at a high level; internal to the reactor this may be 
possible, however.  It'd be interesting to adapt marrow.io to using 
futures in this way as an experiment.


OTOH, if you utilize callbacks extensively (as m.s.http does) you run 
into the problem of data passing.  Your application is called (wrapped 
in middleware), sets up some futures and callbacks, then returns.  No 
returned data.  Middleware just got shot in the foot.  The server, 
also, got shot in the foot.  How can it get a response tuple back from 
a callback?  How can middleware be utilized?  That's a weird problem to 
wrap my head around.  Blocking the application pending the results of 
various socket operations is something that would have to be mandated 
to avoid this issue.  :/


Multiple in-flight reads would also be problematic; you may end up with 
buffer interleaving issues.  (e.g. job A reads 128 bytes at a time and 
has been requested to return 4KB, job B does the same... what happens 
to the data?)  Then you begin to involve locking...


Notice that my write_body method [1], writes using async, passing the 
iterable to the callback which is itself.  This is after-the-fact 
(after the request has been returned) and is A-OK, though would need to 
be updated heavily to support the ideas of async floating around right 
now.  I'm also extremely careful to never have multiple async callbacks 
pending (and thus never have multiple jobs for a single connection 
working at once).


- Alice.

[1] 
https://github.com/pulp/marrow.server.http/blob/draft/marrow/server/http/protocol.py#L313-332 






Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor

On 2011-01-08 19:34:41 -0800, P.J. Eby said:


At 04:40 AM 1/9/2011 +0200, Alex Grönholm wrote:

On 09.01.2011 04:15, Alice Bevan-McGregor wrote:
I hope that clearly identifies my idea on the subject. Since async 
servers will /already/ be implementing their own executors, I don't 
see this as too crazy.

-1 on this. Those executors are meant for executing code in a 
thread pool. Mandating a magical socket operation filter here would 
considerably complicate server implementation.


Actually, the *reverse* is true.  If you do it the way Alice proposes, 
my sketches don't get any more complex, because the filtering goes in 
the executor facade or submit function.


Indeed; the executor is what then adds the file descriptor to the 
underlying server async reactor (select/epoll/kqueue/other).  In the 
case of the Marrow server, this would utilize a reactor callback (some 
might say deferred) to update the Future instance with the data, 
setting completion status, executing callbacks, etc.  One might even be 
able to use a threading.Event (or whatever is the opposite of a lock) 
to wake up blocking .result() calls, even if not multi-threaded 
(greenthreads, etc.).


Of course, adding the file descriptor to a pure async reactor then 
.result() blocking on it from your application would result in a 
deadlock; the .result() would never complete as the reactor would never 
get a chance to perform the pending request.  (This is why Marrow 
requires threading be enabled globally before adding an executor to the 
environment; this requires rather explicit documentation.)  This 
problem is solved completely by yielding the future instance (pausing 
the application) to let the reactor do its thing.  (Yielding the future 
becomes a replacement for the blocking behaviour of future.result().)


Effectively what I propose adds emulation of threading on top of async 
by mutating an Executor.  (The Executor would be a mixed 
threading+async executor.)


I suggest bubbling a future back up the yield stack instead of the 
actual result to allow the application (or middleware, or whatever 
happened to yield the future) to capture exceptions generated by the 
future'd request.  Bubbling the future instance avoids excessive 
exception handling cruft in each middleware layer; and I see no real 
issue with this.  AFAIK, you can use a shorthand (possibly wrapped in a 
try: block) if all you care about is the result:


   data = (yield my_future).result()
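A small sketch of that pattern using concurrent.futures (PEP 3148). The `run` driver and the toy `application` are hypothetical stand-ins for a server and an app; a real async server would register a callback with its reactor rather than block in `wait()`, and returning a value from a generator assumes Python 3.3 or later.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run(generator):
    """Drive a generator that yields futures until it finishes."""
    to_send = None
    while True:
        try:
            future = generator.send(to_send)
        except StopIteration as stop:
            return stop.value
        wait([future])    # stand-in for "let the reactor do its thing"
        to_send = future  # bubble the future itself back, not its result


def application(executor):
    # Shorthand: only the result matters, so unwrap it immediately.
    data = (yield executor.submit(int, "42")).result()

    # Keeping the future lets this layer inspect the failure itself.
    failed = yield executor.submit(int, "not a number")
    error = failed.exception()
    return data, type(error).__name__
```

Because the driver never calls `.result()`, an exception raised by the submitted call surfaces only in the layer that chooses to unwrap the future.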

Truthfully, I don't really see the point of exposing the map() method 
(which is the only other executor method we'd expose), so it probably 
makes more sense to just offer a 'wsgi.submit' key... which can be a 
function as follows: [snip]


True; the executor itself could easily be hidden behind the filter.  In 
a multi-threaded environment, however, the map call poses no problem, 
and can be quite useful.  (E.g. with one of my use cases for inclusion 
of an executor in the environment: image scaling.)


Granted, this might be a rather long function.  However, since it's 
essentially an optimization, a given server can decide how many 
functions can be shortcut in this way.  The spec may wish to offer a 
guarantee or recommendation for specific methods of certain 
stdlib-provided types (sockets in particular) and wsgi.input.


+1

Personally, I do think it might be *better* to offer extended 
operations on wsgi.input that could be used via yield, e.g. yield 
input.nb_read().  But of course then the trampoline code has 
to recognize those values instead of futures.


Because wsgi.input is provided by the server, and the executor is 
provided by the server, is there a reason why these extended functions 
couldn't return... futures?  :)


Note, too, that this complexity also only affects servers that want to 
offer a truly async API.  A synchronous server has no reason to pay 
particular attention to what's in a future, since it can't offer any 
performance improvement.


I feel a sync server and async server should provide the same API for 
accessing the input.  E.g. the application/middleware must be agnostic 
to the server in this regard.  This is why a little bit of magic goes a 
long way.  The following code would work on any WSGI2 stack that offers 
an executor (sync, async, or provided by middleware):


   data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result()

In a sync server, the blocking read would execute in another thread.  
In an async one appropriate actions would be taken to request a socket 
read from the client.  Both cases pause the application pending the 
result.  (If you don't immediately yield the future the behaviour 
between servers is the same!)
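For the synchronous case, a hedged sketch might look like the following. `serve` is a made-up miniature server, `io.BytesIO` stands in for the request body, and the shape of the returned response tuple is an assumption rather than anything fixed by a spec.

```python
import io
from concurrent.futures import ThreadPoolExecutor, wait


def serve(application, body):
    """Toy synchronous 'server': blocking reads run in a worker thread."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        environ = {
            "wsgi.input": io.BytesIO(body),
            "wsgi.submit": executor.submit,
        }
        generator = application(environ)
        to_send = None
        while True:
            try:
                future = generator.send(to_send)
            except StopIteration as stop:
                return stop.value
            wait([future])    # the application stays paused meanwhile
            to_send = future


def application(env):
    data = (yield env['wsgi.submit'](env['wsgi.input'].read, 4096)).result()
    return "200 OK", [("Content-Length", str(len(data)))], [data]
```

An async server would supply a different `wsgi.submit` that hands the read to its reactor; the application code would stay the same.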


I do think that this sort of API discussion, though, is the most 
dangerous part of trying to do an async spec.  That is, I don't expect 
that everyone will spontaneously agree on the exact same API.  Alice's 
proposal (simply submitting object methods) has the advantage of 
severely limiting the scope 

Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor

On 2011-01-08 13:16:52 -0800, P.J. Eby said:

In the limit case, it appears that any WSGI 1 server could provide an 
(emulated) async WSGI2 implementation, simply by wrapping WSGI2 apps 
with a finished version of the decorator in my sketch.


Or, since users could do it themselves, this would mean that WSGI2 
deployment wouldn't be dependent on all server implementers immediately 
turning out their own WSGI2 implementations.


This, if you'll pardon my language, is bloody awesome.  :D  That would 
strongly drive adoption of WSGI2.  Note that adapting a WSGI1 
application to WSGI2 server would likewise be very handy, and I 
suspect, even easier to implement.


- Alice.




Re: [Web-SIG] [PEP 444] Future- and Generator-Based Async Idea

2011-01-09 Thread Alice BevanMcGregor
Here's what I've mutated Alex Grönholm's minimal middleware example 
into: (see the change history for the evolution of this)


https://gist.github.com/771398

A complete functional (as in function, not working ;) async-capable 
middleware layer (that does nothing) is 12 lines.  That, I think, is a 
reasonable amount of boilerplate.  Also, no decorators needed.  It's 
quite readable, even the way I've compressed it.


The class-based version is basically identical, but with added comments 
explaining the assumptions this example makes and demonstrating where 
the actual middleware code can be implemented for simple middleware.


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor
On 2011-01-09 07:04:49 -0800, 
exar...@twistedmatrix.com said:
I think this effort would benefit from more thought on how exactly 
accessing this external library support will work.  If async wsgi is 
limited to performing a single read asynchronously, then it hardly 
seems compelling.


Apologies if the last e-mail was too harsh; I'm about to go to bed, and 
it's been a long night/morning.  ;)


Here's a proposed solution: a generator API on top of futures.

If the async server implementing the executor can detect a generator 
being submitted, then:


:: The executor accepts the generator and begins iteration (passing the 
executor and the arguments supplied to submit).


:: The generator is expected to be /fast/.

:: The generator does work until it needs an operation over a file 
descriptor, at which point it yields the fd and the operation (say, 
'r', or 'w').


:: The executor schedules with the async reactor the generator to be 
re-called when the operation is possible.


:: The Future is considered complete when the generator raises 
GeneratorExit and the first argument is used as the return value of the 
Future.


Yielding a 2-tuple of readers/writers would work, too, and allow for 
more concurrent utilization of sockets, though I'm not sure of the use 
cases for this.  If so, the generator would be woken up when any of the 
readers or writers are available and sent() a 2-tuple of 
available_readers, available_writers.


The executor is passed along for any operations the generator can not 
accomplish safely without threads, and the executor, as it's running 
through the generator, will accomplish the same semantics as iterating 
the WSGI application: if a future instance is yielded, the generator is 
suspended until the future is complete, allowing heavy processing to be 
mixed with async calls in a fully async server.


The wsgi.input operations can be implemented this way, as can database 
operations and pretty much anything that uses sockets, pipes, or 
on-disk files.  In fact, the WSGI application -itself- could be called 
in this way (with the omission of the executor or a simple wrapper that 
saves the executor into the environ).
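
A rough sketch of the steps above, with several assumptions called out: it uses the stdlib `selectors` module as the "reactor", the generator yields a (socket, operation) pair, and the generator's return value (via a plain `return`, Python 3.3+) completes the Future in place of the GeneratorExit convention described above.  The names `submit_generator`, `_step`, `run_until_idle`, and `read_line` are hypothetical, and passing the executor into the generator is omitted for brevity.

```python
import selectors
import socket
from concurrent.futures import Future

def submit_generator(selector, genfunc, *args):
    """Start a generator task; return a Future completed with its return value."""
    future = Future()
    _step(selector, genfunc(*args), future)
    return future

def _step(selector, gen, future):
    try:
        fileobj, op = next(gen)   # run until the task needs a descriptor
    except StopIteration as stop:
        future.set_result(stop.value)
        return
    except Exception as exc:
        future.set_exception(exc)
        return
    event = selectors.EVENT_READ if op == 'r' else selectors.EVENT_WRITE
    selector.register(fileobj, event, (gen, future))

def run_until_idle(selector):
    """Tiny reactor loop: resume each task once its descriptor is ready."""
    while selector.get_map():
        for key, _ in selector.select():
            selector.unregister(key.fileobj)
            _step(selector, *key.data)

def read_line(sock):
    yield sock, 'r'               # suspend until the socket is readable
    return sock.recv(4096)

a, b = socket.socketpair()
sel = selectors.DefaultSelector()
fut = submit_generator(sel, read_line, b)
a.sendall(b'ping\n')
run_until_idle(sel)
result = fut.result()
a.close(); b.close(); sel.close()
```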


Just a quick thought before running off to bed.

- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor

On 2011-01-09 09:26:19 -0800, P.J. Eby said:

By the way, I don't really see the point of the new sketches you're doing...


I'm sorry.

...as they aren't nearly as general as the one I've already done, but 
still have the same fundamental limitation: wsgi.input.


You missed the point entirely, then.


If wsgi.input offers any synchronous methods...


Regardless of whether or not wsgi.input is implemented in an async way, 
wrap it in a future and eventually get around to yielding it.  Problem 
/solved/.  Identical APIs for both sync and async, and if you have an 
async server but haven't gotten around to implementing your own 
executor yet, wrapping the blocking read call in a future also solves 
the problem (albeit not in the most efficient way).


I.e. wrap every call to a wsgi.input method by passing it to wsgi.submit.

...then they must be used from a future and must somehow raise an 
error when called from within the application -- otherwise it would 
block, nullifying the point of having a generator-based API.


See above.  No extra errors, nothing really that insane.

If it offers only asynchronous methods, OTOH, then you can't pass 
wsgi.input to any existing libraries (e.g. the cgi module).


Describe to me how a function can be suspended (other than magical 
greenthreads) if it does not yield; if I knew this, maybe I wouldn't be 
so confused.


The latter problem is the worse one, because it means that the 
translation of an app between my original WSGI2 API and the current 
sketch is no longer just replace 'return' with 'yield'.


I've deviated from your sketch, obviously, and any semblance of 
yielding a 3-tuple.  Stop thinking of my example code as conforming to 
your ideas; it's a new idea, or, worst case, a narrowing of an idea 
into its simplest form.


The only way this would work is if WSGI applications are still allowed 
to be written in a blocking style.  Greenlet-based frameworks would 
have no problem with this, of course, but servers like Twisted would 
still have to run WSGI apps in a worker thread pool, just because they 
*might* block.


Then that is not acceptable and would not work.  The mechanics of 
yielding futures instances allows you to (in your server) implement the 
necessary async code however you wish while providing a uniform 
interface to both sync and async applications running on sync and async 
servers.  In fact, you would be able to safely run a sync application 
on an async server and vice-versa.  You can, on an async server:


:: Add a callback to the yielded future to re-schedule the application 
generator.


:: If using greenthreads, just block on future.result() then 
immediately wake up the application generator.


:: Do other things I can't think of because I'm still waking up.

The first solution is how Marrow HTTPd would operate.
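
A minimal sketch of that first option (a completion callback that re-schedules the application generator), assuming a queue of callables stands in for the reactor.  This is not Marrow's actual code; `ready`, `responses`, and `resume` are hypothetical names.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor
from functools import partial

ready = queue.Queue()   # stand-in for the reactor's thread-safe callback queue
responses = []          # stand-in for "write the response to the client"

def resume(gen, completed=None):
    """Advance the application one step; park it on any future it yields."""
    try:
        value = gen.send(completed)   # send(None) starts a fresh generator
    except StopIteration:
        return
    if isinstance(value, Future):
        # The callback may fire on a worker thread, so it only enqueues the
        # re-schedule; the generator itself always runs on the reactor thread.
        value.add_done_callback(lambda f: ready.put(partial(resume, gen, f)))
    else:
        responses.append(value)
        gen.close()

def app(environ):
    future = yield environ['wsgi.executor'].submit(sum, [1, 2, 3])
    yield ('200 OK', [], [str(future.result()).encode()])

with ThreadPoolExecutor(max_workers=1) as executor:
    resume(app({'wsgi.executor': executor}))
    while not responses:
        ready.get(timeout=5)()   # reactor loop: run the next scheduled callback
```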

If we're okay with this as a limitation, then adding _async method 
variants that return futures might work, and we can proceed from there.


That is not optimal, because now you have an optional API that 
applications that want to be compatible will need to detect and choose 
between.


Mostly, though, it seems to me that the need to be able to write 
blocking code does away with most of the benefit of trying to have a 
single API in the first place.


You have artificially created this need, ignoring the semantics of 
using the server-specific executor to detect async-capable requests and 
the yield mechanics I suggested; which happens to be a single, coherent 
API across sync and async servers and applications.


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor

On 2011-01-09 09:03:38 -0800, P.J. Eby said:
Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.


Wait; what?  So you want the app developer to load a 40MB talkcast MP3 
into memory before sending it?  You want to completely eliminate the 
ability to stream an HTML page to the client in chunks (e.g. head 
block, headers + search box, search results, advertisements, footer -- 
the exact thing Google does with every search result)?  That sounds 
like artificially restricting application developers, to me.
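
To make the streaming case concrete, here is a small sketch using the classic WSGI 1 calling convention (not the PEP 444 draft).  `stream_file` and the BytesIO stand-in for a large file are illustrative assumptions.

```python
import io

def stream_file(fileobj, chunk_size=64 * 1024):
    """Yield the file in fixed-size chunks instead of reading it all at once."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

def app(environ, start_response):
    # BytesIO stands in for a large MP3 opened in binary mode.
    audio = io.BytesIO(b'x' * 200_000)
    start_response('200 OK', [('Content-Type', 'audio/mpeg')])
    return stream_file(audio)

# A server would write each chunk to the client as it arrives.
chunks = list(app({}, lambda status, headers: None))
```

Only one chunk is held in memory at a time, which is the point of allowing multiple body strings.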


I much prefer that the canonical example of a WSGI app just return a 
list with a single bytestring...


Why is it wrapped in a list, then?

IOW, I want it to look like the normal way to do things is to just 
return the whole request at once, and use the additional difficulty of 
creating a second iterator to discourage people writing iterated bodies 
when they should just write everything to a BytesIO and be done with it.


It sounds to me like your "should" doesn't cover an extremely large 
range of common use cases.



In your approach, the above samples have to be rewritten as:

 return app(environ)

[snip]


My code does not use return.  At all.  Only yield.

Try actually making some code that runs on this protocol and yields to 
futures during the body iteration.


Sure.  I'll also implement my actual proposal of not having a separate 
body iterable.


The above middleware pattern works with the sketches I gave on the PEAK 
wiki, and I've now updated the wiki to include an example app and 
middleware for clarity.


I'll need to re-read the code on your wiki; I find it incredibly 
difficult to grok.  However, you can help me out a bit by answering a 
few questions about it: How does middleware trap exceptions raised by 
the application?  (Specifically how does the server pass the buck with 
exceptions?  And how does the exception get to the application to 
bubble out towards the server, through middleware, as it does now?)



Really, the only hole in this approach is dealing with applications that block.


That's what the executor in the environ is for.  If you have image 
scaling or something else that will block you submit it.  All 
networking calls?  You submit them.


The elephant in the room here is that while it's easy to write these 
example applications so they don't block, in practice people read files 
and do database queries and what not in their requests, and those APIs 
are generally synchronous.  So, unless they somehow fold their entire 
application into a future, it doesn't work.


Actually, that's how multithreading support in marrow.server[.http] was 
implemented.  Overhead?  40-60 RSecs.  The option is provided for those 
who can do nothing about their application blocking, while still 
maintaining the internally async nature of the server.


That you could never *call* the .read() method outside of a future, or 
else you would block the server, thereby obliterating the point 
of having the async API in the first place.


See above re: your confusion over the calling semantics of wsgi.input 
in regards to my (and Alex's) proposal.  Specifically:


   data = (yield submit(wsgi_input.read, 4096)).result()

This would work on sync and async servers, and with sync and async 
applications, with no difference in the code.


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-09 Thread Alice BevanMcGregor

On 2011-01-09 17:06:28 -0800, Alice Bevan-McGregor said:

On 2011-01-09 09:03:38 -0800, P.J. Eby said:

The elephant in the room here is that while it's easy to write these
example applications so they don't block, in practice people read files
and do database queries and what not in their requests, and those APIs
are generally synchronous.  So, unless they somehow fold their entire
application into a future, it doesn't work.


Actually, that's how multithreading support in marrow.server[.http] was
implemented.  Overhead?  40-60 RSecs.


Clarification here, that's less than 2% of total RSecs.

- Alice.




[Web-SIG] [PEP 444] Future- and Generator-Based Async Idea

2011-01-08 Thread Alice BevanMcGregor
Warning: this assumes we're running on Bizarro-world PEP 444 that 
mandates applications are generators.  Please do not dismiss this idea 
out of hand but give it a good look and maybe some feedback.  ;)


--

Howdy!

I've finished touching up the p-code illustrating my idea of using 
generators to implement async functionality within a WSGI application 
and middleware, including the idea of a wsgi2ref-supplied decorator to 
simplify middleware.


https://gist.github.com/770743

There may be a few typos in there; I switched from the idea of passing 
back the returned value of the future to passing back the future itself 
in order to better handle exception handling (i.e. not requiring utter 
insanity in the middleware to determine the true source of an exception 
and the need to pass it along).


The second middleware demonstration (using a decorator) makes 
middleware look a lot more like an application: yielding futures, or a 
response, with the addition of yielding an application callable not 
explored in the first (long, but trivial) example.  I believe this 
should cover 99% of middleware use cases, including interactive 
debugging, request routing, etc. and the syntax isn't too bad, if you 
don't mind standardized decorators.


This should be implementable within the context of Marrow HTTPd 
(http://bit.ly/fLfamO) without too much difficulty.


As a side note, I'll be adding threading support to the server 
(actually, marrow.server, the underlying server/protocol abstraction 
m.s.http utilizes) using futures some time over the week-end by 
wrapping the async callback that calls the application with a call to 
an executor, making it immune to blocking, but I suspect the overhead 
will outweigh the benefit for speedy applications.


Testing multi-process vs. multi-threaded using 2 workers each and the 
prime calculation example, threading is 1.5x slower for CPU-intensive 
tasks under Python 2.7.  That's terrible.  It should be 2x; I have 2 
cores.  :/


- Alice.




Re: [Web-SIG] [PEP 444] Future- and Generator-Based Async Idea

2011-01-08 Thread Alice BevanMcGregor
As a quick note, this proposal would significantly benefit from the 
simplified syntax offered by PEP 380 (Syntax for Delegating to a 
Subgenerator) [1] and possibly PEP 3152 (Cofunctions) [2].  The former 
simplifies delegation and exception passing, and the latter simplifies 
the async side of this.
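
For illustration, a sketch of how `yield from` (PEP 380, available since Python 3.3) lets an application delegate to a helper generator that itself yields futures.  `read_body` and `submit_now` are hypothetical helpers; `submit_now` runs the call immediately so the example needs no thread pool.

```python
import io
from concurrent.futures import Future

def submit_now(fn, *args):
    """Hypothetical stand-in for wsgi.submit: returns an already-completed future."""
    future = Future()
    future.set_result(fn(*args))
    return future

def read_body(environ):
    # Subgenerator: its yields pass straight through to the server, and its
    # return value becomes the value of the `yield from` expression.
    future = yield environ['wsgi.submit'](environ['wsgi.input'].read, 4096)
    return future.result()

def app(environ):
    data = yield from read_body(environ)
    yield ('200 OK', [], [data])

gen = app({'wsgi.submit': submit_now, 'wsgi.input': io.BytesIO(b'payload')})
future = next(gen)            # the server receives the helper's future
response = gen.send(future)   # and sends it back once complete
```

Without `yield from`, the application would need its own loop to relay values and exceptions between the server and `read_body`.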


Unfortunately, AFAIK, both are affected by PEP 3003 (Python Language 
Moratorium) [3], which kinda sucks.


- Alice.

[1] http://www.python.org/dev/peps/pep-0380/
[2] http://www.python.org/dev/peps/pep-3152/
[3] http://www.python.org/dev/peps/pep-3003/




Re: [Web-SIG] PEP 444 feature request - Futures executor

2011-01-08 Thread Alice BevanMcGregor

On 2011-01-07 09:47:12 -0800, Timothy Farrell said:

However, I'm requesting that servers _optionally_ provide 
environ['wsgi.executor'] as a futures executor that applications can 
use for the purpose of doing something after the response is fully sent 
to the client.  This feature request is designed to be concurrency 
methodology agnostic.


Done.  (In terms of implementation, not updating PEP 444.)  :3

The Marrow server now implements a thread pool executor using the 
concurrent.futures module (or equiv. futures PyPI package).  The 
following are the commits; the changes will look bigger than they are 
due to cutting and pasting of several previously nested blocks of code 
into separate functions for use as callbacks.  100% unit test coverage 
is maintained (without errors), an example application is added, and 
the benchmark suite updated to support the definition of thread count.


http://bit.ly/gUL33v
http://bit.ly/gyVlgQ

Testing this yourself requires Git checkouts of the 
marrow.server/threading branch and marrow.server.http/threading branch, 
and likely the latest marrow.io from Git as well:


https://github.com/pulp/marrow.io
https://github.com/pulp/marrow.server/tree/threaded
https://github.com/pulp/marrow.server.http/tree/threaded

This update has not been tested under Python 3.x yet; I'll do that 
shortly and push any fixes; I doubt there will be any.


On 2011-01-08 03:26:28 -0800, Alice Bevan–McGregor said in the [PEP 
444] Future- and Generator-Based Async Idea thread:


As a side note, I'll be adding threading support to the server... but I 
suspect the overhead will outweigh the benefit for speedy applications.


I was surprisingly quite wrong in this prediction.  The following is 
the output of a C25 pair of benchmarks, the first not threaded, the 
other with 30 threads  (enough so there would be no waiting).


https://gist.github.com/770893

The difference is the loss of 60 RSecs out of 3280.  Note that the 
implementation I've devised can pass the concurrent.futures executor to 
the WSGI application (and, in fact, does), fulfilling the requirements 
of this discussion.  :D


The use of callbacks internally to the HTTP protocol makes a huge 
difference in overhead, I guess.


- Alice.




Re: [Web-SIG] WSGI-1 Warts

2011-01-08 Thread Alice BevanMcGregor

On 2011-01-08 07:22:44 -0800, David Stanek said:

I'm going to take some time this weekend to create a consolidated list. 
I was hoping to find something like:


Issue: 
Discussion: http://
Summary of resolution: ...


I agree; that would be very, very nice to have, though it might be 
helpful (esp. considering the length some of these discussions go to 
and the mixing of ideas within single threads) to mirror the message 
nesting as a series of nested lists (if doing this in HTML) to more 
concisely collect posts vs. pointing to the head of a thread and having 
to go through literally everything.  Of course, that's more work, and 
should be restricted to threads that are, in fact, scattered or 
unfocused.


And that first sentence was waaay too long, indicating that I've now 
been up all night.  :(


- Alice.




Re: [Web-SIG] Server-side async API implementation sketches

2011-01-08 Thread Alice BevanMcGregor

On 2011-01-08 17:22:44 -0800, Alex Grönholm said:


On 2011-01-08 13:16:52 -0800, P.J. Eby said:

I've written the sketches dealing only with PEP 3148 futures, but 
sockets were also proposed, and IMO there should be simple support for 
obtaining data from wsgi.input.


I'm a bit unclear as to how this will work with async. How do you 
propose that an asynchronous application receives the request body?


In my example https://gist.github.com/770743 (which has been simplified 
greatly by P.J. Eby in the Future- and Generator-Based Async Idea 
thread) for dealing with wsgi.input, I have:


   future = environ['wsgi.executor'].submit(environ['wsgi.input'].read, 4096)
   yield future

While ugly, if you were doing this, you'd likely:

   submit = environ['wsgi.executor'].submit
   input_ = environ['wsgi.input']

   future = yield submit(input_.read, 4096)
   data = future.result()

That's a bit nicer to read, and simplifies things if you need to make a 
number of async calls.


The idea here is that:

:: Your async server subclasses ThreadPoolExecutor.

:: The subclass overloads the submit method.

:: Your submit method detects bound methods on wsgi.input, sockets, and files.

:: If one of the above is detected, create a mock future that defines 
'fd' and 'operation' attributes or similar.


:: When yielding the mock future, your async reactor can detect 'fd' 
and do the appropriate thing for your async framework.  (Generally 
adding the fd to the appropriate select/epoll/kqueue readers/writers 
lists.)


:: When the condition is met, set_running_or_notify_cancel (when 
internally reading or writing data), set_result, saving the value, and 
return the future (filled with its data) back up to the application.


:: The application accepts the future instance as the return value of 
yield, and calls result across it to get the data.  (Obviously writes, 
if allowed, won't have data, but reads will.)


I hope that clearly identifies my idea on the subject.  Since async 
servers will /already/ be implementing their own executors, I don't see 
this as too crazy.
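
The list above might be sketched roughly as follows.  This is an assumption-laden illustration, not any server's real implementation: `FDFuture`, `AsyncAwareExecutor`, and `complete` are hypothetical names, detection is limited to a few method names, and `complete` plays the part of the reactor once the descriptor is reported ready.

```python
import socket
from concurrent.futures import Future, ThreadPoolExecutor

class FDFuture(Future):
    """Hypothetical 'mock' future carrying what the reactor should wait for."""
    def __init__(self, fd, operation, fn, args, kwargs):
        super().__init__()
        self.fd, self.operation = fd, operation
        self.call = (fn, args, kwargs)

class AsyncAwareExecutor(ThreadPoolExecutor):
    def submit(self, fn, *args, **kwargs):
        target = getattr(fn, '__self__', None)   # owner of a bound method
        name = getattr(fn, '__name__', '')
        try:
            fd = target.fileno()
        except (AttributeError, OSError, ValueError):
            fd = None                            # not backed by a descriptor
        if fd is not None and name in ('read', 'recv', 'write', 'send'):
            op = 'r' if name in ('read', 'recv') else 'w'
            return FDFuture(fd, op, fn, args, kwargs)
        return super().submit(fn, *args, **kwargs)   # everything else: threads

def complete(future):
    """What the reactor would do once select/epoll reports the fd as ready."""
    fn, args, kwargs = future.call
    if future.set_running_or_notify_cancel():
        future.set_result(fn(*args, **kwargs))
    return future

a, b = socket.socketpair()
a.sendall(b'body')
with AsyncAwareExecutor(max_workers=1) as executor:
    mock = executor.submit(b.recv, 4096)   # detected: no thread is used
    plain = executor.submit(pow, 2, 10)    # falls through to the pool
    data = complete(mock).result()
    answer = plain.result()
a.close(); b.close()
```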


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-06 20:49:57 -0800, P.J. Eby said:
It would be helpful if you addressed the issue of scope, i.e., 
what features are you proposing to offer to the application developer.


Conformity, predictability, and portability.  That's a lot of y's.  
(Pardon the pun!)


Alex Grönholm's post describes the goal quite clearly.

So far, I believe you're the second major proponent (i.e. ones with 
concrete proposals and/or implementations to discuss) of an async 
protocol... and what you have in common with the other proponent is 
that you happen to have written an async server that would benefit from 
having apps operating asynchronously.  ;-)


Well, the Marrow HTTPd does operate in multi-process mode, and, one 
day, multi-threaded or a combination.  Integration of a futures 
executor to the WSGI environment would alleviate the major need for a 
multi-threaded implementation in the server core; intensive tasks can 
be deferred to a thread pool vs. everything being deferred to a thread 
pool.  (E.g. template generation, PDF/other text extraction for 
indexing of file uploads, image scaling, etc. all of which are real use 
cases I have which would benefit from futures.)


I find it hard to imagine an app developer wanting to do something 
asynchronously for which they would not want to use one of the big-dog 
asynchronous frameworks.  (Especially if their app involves database 
access, or other communications protocols.)


Admittedly, a truly async server needs some way to allow file 
descriptors to be registered with the reactor core, with the WSGI 
application being resumed upon some event (e.g. socket is readable or 
writeable for DB access, or even pipe operations for use cases I can't 
think of at the moment).


Futures integration is a Good Idea, IMHO, and being optional and easily 
added to the environ by middleware for servers that don't implement it 
natively is even better.


As for how to provide a generic interface to an async core, I have two 
ideas, but one is magical and the other is more so; I'll describe these 
in a discrete post.


This doesn't mean I think having a futures API is a bad thing, but ISTM 
that a futures extension to WSGI 1 could be defined right now using an 
x-wsgi-org extension in that case...  and you could then find out how 
many people are actually interested in using it.


I'll add writing up a WSGI middleware layer that configures and adds a 
future.executor to the environ to my already overweight to-do list.  It 
actually is something I have a use for right now on at least one 
commercial project.  :)
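
Such a middleware layer could be as small as the following sketch for WSGI 1.  The class name `ExecutorMiddleware` and its parameters are hypothetical; the environ key follows the `wsgi.executor` name used in the feature request thread.

```python
from concurrent.futures import ThreadPoolExecutor

class ExecutorMiddleware:
    """Add a shared futures executor to environ unless the server already did."""
    def __init__(self, app, max_workers=4, key='wsgi.executor'):
        self.app = app
        self.key = key
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def __call__(self, environ, start_response):
        environ.setdefault(self.key, self.executor)
        return self.app(environ, start_response)

log = []

def app(environ, start_response):
    # Work queued here can keep running after the response has been returned.
    environ['wsgi.executor'].submit(log.append, 'indexed upload')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']

wrapped = ExecutorMiddleware(app)
body = wrapped({}, lambda status, headers: None)
wrapped.executor.shutdown(wait=True)   # only so the example ends deterministically
```

Using `setdefault` means a server-provided executor always wins over the middleware's own pool.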


Mainly, though, what I see is people using the futures thing to shuffle 
off compute-intensive tasks...


That's what it's for.  ;)

...but if they do that, then they're basically trying to make the 
server's life easier...  but under the existing spec, any truly async 
server implementing WSGI is going to run the *app* in a future of 
some sort already...


Running the application in a future is actually not a half-bad way for 
me to add threading to marrow.server... thanks!


Which means that the net result is that putting in async is like saying 
to the app developer: hey, you know this thing that you just could do 
in WSGI 1 and the server would take care of it for you?  Well, now you 
can manage that complexity by yourself!  Isn't that wonderful?   ;-)


That's a bit extreme; PEP 444 servers may still implement threading, 
multi-processing, etc. at the reactor level (a la CherryPy or Paste).  
Giving WSGI applications access to a futures executor (possibly the one 
powering the main processing threads) simply gives applications the 
ability to utilize it, not the requirement to do so.


I could be wrong of course, but I'd like to see what concrete use cases 
people have for async.


Earlier in this post I illustrated a few that directly apply to a 
commercial application I am currently writing.  I'll elaborate:


:: Image scaling would benefit from multi-processing (spreading the 
load across cores). Also, only one scale is immediately required before 
returning the post-upload page: the thumbnail.  The other scales can be 
executed without halting the WSGI application's return.


:: Asset content extraction and indexing would benefit from threading, 
and would also not require pausing the WSGI application.


:: Since most templating engines aren't streaming (see my unanswered 
thread in the general mailing list re: this), pausing the application 
pending a particularly difficult render is a boon to single-threaded 
async servers, though true streaming templating (with flush semantics) 
would be the holy grail.  ;)


:: Long-duration calls to non-async-aware libraries such as DB access.  
The WSGI application could queue up a number of long DB queries, pass 
the futures instances to the template, and the template could then 
.result() (block) across them or yield them to be suspended and resumed 
when the result is available.


:: True async is useful for WebSockets, 

Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-06 10:15:19 -0800, Antoine Pitrou said:


Alice Bevan–McGregor al...@... writes:

Er, for the record, in Python 3 non-blocking file objects return None when
read() would block.


-1

I'm aware, however that's not practically useful.  How would you detect
from within the WSGI 2 application that the file object has become
readable?  Implement your own async reactor / select / epoll loop?
That's crazy talk!  ;)


I was just pointing out that if you need to choose a convention for 
signaling blocking reads on a non-blocking object, it's already there.


I don't.  I need a way to suspend execution of a WSGI application 
pending some operation, often waiting for socket or file read or write 
availability.  (Just as often something entirely unrelated to file 
descriptors, see my previous post from a few moments ago.)


By the way, an event loop is the canonical implementation of 
asynchronous programming, so I'm not sure what you're complaining 
about. Or perhaps you're using async in a different meaning? (which 
one?)


If you use non-blocking sockets, and the WSGI server provides a way to 
directly access the client socket (ack!), utilizing the None response 
on reads would require you to utilize a tight loop within your 
application to wait for actual data.  That's really, really bad, and in 
a single-threaded server, deadly.
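
A small sketch of the behaviour being discussed, assuming a POSIX system (it relies on `os.set_blocking` and a pipe): a raw non-blocking file object returns None when nothing is available, and something other than the application has to sleep until the descriptor becomes readable.

```python
import os
import selectors

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
reader = open(read_fd, 'rb', buffering=0)   # raw, unbuffered file object

empty = reader.read(4096)   # nothing written yet: returns None, does not block

# Instead of looping on read() until it stops returning None, let a
# selector sleep until the descriptor is actually readable.
sel = selectors.DefaultSelector()
sel.register(reader, selectors.EVENT_READ)
os.write(write_fd, b'data')
events = sel.select(timeout=5)
data = reader.read(4096)

sel.close(); reader.close(); os.close(write_fd)
```

In the proposal under discussion that selector loop belongs to the server, and the application simply yields until it is resumed.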


I don't understand why you want a yield at this level. IMHO, WSGI 
needn't involve generators. A higher-level wrapper (framework, 
middleware, whatever) can wrap fd-waiting in fancy generator stuff if 
so desired. Or, in some other environments, delegate it to a reactor 
with callbacks and deferreds. Or whatever else, such as futures.


WSGI already involves generators: the response body.  In fact, the 
templating engine I wrote (and extended to support flush semantics) 
utilizes a generator to return the response body.  Works like a hot 
damn, too.


Yield is the Python language's native way to suspend execution of a 
callable in a re-entrant way.  A trivial example of this is an async 
ping-pong reactor.  I wrote one (you aren't a real Python programmer 
unless...) as an experiment and utilize it for server monitoring with 
tasks being generally scheduled against time, vs. edge-triggered or 
level-triggered fd operation availability.


Everyone has their own idea of what a deferred is, and there is only 
one definition of a future, which (in a broad sense) is the same as 
the general idea of a deferred.  Deferreds just happen to be 
implementation-specific and often require rewriting large portions of 
external libraries to make them compatible with that specific deferred 
implementation.  That's not a good thing.


Hell; an extension to the futures spec to handle file descriptor events 
might not be a half-bad idea.  :/


By the way, the concurrent.futures module is new. Though it will be 
there in 3.2, it's not guaranteed that its API and semantics will be 
100% stable while people start to really flesh it out.


Ratification of PEP 444 is a long way off itself.  Also, Alex Grönholm 
maintains a pypi backport of the futures module compatible with 2.x+ 
(not sure of the specific minimum version) and  3.2.  I'm fairly 
certain deprecation warnings wouldn't kill the usefulness of that 
implementation.  Worrying about instability, at this point, may be 
premature.


+1 for pure futures which (in theory) eliminate the need for dedicated 
async versions of absolutely everything at the possible cost of 
slightly higher overhead.


I don't understand why futures would solve the need for a low-level 
async facility.


You mis-interpreted; I didn't mean to imply that futures would replace 
an async core reactor, just that long-running external library calls 
could be trivially deferred using futures.


You still need to define a way for the server and the app to wake each 
other (and for the server to wake multiple apps).


Futures are a pretty convenient way to have a server wake an app; using 
a future completion callback wrapped (using partial) with the paused 
application generator would do it.  (The reactor Marrow uses, a 
modified Tornado IOLoop, would require calling 
reactor.add_callback(partial(worker, app_gen)) followed by 
reactor._wake() in the future callback.)


Waking up the server would be accomplished by yielding a futures 
instance (or fd magical value, etc).


This isn't done naturally in Python (except perhaps with stackless or 
greenlets). Using fds give you well-known flexible possibilities.


Yield is the natural way for one side of that, re-entering the 
generator on future completion covers the other side.  Stackless and 
greenlets are alternate ideas, but yield is built-in (and soon, so will 
futures).


If you want to put the futures API in WSGI, think of the poor authors 
of a WSGI server written in C who will have to write their own executor 
and future implementation. I'm sure they have better things to do.


If they embed a Python interpreter via C

Re: [Web-SIG] PEP 444 Goals

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-06 20:18:12 -0800, P.J. Eby said:
:: Reduction of re-implementation / NIH syndrome by incorporating the 
most common (1%) of features most often relegated to middleware or 
functional helpers.


Note that nearly every application-friendly feature you add will 
increase the burden on both server developers and middleware 
developers, which ironically means that application developers actually 
end up with fewer options.


Some things shouldn't have multiple options in the first place.  ;)  I 
definitely consider implementation overhead on server, middleware, and 
application authors to be important.


As an example, if yield syntax is allowable for application objects (as 
it is for response bodies) middleware will need to iterate over the 
application, yielding up-stream anything that isn't a 3-tuple.  When it 
encounters a 3-tuple, the middleware can do its thing.  If the app 
yield semantics are required (which may be a good idea for consistency 
and simplicity's sake if we head down this path) then async-aware 
middleware can be implemented as a generator regardless of the 
downstream (wrapped) application's implementation. That's not too much 
overhead, IMHO.
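As a rough illustration of that middleware pattern (a sketch only; the application signature and the "anything that isn't a 3-tuple is an async token" convention are assumptions taken from this discussion, not settled PEP 444 text):

```python
def uppercase_middleware(app):
    """Hypothetical async-aware middleware written as a generator."""
    def wrapper(environ):
        gen = app(environ)
        sent = None
        while True:
            try:
                item = gen.send(sent)  # send(None) starts a fresh generator
            except StopIteration:
                return
            if isinstance(item, tuple) and len(item) == 3:
                # The real response: the middleware can do its thing.
                status, headers, body = item
                yield status, headers, [chunk.upper() for chunk in body]
                return
            # Anything else is passed up-stream untouched, and whatever the
            # server sends back is forwarded to the wrapped application.
            sent = yield item
    return wrapper


def demo_app(environ):
    data = yield 'wait-token'       # pretend async token
    yield (b'200 OK', [], [data])   # response 3-tuple
```

Driving uppercase_middleware(demo_app)({}) by hand first produces 'wait-token'; sending a value back in then produces the transformed 3-tuple.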


Unicode decoding of a small handful of values (CGI values that pull 
from the request URI) is the biggest example. [2, 3]


Does that mean you plan to make the other values bytes, then?  Or will 
they be unicode-y-bytes as well?


Specific CGI values are bytes (one, I believe), specific ones are true 
unicode (URI-related values) and decoded using a configurable encoding 
with a fallback to bytes in unicode (iso-8859-1/latin1), are kept 
internally consistent (if any one fails, treat as if they all failed), 
have the encoding used recorded in the environ, and all others are 
native strings (bytes in unicode where native strings are unicode).
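A sketch of how a server might apply the "decode together, fall back together, record the encoding" rule to the URI-derived values.  The function name and the 'wsgi.uri_encoding' key are assumptions for illustration:

```python
def decode_uri_values(raw, encoding='utf-8'):
    """Decode URI-derived CGI values (name -> bytes) as one consistent set."""
    try:
        decoded = {k: v.decode(encoding) for k, v in raw.items()}
        used = encoding
    except UnicodeDecodeError:
        # If any one value fails, treat them as if they all failed and
        # fall back to bytes-in-unicode (latin1 never fails to decode).
        decoded = {k: v.decode('iso-8859-1') for k, v in raw.items()}
        used = 'iso-8859-1'
    decoded['wsgi.uri_encoding'] = used  # hypothetical key recording the choice
    return decoded
```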



What happens for additional server-provided variables?


That is the domain of the server to document, though native strings 
would be nice.  (The PEP only covers CGI variables.)


The PEP  choice was for uniformity.  At one point, I advocated 
simply using surrogateescape coding, but this couldn't be made uniform 
across Python versions and maintain compatibility.


As an open question to anyone: is surrogateescape available in Python 
2.6?  Mandating that as a minimum version for PEP 444 has yielded 
benefits in terms of back-ported features and syntax, like b''.


:: Cross-compatibility considerations.  The definition and use of 
native strings vs. byte strings is the biggest example of this in 
the rewrite.


I'm not sure what you mean here.  Do you mean portability of WSGI 
2 code samples across Python versions (esp. 2.x vs. 3.x)?


It should be possible (and currently is, as demonstrated by 
marrow.server.http) to create a polyglot server, polyglot 
middleware/filters (demonstrated by marrow.wsgi.egress.compression), 
and polyglot applications, though obviously polyglot code demands the 
lowest common denominator in terms of feature use.  Application / 
framework authors would likely create Python 3 specific WSGI 
applications to make use of the full Python 3 feature set, with 
cross-compatibility relegated to server and middleware authors.


- Alice.


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] PEP 444 Goals

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 01:08:42 -0800, chris.dent said:
... this particular goal [reduction of reimplementation / NIH] could 
cover a large number of things from standardized query string 
processing (maybe a good idea) to filters (which I've already expressed 
reservations about).


So this goal seems like it ought to be several separate goals.


+1

This definitely needs to be broken out to be explicit over the things 
that can be abstracted away from middleware and applications.  Input 
from framework authors would be valuable here to see what they disliked 
re-implementing the most.  ;)


Query string processing is a difficult task at the best of times, and 
is one area that is reimplemented absolutely everywhere.  (At some 
point I should add up the amount of code + unit testing code that 
covers this topic alone from the top 10 frameworks.)



The other option (than non-optional) for optional things is to remove them.


True; though optional things already exist as if they were not there.  
Implementors rarely, it seems, expend the effort to implement optional 
components; thus every HTTP server I came across has comments in the 
code saying it is up to the application to implement chunked responses, 
indicating -some- thought, despite chunked /request/ support being 
mandated by HTTP/1.1.  (And other ignored requirements.)


- Alice.




Re: [Web-SIG] PEP 444 feature request - Futures executor

2011-01-07 Thread Alice BevanMcGregor

On Fri, Jan 7, 2011 at 9:47 AM, Timothy Farrell wrote:
There has been much discussion about how to handle async in PEP 444 and 
that discussion centers around the use of futures.  However, I'm 
requesting that servers _optionally_ provide environ['wsgi.executor'] 
as a futures executor that applications can use for the purpose of 
doing something after the response is fully sent to the client.  This 
feature request is designed to be concurrency methodology agnostic.


+1

On 2011-01-07 11:07:36 -0800, Timothy Farrell said:

On 2011-01-07 09:59:10 -0800, Guido van Rossum said:
If it's optional, what's the benefit for the app of getting it through 
WSGI instead of through importing some other standard module?


Aside from that, servers currently specify if they are multi-threaded 
and/or multi-process.  Having the server provide the executor allows it 
to provide an executor that most matches its own concurrency model...


I think that's the bigger point; WSGI servers do implement their own 
concurrency model for request processing and utilizing a 
server-provided executor which interfaces with whatever the internal 
representation of concurrency is would be highly beneficial.  (Vs. an 
application utilizing a more generic executor implementation that adds 
a second thread pool...)
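For illustration, an application could prefer the server's executor and only fall back to a generic one when none is offered.  This is a sketch; get_executor is a hypothetical helper, and 'wsgi.executor' is the key proposed in this thread:

```python
from concurrent.futures import ThreadPoolExecutor

_fallback = None  # created lazily, only if the server offers no executor


def get_executor(environ):
    """Return the server-provided executor, or a generic thread pool."""
    global _fallback
    executor = environ.get('wsgi.executor')
    if executor is None:
        # This is the "second thread pool" case the server-provided
        # executor is meant to avoid.
        if _fallback is None:
            _fallback = ThreadPoolExecutor(max_workers=2)
        executor = _fallback
    return executor
```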


Taking futures to be separate and distinct from the rest of async 
discussion, I still think it's an extremely useful feature.  I outlined 
my own personal use cases in my slew of e-mails last night, and many of 
them are also not time sensitive.  (E.g. image scaling, full text 
indexing, etc.)



Maybe this should be a server option instead of a spec option.


It would definitely fall under the Server API spec, not the application 
one.  Being optional, and with simple (wsgi.executor) access via the 
environ would also allow middleware developers to create executor 
implementations (or just reference the concurrent.futures 
implementation).


I worry that this weighs down the WSGI standard with the responsibility 
of coming up with the perfect executor API, and if it's not quite 
perfect after all, servers are additionally required to support the 
standard but suboptimal API effectively forever.


I'm not following you here.  What's wrong with executor.submit() that 
might need changing?  Granted, it would not be ideal if an application 
called executor.shutdown().  This doesn't seem difficult to my tiny 
brain.


The perfect executor API is already well defined in PEP 3148 AFAIK.  
Specific methods with specific semantics implemented in a duck-typed 
way.  The underlying implementation is up to the server, or the server 
can utilize an external (or built-in in 3.2) futures implementation.


If WSGI 2 were to incorporate futures as a feature there would have to 
be some mandate as to which methods applications and middleware are 
allowed to call; similar to how we do not allow .close() across 
wsgi.input or wsgi.errors.
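One way a server could enforce such a mandate is to hand applications a thin wrapper rather than the real executor.  A sketch, with the class name and the choice of error being assumptions:

```python
from concurrent.futures import ThreadPoolExecutor


class SubmitOnlyExecutor:
    """Hypothetical wrapper exposing only the methods applications may call."""

    def __init__(self, executor):
        self._executor = executor

    def submit(self, fn, *args, **kwargs):
        # Delegates to the real executor; returns an ordinary Future.
        return self._executor.submit(fn, *args, **kwargs)

    def shutdown(self, wait=True):
        # Applications and middleware must not stop the server's executor.
        raise RuntimeError('shutdown() is reserved for the server')
```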


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 09:04:07 -0800, Antoine Pitrou said:

Alice Bevan–McGregor al...@... writes:
I don't understand why you want a yield at this level. IMHO, WSGI 
needn't involve generators. A higher-level wrapper (framework, 
middleware, whatever) can wrap fd-waiting in fancy generator stuff if 
so desired. Or, in some other environments, delegate it to a reactor 
with callbacks and deferreds. Or whatever else, such as futures.


WSGI already involves generators: the response body.


Wrong.


I'm aware that it can be any form of iterable, from a list-wrapped 
string all the way up to generators or other nifty things.  I 
mistakenly omitted these assuming that the other iterables were 
universally understood and implied.


However, using a generator is a known, valid use case that I do see in 
the wild.  (And also rely upon in some of my own applications.)


Right, that's why I was suggesting you drop your concern for Python 2 
compatibility.


-1

There is practically no reason for doing so; esp. considering that I've 
managed to write a 2k/3k polyglot server that is more performant out of 
the box than any other WSGI HTTP server I've come across and is far 
simpler in implementation than most of the ones I've come across with 
roughly equivalent feature sets.


Cross compatibility really isn't that hard, and arguing that 2.x 
support should be dropped for the sole reason that it might be dead by 
the time this is ratified is a bit off.


Python 2.x will be around for a long time.

- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 08:10:43 -0800, P.J. Eby said:

At 12:39 AM 1/7/2011 -0800, Alice Bevan–McGregor wrote:
:: Image scaling would benefit from multi-processing (spreading the 
load across cores). Also, only one scale is immediately required 
before returning the post-upload page: the thumbnail.  The other 
scales can be executed without halting the WSGI application's 
return.


:: Asset content extraction and indexing would benefit from threading, 
and would also not require pausing the WSGI application.


In all these cases, ISTM the benefit is the same if you future the WSGI 
apps themselves (which is essentially what most current async WSGI 
servers do, AFAIK).


Image scaling and asset content extraction should not block the 
response to a HTTP request; these need to be 'forked' from the main 
request.  Only template generation (where the app needs to effectively 
block pending completion) is solved easily by threading the whole 
application call.



:: Long-duration calls to non-async-aware libraries such as DB access.
The WSGI application could queue up a number of long DB queries, pass 
the futures instances to the template, and the template could then 
.result() (block) across them or yield them to be suspended and resumed 
when the result is available.


:: True async is useful for WebSockets, which seem a far superior 
solution to JSON/AJAX polling in addition to allowing real web-based 
socket access, of course.


The point as it relates to WSGI, though, is that there are plenty of 
mature async APIs that offer these benefits, and some of them (e.g. 
Eventlet and Gevent) do so while allowing blocking-style code to be 
written.  That is, you just make what looks like a blocking call, but 
the underlying framework silently suspends your code, without tying 
up the thread.


Or, if you can't use a greenlet-based framework, you can use a 
yield-based framework.  Or, if for some reason you really wanted to 
write continuation-passing style code, you could just use the raw 
Twisted API.


But is there really any problem with providing a unified method for 
indicating a suspend point?  What the server does when it gets the 
yielded value is entirely up to the implementation of the server; if it 
(the server) wants to use greenlets, it can.  If it has other 
methodologies, it can go nuts.


Even if you've already written a bunch of code using raw sockets and 
want to make it asynchronous, Eventlet and Gevent actually let you load 
a compatibility module that makes it all work, by replacing the socket 
API with an exact duplicate that secretly suspends your code whenever a 
socket operation would block.


I generally frown upon magic, and each of these implementations is 
completely specific.  :/


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 12:42:24 -0800, Paul Davis said:


Is the code for this server online? I'd be interested in reading through it.


https://github.com/pulp/marrow.server.http

There are two branches: master will always refer to the version 
published on Python.org, and draft refers to my rewrite.  (When 
published, draft will be merged.)


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 13:21:36 -0800, Antoine Pitrou said:

Ok, so, WSGI doesn't already involve generators. QED.


This can go around in circles; by allowing all forms of iterable, it 
involves generators.  Generators are a type of iterable.  QED right 
back.  ;)


Right, that's why I was suggesting you drop your concern for Python 2 
compatibility.


-1

There is practically no reason for doing so;


Of course, there is one: a less complex PEP without any superfluous 
compatibility language sprinkled all over.


There isn't any compatibility language sprinkled within the PEP.  In 
fact, the only mention of it is in the introduction (stating that  2.6 
support may be possible but is undefined) and the title of a section 
Python Cross-Version Compatibility.


Using native strings where possible encourages compatibility, though 
for the environ variables previously mentioned (URI, etc.) explicit 
exceptional behaviour is clearly defined.  (Byte strings and true 
unicode.)


Just because you managed to write some piece of code for a 
*particular* use case doesn't mean that cross-compatibility is a solved 
problem.


The particular use case happens to be PEP 444 as implemented using an 
async and multi-process (some day multi-threaded) HTTP server, so I'm 
not quite sure what you're getting at, here.  I think that use case is 
sufficiently broad to be able to make claims about the ease of 
implementing PEP 444 in a compatible way.


If you think it's easy, then I'm sure the authors of various 3rd-party 
libs would welcome your help achieving it.


I helped proof a book about Python 3 compatibility and am giving a 
presentation in March that contains information on Python 3 
compatibility from the viewpoint of implementing the Marrow suite.



Python 2.x will be around for a long time.


And so will PEP  and even PEP 333. People who value legacy 
compatibility will favour these old PEPs over your new one anyway. 
People who don't will progressively jump to 3.x.


Yup.  Not sure how this is really an issue.  PEP 444 is the /future/, 
333[3] is /now/ [-ish].


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 09:04:07 -0800, Antoine Pitrou said:
WSGI doesn't mandate any specific feature of generators, such as 
coroutine-like semantics, and the server doesn't have to know about 
them.


The joy of writing a new specification is that we are not (potentially) 
shackled by old ways of doing things.  Case in point: dropping 
start_response and changing the return value.  PEP 444 isn't WSGI 1, 
and can change things, including additional changes to the allowable 
return value.


- Alice.




Re: [Web-SIG] PEP 444 Goals

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 20:34:09 -0800, P.J. Eby said:
That it [handling generators] is difficult at all means removes 
degree-of-difficulty as a strong motivation to switch.


Agreed.  I will be following up with a more concrete idea (including 
p-code) to better describe what is currently in my brain.  (One half of 
which will be just as objectionable, the other half, with Alex 
Grönholm's input, far more reasonable.)


IOW, there are six specific facts someone needs to remember in order to 
know the type of a given CGI variable, over and above the mere fact that 
it's a CGI variable.  Hence, reference.


No, practically there is one.  If you are implementing a Python 3 
solution, a single value (original URI) is an instance of bytes, the 
rest are str.  If you are implementing a Python 2 solution, there's a 
single rule you need to remember: values derived from the URI 
(QUERY_STRING, PATH_INFO, etc.) are unicode, the rest are str.


Polyglot implementors are already accepting that they will need to 
include more in their headspace before writing a single line of code; 
knowing that native string differs between the two languages is a 
fundamental concept necessary for the act of writing polyglot code.


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 22:13:17 -0800, Alex Grönholm said:

08.01.2011 07:09, P.J. Eby wrote:
On the plus side, the "run this in a future after the request" concept 
has some legs... [snip]


What exactly does "run this in a future after the request" mean? There 
seems to be some terminology confusion here.


I suspect he's referring to some of the notes on the PEP 444 feature 
request - Futures executor thread and several of my illustrated use 
cases, notably:


:: Image scaling (e.g. to multiple sizes) after uploading of an image 
to be scaled where the response (Congratulations, image uploaded!) does 
not require the result of the scaling.


:: Content indexing which can also be performed after returning the 
success page.


The former would executor.submit() a number of scaling jobs, attach 
completion callbacks to perform some cleanup / database updating / 
etc., and return a response immediately.  The latter is a single 
executor submission that is entirely non-time-critical.
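The image-scaling case might look roughly like this.  It is a sketch: scale is a placeholder rather than real imaging code, 'upload.name' is a made-up environ key, and the results list stands in for the cleanup / database update performed by the completion callbacks:

```python
from concurrent.futures import ThreadPoolExecutor


def scale(name, size):
    # Placeholder for real image scaling work.
    return '%s@%d' % (name, size)


def upload_app(environ, results):
    executor = environ['wsgi.executor']  # key proposed in the feature request
    for size in (64, 256, 1024):
        future = executor.submit(scale, environ['upload.name'], size)
        # The completion callback does the post-processing; the response
        # below does not wait for it.
        future.add_done_callback(lambda f: results.append(f.result()))
    return (b'200 OK', [(b'Content-Type', b'text/plain')],
            [b'Congratulations, image uploaded!'])


def demo():
    results = []
    # Leaving the with-block waits for outstanding jobs, so results is
    # complete by the time demo() returns.
    with ThreadPoolExecutor(max_workers=2) as executor:
        environ = {'wsgi.executor': executor, 'upload.name': 'cat.png'}
        response = upload_app(environ, results)
    return response, sorted(results)
```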


And likely other use cases as well.  This (inclusion of an executor 
tuned to the underlying server in the environment) is one thing I think 
we can (almost) all agree is a good idea.  :D  Discussion on that 
particular idea should be relegated to the feature request thread, 
though.


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-07 Thread Alice BevanMcGregor

On 2011-01-07 13:21:36 -0800, Antoine Pitrou said:

Ok, so, WSGI doesn't already involve generators. QED.


Let me try this again.  With the understanding that:

:: PEP 333[3] and 444 define a response body as an iterable.
:: Thus WSGI involves iterables through definition.
:: A generator is a type of iterable.
:: Thus WSGI involves generators through the use of iterables.

The hypothetical redefinition of an application as a generator is not 
too far out to lunch, considering that WSGI _already involves 
generators_.  (And that the simple case, an application that does not 
utilize async, will require a single word be changed: s/return/yield)
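To make the s/return/yield point concrete, here is a sketch of the same trivial application in both forms, plus a hypothetical call_app helper showing how a server could accept either:

```python
import inspect

RESPONSE = (b'200 OK', [(b'Content-Type', b'text/plain')], [b'Hello world!'])


def app_return(environ):
    # Current draft style: the response 3-tuple is returned.
    return RESPONSE


def app_yield(environ):
    # Hypothetical redefinition: the only change is return -> yield.
    yield RESPONSE


def call_app(app, environ):
    result = app(environ)
    if inspect.isgenerator(result):
        # The simple, non-async case: the first yielded value is the response.
        result = next(result)
    return result
```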


Is that clearer?  The idea referred to below (and posted separately) 
involves this redefinition, which I understand fully will have a number 
of strong opponents.  Considering PEP 444 is a new spec (already 
breaking direct compatibility via the /already/ redefined return value) 
I hope people do not reject this out of hand but instead help explore 
the idea further.


On 2011-01-07 19:36:52 -0800, Antoine Pitrou said:

Alice Bevan–McGregor al...@... writes:
The particular use case happens to be PEP 444 as implemented using an 
async and multi-process (some day multi-threaded) HTTP server, so I'm 
not quite sure what you're getting at, here.


It's becoming too difficult to parse. You aren't sure yet what the async 
part of PEP 444 should look like but you have already implemented it?


Marrow HTTPd (marrow.server.http) [1] is, internally, an asynchronous 
server.  It does not currently expose the reactor to the WSGI 
application via any interface whatsoever.  I am, however, working on 
some p-code examples (that I will post for discussion as mentioned 
above) which I can base a fork of m.s.http off of to experiment.


This means that, yes, I'm not sure how async will work in PEP 444 /in 
the end/, but I am at least attempting to explore the practical 
implications of the ideas thus far in a real codebase.  I'm getting it 
done, even if it has to change or be scrapped.


I helped proof a book about Python 3 compatibility and am giving a 
presentation in March that contains information on Python 3 
compatibility from the viewpoint of implementing the Marrow suite.


Well, I hope not too many people will waste time trying to write 
cross-compatible code rather than solely target Python 3. The whole 
point of Python 3 is to make developers' life better, not worse.


I agree, with one correction to your first point.  Application and 
framework developers should whole-heartedly embrace Python 3 and make 
full use of its many features, simplifications and clarifications.  
However, it is demonstrably not Insanely Difficult™ to have compatible 
server and middleware implementations with the draft's definition of 
native string.  If server and middleware developers are willing to 
create polyglot code, I'm not going to stop them.


Note that this type of compatibility is not mandated, and the use of 
native strings (with one well defined byte string exception) means that 
pure Python 3 programmers can be blissfully ignorant of the 
compatibility implications -- everything else is unicode (str), even 
if it's just bytes-in-unicode (latin1/iso-8859-1).  Pure Python 2 
programmers have only a small difference (for them) of the URI values 
being unicode; the remaining values are byte strings (str).


I would like to hear a technical reason why this (native strings) is a 
bad idea instead of a vague "this will make things harder" -- it won't, 
at least, not measurably, and I have the proof as a working, 100% unit 
tested, performant, cross-compatible polyglot HTTP/1.1-compliant server. 
Written in several days worth of full-time work spread across weeks 
because this is a spare-time project; i.e. not a lot of literal work, 
nor hard.


Hell, it has transformed from a crappy hack to experiment with HTTP 
into a complete (or very nearly so) implementation of PEP 444 in both 
of its current forms (published and draft) that is almost usable, 
ignoring the fact that PEP 444 is mutable, of course.


- Alice.

[1] http://bit.ly/fLfamO




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 03:53:14 -0800, Antoine Pitrou said:

Alice Bevan–McGregor al...@... writes:
GothAlice: ... native string usage, the definition of byte string as 
the format returned by socket read (which, on Java, is unicode!) ...


Just so no-one feels the need to correct me; agronholm made sure I 
didn't drink the kool-aid of one article I was reading and basing some 
ideas on.  Java socket objects use byte-based buffers, not unicode.  My 
bad!



Regardless of the rest, I think the latter would be a large step backwards.
Clear distinction between bytes and unicode is a *feature* of Python 3.
Unicode-ignorant programmers should use frameworks which do the encoding work
for them.


+0.5

I'm beginning to agree; with the advent of b'' syntax in 2.6, the only 
compelling reason to include this feature (examples that work without 
modification across major versions of Python) goes up in smoke.  The 
examples should use the b'' syntax and have done with it.



(by the way, why you are targeting both Python 2 and 3?)


For the same reason that Python 3 features are introduced to 2.x; 
migration.  Users are more likely to adopt something that doesn't 
require them to change production environments, and 3.x is far away 
from being deployed in production anywhere but on Gentoo, it seems.  ;)


Broad development and deployment options are a Good Thing™, and with 
b'', there is no reason -not- to target 2.6+.  (There is no requirement 
that a PEP 444 / WSGI 2 server even try to be a cross-compatible 
polyglot; there is room for 2.x-specific and 3.x-specific solutions, 
and, in theory, it should be possible to support Python  2.6, I just 
don't feel it's worthwhile to lock your application into Very Old™ 
interpreters.)



agronholm: I'm not very comfortable with the idea of wsgi.input in
async apps \ I'm just thinking what would happen when you do
environ['wsgi.input'].read()

GothAlice: One of two things: in a sync environment, it blocks until it
can read, in an async environment [combined with yield] it
pauses/shelves your application until the data is available.


Er, for the record, in Python 3 non-blocking file objects return None when
read() would block.
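For anyone wanting to see that behaviour, a small sketch using a pipe on a POSIX system (os.set_blocking needs Python 3.5 or later; the variable names are arbitrary):

```python
import os

# A pipe whose read end is switched to non-blocking mode.
r, w = os.pipe()
os.set_blocking(r, False)
reader = open(r, 'rb', buffering=0)  # unbuffered (raw) file object

first = reader.read(16)    # nothing written yet: the read would block
os.write(w, b'data')
second = reader.read(16)   # data is now available

reader.close()             # also closes the read end of the pipe
os.close(w)
```

Here first is None, while second holds the bytes that were written.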


-1

I'm aware, however that's not practically useful.  How would you detect 
from within the WSGI 2 application that the file object has become 
readable?  Implement your own async reactor / select / epoll loop?  
That's crazy talk!  ;)



agronholm: the requirements of async apps are a big problem

agronholm: returning magic values from the app sounds like a bad idea

agronholm: the best solution I can come up with is to have
wsgi.async_input or something, which returns an async token for any
given read operation


The idiomatic abstraction for non-blockingness under POSIX is file descriptors.
So, at the low level (the WSGI level), exchanging fds between server and app
could be enough to allow both to wake up each other (perhaps two fds: one the
server can wait on, one the app can wait on). Similarly to what 
signalfd() does.

Then higher-level tools can wrap inside Futures or whatever else.
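A sketch of the fd-based wake-up idea, using a socket pair so that it also works where only sockets are selectable.  The helper names are hypothetical and this shows just one direction (server waking the app):

```python
import select
import socket

# One connected pair: the server keeps one end, the app waits on the other.
server_end, app_end = socket.socketpair()


def wake(sock):
    # Writing a single byte makes the peer's end readable.
    sock.send(b'\x00')


def is_woken(sock, timeout=0.0):
    # select() reports whether the fd is readable within the timeout.
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


before = is_woken(app_end)             # nothing signalled yet
wake(server_end)
after = is_woken(app_end, timeout=1.0)

server_end.close()
app_end.close()
```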


-0

Hmm; I'll have to mull that over.  Initial thoughts: having a magic 
yield value that combines a fd and operation (read/write) is too 
magical.



However, this also means Windows compatibility becomes more complicated, unless
the fds are sockets.


+1 for pure futures which (in theory) eliminate the need for dedicated 
async versions of absolutely everything at the possible cost of 
slightly higher overhead.


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-06 Thread Alice BevanMcGregor

Chris,

On 2011-01-06 05:03:15 -0800, Chris Dent said:

On Wed, 5 Jan 2011, Alice Bevan–McGregor wrote:
This should give a fairly comprehensive explanation of the rationale 
behind  some decisions in the rewrite; a version of these 
conversations (in narrative  style vs. discussion) will be added to 
the rewrite Real Soon Now™ under the  Rationale section.


Thanks for this. I've been trying to follow along with this 
conversation as an interested WSGI app developer and admit that much of 
the thrust of things is getting lost in the details and people's 
tendency to overquote.


Yeah; I knew the IRC log dump was only so useful.  It's a lot of 
material to go through, and much of it was discussed at strange hours 
with little sleep.  ;)


One thing that would be useful is if, when you post, Alice, you could 
give the URL of whatever and wherever your current draft is.


Tomorrow (ack, today!) I'll finish converting over the PEP from Textile 
to ReStructuredText and get it re-submitted to the Python website.


https://github.com/GothAlice/wsgi2/blob/master/pep444.textile
http://www.python.org/dev/peps/pep-0444/

I don't use frameworks, or webob or any of that stuff. I just cook up 
callables that take environ and start_response. I don't want my 
awareness of the basics of HTTP abstracted away, because I want to make 
sure that my apps behave well.


Kudos!  That approach is heavily frowned upon in the #python IRC 
channel, but I fully agree that working solutions can be reasonably 
made using that methodology.  There are some details that are made 
easier by frameworks, though.  Testing benefits from MVC: you can test 
the dict return value of the controller, the templates, and the model 
all separately.


Plain WSGI is a good thing, for me, because it means that my 
applications are a) very webby (in the stateless HTTP sense) and b) 
very testable.


c) And very portable.  You need not depend on some pre-arranged stack 
(including web server).


I agree with some others who have suggested that maybe async should be 
its own thing, rather than integrated into a WSGI2. A server could 
choose to be WSGI2 compliant or AWSGI compliant, or both.


-1

That is already the case with filters, and will be when I ratify the 
async idea (after further discussion here).  My current thought process 
is that async will be optional for server implementors and will be 
easily detectable by applications and middleware and have zero impact 
on middleware/applications if disabled (by configuration) or missing.


That said I can understand why an app author might like to be able to 
read or write in an async way, and being able to shelf an app to wait 
around for the next cycle would be a good thing.


Using futures, async covers any callable at all; you can queue up a 
dozen DB calls at the top of your application, then (within a body 
generator) yield those futures to be paused pending the data.  That 
would, as an example, allow complex pages to be generated and streamed 
to the end-user in an efficient way -- the user would see a page begin 
to appear, and the browser downloading static resources, while 
intensive tasks complete.


I just don't want efforts to make that possible to make writing a 
boring wsgi thing more annoying.


+9001

See above.

I can't get my head around filters yet. They sound like a different way 
to do middleware, with a justification of something along the lines of 
I don't like middleware for filtering. I'd like to be (directly) 
pointed at a more robust justification. I suspect you have already 
pointed at such a thing, but it is lost in the sands of time...


Filters offer several benefits, some of which are mild:

:: Simplified application / middleware debugging via smaller stack.
:: Clearly defined tasks; ingress = altering the environ / input, 
egress = altering the output.

:: Egress filters are not executed if an unhandled exception is raised.

The latter point is important; you do not want badly written middleware 
to absorb exceptions that should bubble, etc.  (I'll need to elaborate 
on this and add a few more points when I get some sleep.)


Filters seem like something that could be added via a standardized 
piece of middleware, rather than being part of the spec. I like minimal 
specs.


Filters are optional, and an example is/will be provided for utilizing 
ingress/egress filter stacks as middleware.


The problem with /not/ including the filtering API (which, by itself is 
stupidly simple and would barely warrant its own PEP, IMHO) is that a 
separate standard would not be seen and taken into consideration when 
developers are writing what they will think /must/ be middleware.  
Seeing as a middleware version of a filter is trivial to create (just 
execute the filter in a thin middleware wrapper), it should be a 
consideration up front.
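
A sketch of such a thin wrapper, under stated assumptions: ingress filters take the environ and mutate it, egress filters take and return a (status, headers, body) tuple, and the application is a PEP 444 style callable returning that tuple. The name filter_middleware and the two sample filters are hypothetical.

```python
def filter_middleware(application, ingress=(), egress=()):
    """Adapt ingress/egress filter stacks to a single middleware layer."""
    def wrapped(environ):
        for filter_ in ingress:
            filter_(environ)  # return value ignored
        response = application(environ)
        for filter_ in egress:
            response = filter_(*response)
        return response
    return wrapped


def mark_request(environ):
    # Hypothetical ingress filter: annotate the environ.
    environ["example.seen"] = True


def add_header(status, headers, body):
    # Hypothetical egress filter: append one response header.
    return status, headers + [(b"X-Filtered", b"yes")], body


def hello(environ):
    return b"200 OK", [], [b"Hello world!"]


app = filter_middleware(hello, ingress=[mark_request], egress=[add_header])
```

Because the wrapper is itself a plain application callable, it nests with ordinary middleware on servers that have no native filter support.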



Latin1 = \u → \u00FF [snip]


There's a rule of thumb about constraints. If you must constrain, do 
none, one or all, never

Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-06 Thread Alice BevanMcGregor
On 2011-01-06 09:06:10 -0800, 
chris.d...@gmail.com said:
I wasn't actually talking about the log dump. That was useful. What I 
was talking about were earlier messages in the thread where people were 
making responses, quoting vast swaths of text for no clear reason.


Ah.  :)  I do make an effort to trim quoted text to only the relevant parts.


On Thu, 6 Jan 2011, Alice Bevan–McGregor wrote:

https://github.com/GothAlice/wsgi2/blob/master/pep444.textile


Thanks, watching that now.


The textile document will no longer be updated; the pep-444.rst 
document is where it'll be at.


I should have been more explicit here as I now feel I must defend 
myself from frowns. I'm not talking about single methods that do the 
entire app. I nest a series of middleware that bottom out at Selector 
which then does url based dispatch to applications, which themselves 
are defined as handlers (simple wsgi functions) and access 
StorageInterfaces and Serializations. The middleware, handlers, stores 
and serializers are all independently testable (and usable).


*nods* My framework (WebCore) is basically a packaged up version of a 
custom middleware stack so I can easily re-use it from project to 
project.  I assumed (in my head) you were rolling your own 
framework/stack.


That is already the case with filters, and will be when I ratify the 
async idea (after further discussion here).  My current thought process 
is that async will be optional for server implementors and will be 
easily detectable by applications and middleware and have zero impact 
on middleware/applications if disabled (by configuration) or missing.


This notion of being detectable seems weird to me. Are we actually 
expecting an application to query the server, find out it is not async 
capable, and choose a different code path as a result? Seems much more
likely that the installer will choose a server or app that meets their 
needs. That is: you don't need to detect, you need to know (presumably 
at install/config time).


Or maybe I am imagining the use cases incorrectly here. I think of app 
being async as an explicit choice made by the builder to achieve some 
goal.


More to the point it needs to be detectable by middleware without 
explicitly configuring every layer of middleware, potentially with 
differing configuration mechanics and semantics.  (I.e. arguments like 
enable_async, async_enable, iLoveAsync, ...)



I can't get my head around filters yet.[snip]


Filters offer several benefits, some of which are mild:

:: Simplified application / middleware debugging via smaller stack.
:: Clearly defined tasks; ingress = altering the environ / input, 
egress =  altering the output.

:: Egress filters are not executed if an unhandled exception is raised.


Taken individually none of these seem super critical to me.

Or to put it another way: Yeah, so?

(This is the aforementioned resistance showing through. The above 
sounds perfectly nice, reasonable and desirable, but not _necessary_.)


It isn't necessary; it is, however, an often re-implemented feature of 
a framework on top of WSGI.  CherryPy, Paste, Django, etc. all 
implement some form of non-WSGI (or, hell, Paste uses WSGI middleware) 
thing they call a 'filter'.


Filters are optional, and an example is/will be provided for utilizing 
 ingress/egress filter stacks as middleware.


In a conversation with some people about the Atom Publishing Protocol I 
tried to convince them that the terms SHOULD and MAY had no place in a 
spec. WSGI* is not really the same kind of spec, but optionality
still grates in the same way.


I fully agree; that's why a lot of the PEP 333 "optionally" or "may" 
features have become "must".  "Optionally" and "may" simply never get 
implemented.


Filters are optional because a number of people have raised valid 
arguments that it might not be entirely needed.  Thus, it's not 
required.  But I strongly feel that some defined API should be present 
in (or /at least/ referred to by) the PEP, otherwise the future will 
hold the same server-specific incompatible implementations.


- Alice.


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] PEP 444 Goals

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 13:06:36 -0800, James Y Knight said:


On Jan 6, 2011, at 3:52 PM, Alice Bevan–McGregor wrote:
:: Making optional (and thus rarely-implemented) features non-optional. 
E.g. server support for HTTP/1.1 with clarifications for interfacing 
applications to 1.1 servers.  Thus pipelining, chunked encoding, et 
al. as per the HTTP 1.1 RFC.


Requirements on the HTTP compliance of the server don't really have any 
place in the WSGI spec. You should be able to be WSGI compliant even if 
you don't use the HTTP transport at all (e.g. maybe you just send 
around requests via SCGI).
The original spec got this right: chunking etc are something which is 
not relevant to the wsgi application code -- it is up to the server to 
implement the HTTP transport according to the HTTP spec, if it's 
purporting to be an HTTP server.


Chunking is actually quite relevant to the specification, as WSGI and 
PEP 444 / WSGI 2 (damn, that's getting tedious to keep dual-typing ;) 
allow for chunked bodies regardless of higher-level support for 
chunking, via the body iterator.  Previously you /had/ to define a 
length; with chunked encoding at the server level, you don't.
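
To make the server-side half concrete, here is a minimal sketch of HTTP/1.1 chunked transfer coding applied to a body iterator of unknown length. The function name chunked is hypothetical; the wire format (hex size, CRLF, data, CRLF, terminated by a zero-length chunk) is the one from the HTTP/1.1 RFC.

```python
def chunked(body_iter):
    """Wrap a body iterator in HTTP/1.1 chunked transfer coding."""
    for chunk in body_iter:
        if not chunk:
            continue  # a zero-length chunk would end the response early
        size = format(len(chunk), "x").encode("ascii")
        yield size + b"\r\n" + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker, no trailers
```

The application never sees any of this; it just yields byte strings and the server frames them.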


I agree, however, that not all gateways will be able to implement the 
relevant HTTP/1.1 features.  FastCGI does; SCGI, after a quick Google 
search, seems to support it as well. I should re-word it as:


For those servers capable of HTTP/1.1 features the implementation of 
such features is required.


+1

- Alice.




[Web-SIG] WSGI Middleware Dependancy Graphing (was: PEP 444 / WSGI 2 Async)

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 13:08:04 -0800, Robert Brewer said:


Or, if you had actually read what I wrote weeks ago...


I did.  Apologies for forgetting the detail of the implementation being 
deprecated.


We don't need Yet Another Way of hooking in processing components; if 
anything, we need a standard mechanism to compose existing middleware 
graphs so that invariant orderings are explicit and guaranteed. For 
example, encode, then gzip, then cache. By introducing egress filters 
as described in PEP 444 (which mentions gzip as a candidate for an 
egress filter), you're then stuck in a tug-of-war as to whether to 
build a new caching component as middleware, as an egress filter, or 
(most likely, in order to compete) both.


I do, in fact, have a proposal for declaring dependencies; however, such 
declaration is utterly useless unless differing middleware-based 
implementations (e.g. sessions) can agree on a common API for their 
feature sets.  I feel strongly that this idea does not belong in PEP 
444; it's one of the few things I think should be its own PEP.


My mechanism (for which I do have a working implementation against WSGI 
1; my web framework uses it) involves middleware layers declaring 
several attributes on themselves:


provides - abstract API names
uses - ordering hint, no dependency
needs - die if dependency is not met
before - explicit ordering, including *
after - explicit ordering, including *

For this to really work, however, it'd also need either an 
entrypoint-based way of looking up components (making the graph truly 
dynamic), or it needs to be combined with explicit packages a la 
setuptools.require.  In that instance, you've already done the ordering 
yourself, so dependency graphing is moot.
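
A rough sketch of how the provides / uses / needs attributes could drive an ordering. This is not the WebCore implementation; it uses the standard library's graphlib (Python 3.9+), leaves out before / after for brevity, and the Layer class and order_layers function are hypothetical names.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Layer:
    """Hypothetical stand-in for a middleware layer carrying declarations."""
    def __init__(self, name, provides=(), uses=(), needs=()):
        self.name = name
        self.provides = provides
        self.uses = uses
        self.needs = needs


def order_layers(layers):
    """Return layers sorted so that providers come before their consumers."""
    providers = {}
    for layer in layers:
        for name in layer.provides:
            providers[name] = layer

    graph = {}
    for layer in layers:
        deps = set()
        for name in layer.needs:
            if name not in providers:
                # "needs": die if the dependency is not met.
                raise LookupError("unmet dependency: " + name)
            deps.add(providers[name])
        for name in layer.uses:
            if name in providers:  # "uses": ordering hint only
                deps.add(providers[name])
        deps.discard(layer)
        graph[layer] = deps
    # static_order() raises graphlib.CycleError on circular declarations.
    return list(TopologicalSorter(graph).static_order())


sessions = Layer("sessions", provides=["session"])
auth = Layer("auth", provides=["auth"], needs=["session"])
flash = Layer("flash", uses=["session", "i18n"])  # "i18n" is absent: ignored
ordered = order_layers([auth, flash, sessions])
```

Only relative order is guaranteed: sessions sorts before both auth and flash, while the position of auth relative to flash is left to the sorter.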


- Alice.




Re: [Web-SIG] PEP 444 Goals

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 14:14:32 -0800, Alice Bevan–McGregor said:
There was something, somewhere I was reading related to WSGI about 
requiring content-length... but no matter.


Right, I remember now: the HTTP 1.0 specification.  (Honestly not 
trying to sound sarcastic!)  See:



http://www.w3.org/Protocols/HTTP/1.0/draft-ietf-http-spec.html#Entity-Body

However, after testing every browser on my system (from Links and 
ELinks, through Firefox, Chrome, Safari, Konqueror, and Dillo) across 
the following test code, I find that they all handle a missing 
content-length in the same way: reading the socket until it closes.


http://pastie.textmate.org/1435415
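
For readers without access to the paste, the behaviour can be sketched in-process. This is not the linked test code; it is a small stand-in using socket.socketpair() in place of a real TCP connection, and the function names are hypothetical.

```python
import socket


def respond_without_length(conn, body):
    """Send a response with no Content-Length, then close to mark the end."""
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n" + body)
    conn.close()


def read_until_close(conn):
    """Client side: keep reading until the peer closes the socket."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # empty read means the other end closed
            break
        chunks.append(data)
    conn.close()
    return b"".join(chunks)


def demo():
    server, client = socket.socketpair()  # in-process connected pair
    respond_without_length(server, b"Hello world!")
    head, _, body = read_until_close(client).partition(b"\r\n\r\n")
    return head.split(b"\r\n")[0], body
```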

- Alice.




Re: [Web-SIG] PEP 444 Goals

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 21:26:32 -0800, James Y Knight said:
You've misread that section. In HTTP/1.0, *requests* were required to 
have a Content-Length if they had a body (HTTP 1.1 fixed that with 
chunked request support). Responses have never had that restriction: 
they have always (even since before HTTP 1.0) been allowed to omit 
Content-Length and terminate by closing the socket.


Ah ha, that explains my confusion, then! Thank you.

- Alice.




Re: [Web-SIG] Declaring PEP 3333 accepted (was: PEP 444 != WSGI 2.0)

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 22:00:17 -0800, Graham Dumpleton said:

-environ = {k: wsgi_string(v) for k,v in os.environ.items()}
+environ = {k: wsgi_string(v) for k,v in list(os.environ.items())}


2to3 takes the conservative route of assuming your application treats 
dict.items() as a list in all cases; this is not necessarily true (of 
course), but it is safe, and interestingly, backwards compatible.
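
A small illustration of when the list() wrapper actually matters. In the comprehension above it is redundant, but in a loop that mutates the dictionary the Python 3 view would raise RuntimeError without it. The function name drop_empty is hypothetical.

```python
def drop_empty(mapping):
    """Remove keys with empty values while iterating."""
    # list() snapshots the items, so deleting inside the loop is safe on
    # both Python 2 (where items() was already a list) and Python 3.
    for key, value in list(mapping.items()):
        if not value:
            del mapping[key]
    return mapping
```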



-raise exc_info[0], exc_info[1], exc_info[2]
+raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])


The exception raising syntax has changed; you cannot re-raise an 
exception using tuple notation any more.  The new syntax is far 
clearer, but I'm unsure of back-compatibility or even if it is possible 
to emulate it completely as a polyglot (2.x and 3.x w/ same code).
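
One way a polyglot can be approximated, sketched here as an assumption rather than anything proposed in this thread: hide the Python 2 only syntax inside a string so the Python 3 parser never sees it. This is the same trick compatibility libraries such as six use for their reraise helper.

```python
import sys

if sys.version_info[0] >= 3:
    def reraise(exc_type, value, tb=None):
        """Re-raise an exception from a sys.exc_info() style triple."""
        if value is None:
            value = exc_type()
        raise value.with_traceback(tb)
else:
    # The three-expression raise is a syntax error on Python 3, so it is
    # compiled from a string only when running under Python 2.
    exec("def reraise(exc_type, value, tb=None):\n"
         "    raise exc_type, value, tb\n")


def demo():
    try:
        raise ValueError("boom")
    except ValueError:
        info = sys.exc_info()
    try:
        reraise(*info)
    except ValueError as exc:
        return str(exc), exc.__traceback__ is not None
```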


- Alice.




Re: [Web-SIG] Python 3 / PEP 3333 (was: PEP 444 / WSGI 2 Async)

2011-01-06 Thread Alice BevanMcGregor

On 2011-01-06 23:40:53 -0800, Graham Dumpleton said:

There is also uWSGI and CherryPy WSGI server. I recollect that Benoit 
may have started looking it over for gunicorn.


Ah, right, I recall seeing CherryPy mentioned in archived discussions.  
So there's hope, then, for relatively quick adoption once ratified.  :)


- Alice.




Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-05 Thread Alice BevanMcGregor
[Apologies if this is a double- or triple-post; I seem to be having a 
stupid number of connectivity problems today.]


Howdy!

Apologies for the delay in responding, it’s been a hectic start to the 
new year.  :)


On 2011-01-03, at 6:22 AM, Timothy Farrell wrote:

You don't know me but I'm the author of the Rocket Web Server 
(http://pypi.python.org/pypi/rocket) and have, in the past, been 
involved in the web2py community.  Like you, I'm interested in seeing 
web development come to Python3.  I'm glad you're taking up WSGI2.  I 
have a feature-request for it that perhaps we could work in.


Of course; in fact, I hope you don’t mind that I’ve re-posted this 
response to the web-sig mailing list.  Async needs significantly 
broader discussion.  I would appreciate it if you could reply to the 
mailing list thread.


I would like to see futures added as a server option.  This way, 
controllers could dispatch emails (or run some other blocking or 
long-running task) that would not block the web-response.  WSGI2 
Servers could provide a futures executor as environ['wsgi.executor'] 
that the app could use to offload processes that need not complete 
before the web-request is served to the client.


E-mail dispatch is one of the things I solved a long time ago with 
TurboMail; it uses a dedicated thread pool and can deliver  100 unique 
messages per second (more if you use BCC) in the default configuration, 
so I don’t really see that one use case as one that can benefit from 
the futures module.  Updating TurboMail to use futures would be an 
interesting exercise.  ;)


I was thinking of exposing the executor as 
environ[‘wsgi.async.executor’], with ‘wsgi.async’ being a boolean value 
indicating support.
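
A sketch of how an application or middleware layer might use those two keys, assuming exactly the names floated above ('wsgi.async' as a boolean, 'wsgi.async.executor' as the executor). Neither key is ratified, and offload is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def offload(environ, task, *args):
    """Run task in the background when the server supports it, else inline.

    Returns True if the task was handed to the server's executor.
    """
    if environ.get("wsgi.async", False):
        environ["wsgi.async.executor"].submit(task, *args)
        return True
    task(*args)  # no async support: zero impact, just run it now
    return False


def demo():
    sent = []
    with ThreadPoolExecutor(max_workers=1) as executor:
        environ = {"wsgi.async": True, "wsgi.async.executor": executor}
        offloaded = offload(environ, sent.append, "queued")
    inline = offload({}, sent.append, "inline")
    return offloaded, inline, sent
```

The single environ lookup is the whole detection story; no per-layer configuration flag is needed.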



What should the server do with the future instances?


The executor returns future instances when running executor.submit/map; 
the application never generates its own Future instances.  The 
application may, however, use whatever executor it sees fit; it can, 
for example, have one thread pool executor and one process pool, used 
for different tasks.


The server itself can utilize any combination of single-threaded 
IO-based async (see further on in this message), and multi-threaded or 
multi-process management of WSGI requests.  Resuming suspended 
applications (ones pending future results) is an implementation detail 
of the server.


Should future.add_done_callback() be allowed?  I'm not sure how 
practical/reliable this would be. (By the time the callback is called, 
the calling environment could be gone.  Is this undefined behavior?)


If you wrap your callback in a partial(my_callback, environ) the 
environ will survive the end of the request/response cycle (due to the 
incremented reference count), and should be allowed to enable 
intelligent behaviour in the callbacks.  (Obviously the callbacks will 
not be able to deliver a response to the client at the time they are 
called; the body iterator can, however, wait for the future instance to 
complete and/or timeout.)
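
The partial() approach can be sketched as follows. The callback and key names are hypothetical; the only library behaviour relied on is that add_done_callback invokes its callable with the future as the sole argument.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def record_outcome(environ, future):
    # The executor passes only the future; environ arrives via partial(),
    # which also keeps it alive after the request/response cycle ends.
    environ["example.outcome"] = future.result()


def dispatch(environ, executor, task, *args):
    future = executor.submit(task, *args)
    future.add_done_callback(partial(record_outcome, environ))
    return future


def demo():
    environ = {"PATH_INFO": "/signup"}
    with ThreadPoolExecutor(max_workers=1) as executor:
        dispatch(environ, executor, len, "hello")
    # Leaving the with-block waits for the task, and thus the callback.
    return environ["example.outcome"]
```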


A little bit later in this message I describe a better solution than 
the application registering its own callbacks.


Do we need to also specify what type of executor is provided (threaded 
vs. separate process)?


I think that’s an application-specific configuration issue, not really 
the concern of the PEP.



Do you have any thoughts about this?


I believe that intelligent servers need some way to ‘pause’ a WSGI 
worker rather than relying on the worker executing in a thread and 
blocking while waiting for the return value of a future.  Using 
generator syntax (yield) with the following rules is my initial idea:


* The application may yield None.  This is a polite way to have the 
async reactor (in the WSGI server/gateway) reschedule the worker for 
the next reactor cycle.  Useful as a hint that “I’m about to do 
something that may take a moment”, allowing other workers to get a 
chance to perform work. (Cooperative multi-tasking on single-threaded 
async servers.)


* The application must yield one 3-tuple WSGI response, and must not 
yield additional data afterwards.  This is usually the last thing the 
WSGI application would do, with possible cleanup code afterwards 
(before falling off the bottom / raising StopIteration / returning 
None).


* The application may yield Future instances returned by 
environ[‘wsgi.executor’].submit/map; the worker will then be paused 
pending execution of the future; the return value of the future will be 
returned from the yield statement.  Exceptions raised by the future 
will be re-raised from the yield statement and can thus be captured in 
a natural way.  E.g.:


try:
    complex_value = yield environ['wsgi.executor'].submit(long_running)
except Exception:
    pass  # handle exceptions generated from within long_running
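
The three rules above can be sketched from the server's side. This is a deliberately naive, blocking driver, not a proposed reference implementation: a real async server would suspend the worker instead of calling result(). The names drive, application, and failing are hypothetical, and 'wsgi.executor' is the key used in the example above.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def drive(app, environ):
    """Run a generator-style application until it yields its response."""
    gen = app(environ)
    to_send, to_throw = None, None
    while True:
        try:
            if to_throw is not None:
                item = gen.throw(to_throw)  # re-raise inside the application
            else:
                item = gen.send(to_send)
        except StopIteration:
            return None  # application never produced a response
        to_send, to_throw = None, None
        if item is None:
            continue  # polite yield: a reactor would reschedule here
        if isinstance(item, Future):
            try:
                to_send = item.result()
            except Exception as exc:
                to_throw = exc
            continue
        try:
            next(gen)  # let any post-response cleanup code run
        except StopIteration:
            pass
        return item  # the 3-tuple response


def failing():
    raise ValueError("backend unavailable")


def application(environ):
    executor = environ["wsgi.executor"]
    yield None
    value = yield executor.submit(lambda: 21 * 2)
    try:
        yield executor.submit(failing)
    except ValueError:
        value += 1  # exception surfaced at the yield, as described
    yield b"200 OK", [], [str(value).encode("ascii")]


def demo():
    with ThreadPoolExecutor(max_workers=1) as executor:
        return drive(application, {"wsgi.executor": executor})
```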

Similar rules apply to the response body iterator: it yields 
bytestrings, may yield unicode strings where native strings are unicode 
strings, and 

Re: [Web-SIG] PEP 444 / WSGI 2 Async

2011-01-05 Thread Alice BevanMcGregor
Alex Grönholm and I have been discussing async implementation details 
(and other areas of PEP 444) for some time on IRC.  Below is the 
cleaned up log transcriptions with additional notes where needed.


Note: The logs are in mixed chronological order — discussion of one 
topic is chronological, potentially spread across days, but separate 
topics may jump around a bit in time.  Because of this I have 
eliminated the timestamps as they add nothing to the discussion.  
Dialogue in square brackets indicates text added after-the-fact for 
clarity.  Topics are separated by three hyphens.  Backslashes indicate 
joined lines.


This should give a fairly comprehensive explanation of the rationale 
behind some decisions in the rewrite; a version of these conversations 
(in narrative style vs. discussion) will be added to the rewrite Real 
Soon Now™ under the Rationale section.


— Alice.


--- General

agronholm: my greatest fear is that a standard is adopted that does not 
solve existing problems


GothAlice: [Are] there any guarantees as to which thread / process a 
callback [from the future instance] will be executed in?




--- 444 vs. 

agronholm: what new features does pep 444 propose to add to pep ? \ 
async, filters, no buffering?


GothAlice: Async, filters, no server-level buffering, native string 
usage, the definition of byte string as the format returned by 
socket read (which, on Java, is unicode!), and the allowance for 
returned data to be Latin1 Unicode. \ All of this together will allow a 
'''def hello(environ): return "200 OK", [], ["Hello world!"]''' example 
application to work across Python versions without modification (or use 
of the b prefix)


agronholm: why the special casing for latin1 btw? is that an http thing?

GothAlice: Latin1 = \u → \u00FF — it's one of the only formats that 
can be decoded while preserving raw bytes, and if another encoding is 
needed, transcode safely. \ Effectively requiring Latin1 for unicode 
output ensures single byte conformance on the data. \ If an application 
needs to return UTF-8, for example, it can return an encoded UTF-8 
bytestream, which will be passed right through,
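
The byte-preserving property is easy to check. A minimal sketch (function names are hypothetical): every byte value survives a Latin1 decode/encode round trip, and text in another encoding can be recovered by going back through Latin1.

```python
def roundtrip(raw):
    """Decode bytes as Latin1 and encode them again, unchanged."""
    return raw.decode("latin-1").encode("latin-1")


def transcode(text, encoding="utf-8"):
    """Recover real text from a Latin1 string that is carrying raw bytes."""
    return text.encode("latin-1").decode(encoding)


every_byte = bytes(range(256))
carried = "caf\u00e9".encode("utf-8").decode("latin-1")  # UTF-8 bytes as Latin1
```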




--- Filters

agronholm: regarding middleware, you did have a point there -- 
exception handling would be pretty difficult with ingress/egress filters


GothAlice: Yup.  It's pretty much a do or die scenario in filter-land.

agronholm: but if we're not ditching middleware, I wonder about the 
overall benefits of filtering \ it surely complicates the scenario so 
it'd better be worth it \ I don't so much agree with your reasoning 
that [middleware] complicates debugging \ I don't see any obvious 
performance improvements either (over middleware)


GothAlice: Simplified debugging of your application w/ reduced stack to 
sort through, reduced nested stack overhead (memory allocation 
improvement), clearer separation of tasks (egress compression is a good 
example).  This follows several of the Zen of Python guidelines: \ 
Simple is better than complex. \ Flat is better than nested. \ There 
should be one-- and preferably only one --obvious way to do it. \ If 
the implementation is hard to explain, it's a bad idea. \ If the 
implementation is easy to explain, it may be a good idea.


agronholm: I would think that whatever memory the stack elements 
consume is peanuts compared to the rest of the application \ 
ingress/egress isn't exactly simpler than middleware


GothAlice: The implementation for ingress/egress filters is two lines 
each: a for loop and a call to the elements iterated over.  Can't get 
much simpler or easier to explain.  ;) \ Middleware is pretty complex… 
\ The majority of ingress filters won't have to examine wsgi.input, and 
supporting async on egress would be relatively easy for the filters 
(pass-through non-bytes data in body_iter). \ If you look at a system 
that offers input filtering, output filtering, and decorators 
(middleware), modifying input should obviously be an input filter, 
and vice-versa.


agronholm: how does a server invoke the ingress filters \ in my 
opinion, both ingress and egress filters should essentially be pipes \ 
compression filters are a good example of this \ once a block of 
request data (body) comes through from the client, it should be sent 
through the filter chain


agronholm: consider an application that receives a huge gzip encoded 
upload \ the decompression filter decompresses as much as it can using 
the incoming data \ the application only gets the next block once the 
decompression filter has enough raw data to decompress


GothAlice: Ingress decompression, for example, would accept the environ 
argument, detect gzip content-encoding, then decompress the wsgi.input 
into its own buffer, and finally replace wsgi.input in the environ with 
its decompressed version. \ Alternatively, it could decompress chunks 
and have a more intelligent replacement for wsgi.input (to delay 
decompression until it is needed).
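
A sketch of the first, simpler variant described above (decompress everything into a buffer and swap wsgi.input). It assumes the request header arrives as the CGI-style key HTTP_CONTENT_ENCODING and that ingress filters mutate the environ in place; gunzip_ingress is a hypothetical name.

```python
import gzip
import io


def gunzip_ingress(environ):
    """Ingress filter: transparently decompress a gzip-encoded request body."""
    if environ.get("HTTP_CONTENT_ENCODING", "").lower() != "gzip":
        return
    data = gzip.decompress(environ["wsgi.input"].read())
    environ["wsgi.input"] = io.BytesIO(data)  # replace with decompressed body
    environ["CONTENT_LENGTH"] = str(len(data))
    del environ["HTTP_CONTENT_ENCODING"]  # body is no longer encoded


def demo():
    payload = b"field=value" * 3
    environ = {
        "HTTP_CONTENT_ENCODING": "gzip",
        "wsgi.input": io.BytesIO(gzip.compress(payload)),
    }
    gunzip_ingress(environ)
    return environ
```

Buffering the whole body is exactly the cost agronholm objects to for huge uploads; the lazy variant would wrap wsgi.input in a file-like object that decompresses on read.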



[Web-SIG] PEP 444 Draft Rewrite

2010-12-24 Thread Alice BevanMcGregor

Howdy!

I've mostly finished a draft rewrite of PEP 444 (WSGI 2), incorporating 
some additional ideas covering things like py2k/py3k interoperability 
and switching from a more narrative style to a substantially 
RFC-inspired language.


http://bit.ly/e7rtI6

I'm using Textile as my intermediary format, and will obviously need to 
convert this to ReStructuredText when I'm done.  Missing are:


* The majority of the examples.
* Narrative rationale, which I'll be writing shortly.
* Narrative Python compatibility documentation.
* Asynchronous documentation.  This will likely rely on the abstract 
API defined in PEP 3148 (futures) as implemented in Python 3.2 and the 
futures package available on PyPI.
* Additional and complete references.  The Rationale chapter will add 
many references to community discussion.


I would appreciate it greatly if this rewrite could be read through and 
questions, corrections, or even references to possible ambiguity 
mentioned in discussion.


Have a happy holidays and a merry new-year, everybody!  :)

- Alice.

P.s. I'll be updating my PEP 444 reference implementation HTTP 1.1 
server (marrow.server.http) over the holidays to incorporate the 
changes in this rewrite; most notably the separation of byte strings, 
unicode strings, and native strings.





Re: [Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

2010-12-13 Thread Alice BevanMcGregor
That looks amazingly like the code for CherryPy Filters circa 2005. In 
version 2 of CherryPy, Filters were the canonical extension method 
(for the framework, not WSGI, but the same lessons apply). It was still 
expensive in terms of stack allocation overhead, because you had to 
call () each filter to see if it was on. It would be much better to 
find a way to write something like:




for f in ingress_filters:
    if f.on:
        f(environ)


.on will need to be an @property in most cases, still not avoiding 
stack allocation and, in fact, doubling the overhead per filter.  
Statically disabled filters should not be added to the filter list.


It was also fiendishly difficult to get executed in the right order: if 
you had a filter that was both ingress and egress, the natural tendency 
for core developers and users alike was to append each to each list, 
but this is almost never the correct order.


If something is both an ingress and egress filter, it should be 
implemented as middleware instead.  Nothing can prevent developers from 
doing bad things if they really try.  Appending to ingress and 
prepending to egress would be the right thing to simulate middleware 
behaviour with filters, but again, don't do that.  ;)
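
For completeness, a sketch of the ordering being described: append to ingress, prepend to egress, so a paired filter wraps the application the way nested middleware would. All names are hypothetical, and as noted, real paired behaviour is better written as middleware.

```python
def register(ingress_filters, egress_filters, ingress, egress):
    """Register a paired filter so it nests like middleware."""
    ingress_filters.append(ingress)   # outermost runs first on the way in
    egress_filters.insert(0, egress)  # and last on the way out


def make_pair(name, log):
    def ingress(environ):
        log.append(name + " in")

    def egress(status, headers, body):
        log.append(name + " out")
        return status, headers, body

    return ingress, egress


def demo():
    log = []
    ingress_filters, egress_filters = [], []
    for name in ("outer", "inner"):
        register(ingress_filters, egress_filters, *make_pair(name, log))

    for filter_ in ingress_filters:
        filter_({})
    log.append("app")
    response = (b"200 OK", [], [])
    for filter_ in egress_filters:
        response = filter_(*response)
    return log
```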


But even if you solve the issue of static composition, there's still a 
demand for programmatic composition (if X then add Y after it), and 
even decomposition (find the caching filter my framework added 
automatically and turn it off), and list.insert()/remove() isn't 
stellar at that.


I have plans (and partial implementation) of an init.d-style 
needs/uses/provides declaration and automatic dependency graphing.  
WebCore, for example, adds the declarations to existing middleware 
layers to sort the middleware.


Calling the filter to ask it whether it is on also leads filter 
developers down the wrong path; you really don't want to have Filter A 
trying to figure out if some other, conflicting Filter B has already 
run (or will run soon) that demands Filter A return without executing 
anything. You really, really want the set of filters to be both 
statically defined and statically analyzable.


Unfortunately, most, if not all filters need to check for request 
headers and response headers to determine the capability to run.  E.g. 
compression checks environ.get('HTTP_ACCEPT_ENCODING', '').lower() for 
'gzip', and checks the response to determine if a 'Content-Encoding' 
header has already been specified.
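
The run-time check described above might look like the following. It is a sketch only: should_gzip is a hypothetical helper, headers are assumed to be a list of (name, value) string pairs, and the substring test ignores quality values such as "gzip;q=0".

```python
def should_gzip(environ, headers):
    """Decide per request whether a compression filter may run."""
    accepted = environ.get("HTTP_ACCEPT_ENCODING", "").lower()
    if "gzip" not in accepted:
        return False  # client did not advertise gzip support
    # Never double-encode a response that already declares an encoding.
    return not any(name.lower() == "content-encoding" for name, _ in headers)
```

Neither input is known until the request and response exist, which is why this kind of filter cannot be enabled or disabled purely statically.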


Finally, you want the execution of filters to be configurable per URI 
and also configurable per controller. So the above should be rewritten 
again to something like:




for f in ingress_filters(controller):
    if f.on(environ['path_info']):
        f(environ)



It was for these reasons that CherryPy 3 ditched its version 2 
filters and replaced them with hooks and tools in version 3.


This is possible by wrapping multiple applications, say, in the filter 
middleware adapter with differing filter setups, then using the 
separate wrapped applications with some form of dispatch.  You could 
also utilize filters as decorators.  This is an implementation detail 
left up to the framework utilizing WSGI2, however.  WSGI2 itself has no 
concept of controllers.


None of this prevents the simplified stack from being useful during 
exception handling, though.  ;)  What I was really trying to do is 
reduce the level of nesting on each request and make what used to be 
middleware more explicit in its purpose.



You might find more insight by studying the latest cherrypy/_cptools.py


I'll give it a gander, though I firmly believe filter management (as 
middleware stack management) is the domain of a framework on top of 
WSGI2, not the domain of the protocol.


— Alice.




[Web-SIG] PEP 444 / WSGI2 Proposal: Filters to suppliment middleware.

2010-12-12 Thread Alice BevanMcGregor

Howdy!

There's one issue I've seen repeated a lot in working with WSGI1 and 
that is the use of middleware to process incoming data, but not 
outgoing, and vice-versa; middleware which filters the output in some 
way, but cares not about the input.


Wrapping middleware around an application is simple and effective, but 
costly in terms of stack allocation overhead; it also makes debugging a 
bit more of a nightmare as the stack trace can be quite deep.


My updated draft PEP 444[1] includes a section describing Filters, both 
ingress (input filtering) and egress (output filtering).  The API is 
trivially simple, optional (as filters can be easily adapted as 
middleware if the host server doesn't support filters) and easy to 
implement in a server.  (The Marrow HTTP/1.1 server implements them as 
two for loops.)


Basically an input filter accepts the environment dictionary and can 
mutate it.  Ingress filters take a single positional argument that is 
the environ.  The return value is ignored.  (This is questionable; it 
may sometimes be good to have ingress filters return responses.  Not 
sure about that, though.)


An egress filter accepts the status, headers, body tuple from the 
application and returns a status, headers, and body tuple of its own 
which then replaces the response.  An example implementation is:


for filter_ in ingress_filters:
    filter_(environ)

response = application(environ)

for filter_ in egress_filters:
    response = filter_(*response)
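
To show what individual filters might look like against that loop, here is a small sketch. The two filters, the sample application, and handle are hypothetical names; byte-string status and header values are an assumption about the draft's conventions.

```python
def strip_trailing_slash(environ):
    """Ingress filter: mutate the environ in place; return value is ignored."""
    path = environ.get("PATH_INFO", "")
    if len(path) > 1 and path.endswith("/"):
        environ["PATH_INFO"] = path.rstrip("/") or "/"


def add_content_length(status, headers, body):
    """Egress filter: return a replacement (status, headers, body) tuple."""
    body = list(body)  # buffers the body; fine for a sketch
    length = str(sum(len(chunk) for chunk in body)).encode("ascii")
    return status, headers + [(b"Content-Length", length)], body


def application(environ):
    return b"200 OK", [], [environ["PATH_INFO"].encode("ascii")]


def handle(environ, ingress_filters, egress_filters):
    # Same two loops as the example implementation above.
    for filter_ in ingress_filters:
        filter_(environ)
    response = application(environ)
    for filter_ in egress_filters:
        response = filter_(*response)
    return response
```

Each filter is an ordinary function with one job, which is the "clearly defined tasks" benefit in practice.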

I'd love to get some input on this.  Questions, comments, criticisms, 
or better ideas are welcome!


— Alice.

[1] https://github.com/GothAlice/wsgi2/blob/master/pep-0444.rst




Re: [Web-SIG] PEP 444

2010-11-29 Thread Alice Bevan-McGregor
I’ve updated my copy of the PEP, re-naming non-commentary and non-revision text 
to reference WSGI2, wsgi2, or wsgi (environment variables) as appropriate.  
I’ve also added the first draft of the text describing filters and some sample 
code, including a middleware adapter for filters.  Here are some additional 
notes:

https://gist.github.com/719763 — filter vs. middleware
http://dirtsimple.org/2007/02/wsgi-middleware-considered-harmful.html

It might be worth another PEP to describe interfaces to common data to 
encourage interoperability between filters/middleware, such as GET/POST data, 
cookies, session data (likely using Beaker’s API as a base), etc.  Also 
something I’ve been exploring is automatic resolution of middleware/filter 
dependencies by utilizing “uses”, “needs”, and “provides” properties on the 
callables and a middleware stack factory which can graph the dependency tree.

On a side note, I do not appear to be receiving posts to this mailing list, 
only the out-of-list CC/BCCs.  :/  And here I’ve been getting used to reading 
and posting to comp.lang.python[.announce] on Usenet.  ;)

— Alice.


Re: [Web-SIG] PEP 444

2010-11-22 Thread Alice Bevan-McGregor
Would you prefer to give me collaboration permissions on your repo, or should I 
fork it?

This message was sent from a mobile device. Please excuse any terseness and 
spelling or grammatical errors. If additional information is indicated it will 
be sent from a desktop computer as soon as possible. Thank you.

On 2010-11-21, at 11:40 PM, Chris McDonough chr...@plope.com wrote:

 Georg Brandl has thus far been updating the canonical PEP on python.org.
 I don't know how you get access to that.  My working copy is at
 https://github.com/mcdonc/web3 .


Re: [Web-SIG] PEP 444

2010-11-22 Thread Alice Bevan-McGregor
I’ve forked it, now available at:

https://github.com/GothAlice/wsgi2

Re-naming it to wsgi2 will be my first order of business during the week, 
altering your association the second.  I’ll post change descriptions for 
discussion as I go.

— Alice.

On 2010-11-22, at 12:12 AM, Chris McDonough wrote:

 Would you prefer to give me collaboration permissions on your repo, or
 should I fork it?
 
 Please fork it or create another repository entirely. I have no plans to
 do more work on it personally, so I don't think it should really be
 associated with me.  To that end, I think I'd prefer my name to either
 be off the PEP entirely or just listed as a helper or typist or
 something. ;-)



Re: [Web-SIG] PEP 444

2010-11-22 Thread Alice Bevan-McGregor
On 2010-11-22, at 3:05 PM, Mark Ramm mark.mchristen...@gmail.com wrote:

 I would very much prefer it if we could keep the current name or choose a new 
 unrelated name, not wsgi2, as I think the API changes warrant a new name to 
 prevent confusion.

Web3, as mentioned in previous mailing list traffic, is a registered trademark. 
Python Web and WSGI are closely linked in the public mind-space. (Sleep 
deprived and can't think of a better way to phrase that.) Finally, I, and 
seemingly Python core, interpret major version number changes as breaking; py3k 
has backwards-incompatible syntax changes.

At a high level PEP 444 is /similar/ to WSGI in so far as the environ is a 
dict, and the returned values are a bytestring status, list of tuples for 
headers, and an iterable body. The inner implementation details seem a 
progressive enhancement and clarification of details which just happen to be 
backwards-incompatible.

Preserving the WSGI name has marketing benefits, refines existing understanding 
of the server/middleware/application semantics rather than implying something 
/completely/ new, and increasing the version to 2.0 clearly declares the 
backwards-incompatibility.

I think that Python 2 vs. 3 is a good comparison here; Python 3 has a different 
syntax and grammar, making it a fundamentally different language and is 
incompatible because of this. Why is it called Python and not Xyzzy?  #python 
wouldn’t have to have  ;)

Web frameworks have been encountering this problem for some time; TurboGears 
developers, e.g., have been mulling over migrating to Pyramid or another 
top-level metaframework and debating strategies for migration: point everyone 
at something else, create something new, or keep the name and associated 
recognition?

Technically PEP 444 is incompatible, and wsgi.version = (2, 0) (and clear 
documentation) should indicate that.

   — Alice.


[Web-SIG] PEP 444

2010-11-21 Thread Alice Bevan-McGregor
(A version of this is available at http://web-core.org/2.0/pep-0444/ where 
links are links and code may be easier to read.)

PEP 444 is quite exciting to me.  So much so that I’ve been spending a few days 
writing a high-performance (C10K, 10K r/sec) Py2.6+/3.1+ HTTP/1.1 server which 
implements much of the proposed standard.  The server is functional (less 
web3.input at the time of this writing), but differs from PEP 444 in several 
ways.  It also adds several features I feel should be part of the spec.

Source for the server is available on GitHub:

https://github.com/pulp/marrow.server.http

I have made several notes about the PEP 444 specification during implementation 
of the above, and have concerns over some implementation details:

First, async is poorly defined:

 If the origin server advertises that it has the web3.async capability, a Web3 
 application callable used by the server is permitted to return a callable 
 that accepts no arguments. When it does so, this callable is to be called 
 periodically by the origin server until it returns a non-None response, which 
 must be a normal Web3 response tuple.

Polling is not true async.  I believe that it should be up to the server to 
define how async is utilized, and that the specification should be clarified on 
this point.  (“Called periodically” is too vague.)  “Callable” should likely be 
redefined as “generator” (a callable that yields) as most applications require 
holding on to state and wrapping everything in functools.partial() is somewhat 
ugly.  Utilizing generators would improve support for existing Python async 
frameworks, and allow four modes of operation: yield None (no response, keep 
waiting), yield response_tuple (standard response), return / raise 
StopIteration (close the async connection), and have data passed back 
to the async callable by the higher-level async framework.
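
To make the generator idea concrete, here is a minimal sketch. Both function names are hypothetical, and the driver is a toy stand-in: a real server would resume the generator from its event loop rather than spinning. Passing data back in with send() is left out for brevity.

```python
def delayed_app(environ):
    """Hypothetical application that returns a generator, not a response."""
    def responder():
        for _ in range(2):
            yield None  # nothing ready yet; keep waiting
        yield (b"200 OK", [(b"Content-Type", b"text/plain")], [b"done"])
    return responder()


def drive(app, environ):
    """Toy server loop: resume the generator until a response appears."""
    result = app(environ)
    if isinstance(result, tuple):
        return result  # ordinary synchronous response
    for value in result:
        if value is not None:
            result.close()
            return value
    return None  # generator ended without a response: close the connection


response = drive(delayed_app, {})
```

The generator keeps its own local state between resumptions, which is the part that otherwise needs functools.partial() or a class.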

Second, WSGI middleware, while impressive in capability, are somewhat… 
heavy-weight.  Heavily nesting function calls is wasteful of CPU and RAM, 
especially if the middleware decides it can’t operate, for example, GZip 
compression disabling itself for non-text/* mimetypes.  The majority of WSGI 
middleware can, and probably should be, implemented as linear ingress or egress 
filters.  For example, on-disk static file serving could be an ingress filter, 
and GZip compression an egress filter.  m.s.http supports this filtering and 
demonstrates one API for such.  Also, I am in the process of writing an example 
egress CompressionFilter.

An example API and filter use implementation: (paraphrased from 
marrow.server.http)

    # No filters, near 0 overhead.
    for filter_ in ingress_filters:
        # Can mutate the environment.
        result = filter_(env)

        # Allow the filter to return a response rather than continuing.
        if result:
            # result is a status, headers, body_iter tuple
            return result[0], result[1], result[2]

    status, headers, body = application(env)

    for filter_ in egress_filters:
        # Can mutate the environment, status, headers, body, or
        # return completely new status, headers, and body.
        status, headers, body = filter_(env, status, headers, body)

    return status, headers, body
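
For anyone who wants to try the loop above, here is a self-contained sketch that wraps it in a function and adds two toy filters. The function and filter names, the X-Served-By header, and the choice of bytestring environ values are all assumptions for illustration; none of it is taken from marrow.server.http.

```python
def run_filtered(application, env, ingress_filters=(), egress_filters=()):
    """Run ingress filters, the application, then egress filters."""
    for filter_ in ingress_filters:
        result = filter_(env)
        if result:  # the filter answered the request itself
            return result
    status, headers, body = application(env)
    for filter_ in egress_filters:
        status, headers, body = filter_(env, status, headers, body)
    return status, headers, body


def block_admin(env):
    """Hypothetical ingress filter: refuse anything under /admin."""
    if env.get("PATH_INFO", b"").startswith(b"/admin"):
        return b"403 Forbidden", [(b"Content-Type", b"text/plain")], [b"Forbidden"]


def add_server_header(env, status, headers, body):
    """Hypothetical egress filter: append a header, pass the rest through."""
    return status, headers + [(b"X-Served-By", b"example")], body


def hello(env):
    return b"200 OK", [(b"Content-Type", b"text/plain")], [b"Hello"]


ok = run_filtered(hello, {"PATH_INFO": b"/"}, [block_admin], [add_server_header])
denied = run_filtered(hello, {"PATH_INFO": b"/admin/x"}, [block_admin], [add_server_header])
```

Note that a short-circuiting ingress filter skips the egress filters entirely in this sketch, as it does in the paraphrased loop.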

The environment has some minor issues.  I’ll write up my changes in RFC-style:

SERVER_NAME is REQUIRED and MUST contain the DNS name of the server OR virtual 
server name for the web server if available OR an empty bytestring if DNS 
resolution is unavailable.  SERVER_ADDR is REQUIRED and MUST contain the web 
server’s bound IP address.  URL reconstruction SHOULD use HTTP_HOST if 
available, SERVER_NAME if there is no HTTP_HOST, and fall back on SERVER_ADDR 
if SERVER_NAME is an empty bytestring.
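
The fallback order for URL reconstruction could be sketched like this. The function name is hypothetical, and the sample values assume bytestring environ values as proposed.

```python
def request_host(environ):
    """Pick the host for URL reconstruction in the proposed order."""
    host = environ.get("HTTP_HOST")  # optional: only set if the client sent it
    if host:
        return host
    # SERVER_NAME and SERVER_ADDR are REQUIRED, so plain indexing is safe.
    return environ["SERVER_NAME"] or environ["SERVER_ADDR"]


with_host = request_host({
    "HTTP_HOST": b"example.com:8080",
    "SERVER_NAME": b"www.example.com",
    "SERVER_ADDR": b"192.0.2.10",
})
no_dns = request_host({"SERVER_NAME": b"", "SERVER_ADDR": b"192.0.2.10"})
```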

CONTENT_LENGTH is REQUIRED and MUST be None if not defined by the client.  
Testing explicitly for None is more efficient than armoring against missing 
values; also, explicit is better than implicit.  (Paste’s WSGI1 server defines 
CONTENT_LENGTH as 0, but this implies the client explicitly declared it as 
zero, which is not the case.)
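
A small sketch of what the explicit None buys an application. The function name is hypothetical, and whether a declared length arrives as a bytestring or an integer is not settled above, so int() is used because it accepts either.

```python
def declared_length(environ):
    """Return the client's declared body length, or None if it sent none."""
    length = environ["CONTENT_LENGTH"]  # always present under this proposal
    if length is None:
        return None  # the client did not declare a length at all
    return int(length)  # an explicit b"0" stays distinguishable from None


missing = declared_length({"CONTENT_LENGTH": None})
empty = declared_length({"CONTENT_LENGTH": b"0"})
```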

FRAGMENT and PARAMETERS are REQUIRED and are parsed out of the URL in the same 
way as the QUERY_STRING. FRAGMENT is the text after a hash mark (a.k.a. 
“anchor” to browsers, e.g. /foo#bar). PARAMETERS come before QUERY_STRING, and 
after PATH_INFO separated by a semicolon, e.g. /foo;bar?baz.  Both values MUST 
be empty bytestrings if not present in the URL. (Rarely used — I’ve only seen 
it in Java and ColdFusion applications — but still useful.)
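
As an illustration only, the standard library's urlparse() already performs this split, so a server could populate the four values roughly as follows. The function name is hypothetical, percent-decoding of the path is ignored, and urlparse() only separates parameters from the last path segment.

```python
from urllib.parse import urlparse


def split_request_uri(uri):
    """Split a request URI into the environ values described above."""
    parts = urlparse(uri)
    return {
        "PATH_INFO": parts.path.encode("latin-1"),
        "PARAMETERS": parts.params.encode("latin-1"),
        "QUERY_STRING": parts.query.encode("latin-1"),
        "FRAGMENT": parts.fragment.encode("latin-1"),
    }


full = split_request_uri("/foo;bar?baz#frag")
bare = split_request_uri("/foo")
```

Absent components come back as empty bytestrings rather than missing keys, matching the MUST above.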

Points of contention:

Changing the namespace seems needless.  Using the wsgi.* namespace with a 
wsgi.version of (2, 0) will allow applications to easily armor themselves 
against incompatible use.  That’s what wsgi.version is for!  I’d add this as a 
strong “point of contention”.  m.s.http keeps the wsgi namespace and uses a 
version of (2, 0).
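
An application could armor itself with a check along these lines. This is a sketch, not part of any spec: the wrapper name is hypothetical and raising RuntimeError is just one possible reaction.

```python
def require_wsgi2(application):
    """Wrap an application so it refuses to run on a non-2.x server."""
    def guarded(environ):
        version = environ.get("wsgi.version")
        if version is None or version[0] != 2:
            raise RuntimeError("requires wsgi.version (2, x), got %r" % (version,))
        return application(environ)
    return guarded


@require_wsgi2
def hello(environ):
    return b"200 OK", [(b"Content-Type", b"text/plain")], [b"Hello"]


response = hello({"wsgi.version": (2, 0)})
```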

That’s it so far.  I may occasionally write in with additional ideas as I 
continue with my HTTP server 

Re: [Web-SIG] PEP 444

2010-11-21 Thread Alice Bevan-McGregor
 PEP 444 has no champion currently.  Both Armin and I have basically left it 
 behind.  It would be great if you wanted to be its champion.

Done.

As I already have a functional, performant HTTP server[1] and example filter[2] 
(compression) utilizing a slightly modified version of PEP 444, and hope to be 
giving a presentation on its design and related utilities[3] early next year, 
I’d love to have the opportunity to directly shape its future.  My server may 
be a bit large to be a reference implementation, but until it has its first 
user I have the benefit of being able to experiment whole-heartedly with 
features and proposals.

Since Python 3 was released I haven’t heard of much forward-progress in getting 
web frameworks compatible.  The largest complaint I’ve heard is that there are 
too few things already ported, which is a chicken-and-egg problem.  This is 
one scenario where re-inventing the wheel may be the only way to see forward 
movement.  So far, I seem to be buckling down and Getting Things Done™ in this 
regard.

How would I go about getting access to the PEP in order to fix the issues I’ve 
been catching up on?  (I’ve been reading through quite a bit of old mailing 
list traffic these last few hours in-between writing docs and unit tests for 
the compression egress filter.)

Now I’m even more excited.  I’ll make a separate post to confirm and get some 
input on the issues I’ve encountered thus far.

— Alice.

[1] https://github.com/pulp/marrow.server.http
[2] https://github.com/pulp/marrow.wsgi.egress.compression — full documentation 
included
[3] http://web-core.org/marrow/confoo/ — input welcome; the deadline for 
modification is the 26th