Re: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

2016-03-24 Thread Alan Kennedy
I don't see this relevant message in your references.

https://mail.python.org/pipermail/web-sig/2004-September/000749.html

Perhaps that, and following messages, might shed more light?

On Thu, Mar 24, 2016 at 3:18 PM, Jason Madden 
wrote:

> Hi all,
>
>
> Is there any practical reason that the type of the `environ` object must
> be exactly `dict`, as specified in PEP?
>
> I'm asking because it was recently pointed out that gevent's WSGI server
> can sometimes print `environ` (on certain error cases), but that can lead
> to sensitive information being kept in the server's logs (e.g.,
> HTTP_AUTHORIZATION, HTTP_COOKIE, maybe other things). The simplest and most
> flexible way to prevent this from happening, not just inadvertently within
> gevent itself but also for client applications, I thought, was to have
> `environ` be a subclass of `dict` with a customized `__repr__` (much like
> WebOb does for MultiDict, and repoze.who does for Identity, both for
> similar reasons).
>
> Unfortunately, when I implemented that in [0], I discovered that
> `wsgiref.validator` asserts that type(environ) is dict. I looked up the
> PEP, and sure enough, PEP  states that environ "must be a builtin
> Python dictionary (not a subclass, UserDict or other dictionary
> emulation)." [1]
>
> Background/History
> ==================
>
> That seemed overly restrictive to me, so I tried to backtrack the history
> of that language in hopes of discovering the rationale.
>
> - It was present in the predecessor of PEP , PEP 0333, in the first
> version committed to the repository in August 2004. [2]
> - Prior to that, it was in both drafts of what would become PEP 0333
> posted to this mailing list, again from August 2004: [3], [4].
> - The ancestor of those drafts, the "Python Web Container Interface v1.0"
> was posted in December of 2003 with somewhat less restrictive language:
> "the environ object *must* be a Python dictionary ... The rationale for
> requiring a dictionary is to maximize portability
> between containers" [5].
>
> Now, the discussion on that earliest draft in [5] specifically brought up
> using other types that implement all the methods of a dictionary, like
> UserDict.DictMixin [6]. The last post on the subject in that thread seemed
> to be leaning towards accepting non-dict objects, at least if they were
> good enough [7].
>
> By the time the draft became recognizable as the precursor to PEP 0333 in
> [3], the very strict language we have now was in place. That draft,
> however, specifically stated that it was intended to be compatible with
> Python 1.5.2. In Python 1.5.2, it wasn't possible to subclass the builtin
> dict, so imitations, like UserDict.DictMixin, were necessarily imprecise.
> This was later changed to the much-maligned Python 2.2.2 release [8];
> Python 2.2 added the ability to subclass dict, but the language wasn't
> changed.
>
> Today
> =====
>
> Given that today, we can subclass dict with full fidelity, is there still
> any practical reason not to be able to do so? I'm probably OK with gevent
> violating the letter of the spec in this regard, so long as there are no
> practical consequences. I was able to think of two possible objections, but
> both can be solved:
>
> - Pickling the custom `environ` type and then loading it in another
> process might not work if the class is not available. I can imagine this
> coming up with Celery, for example. This is easily fixed by adding an
> appropriate `__reduce_ex__` implementation.
>
> - Code somewhere relies on `if type(some_object) is dict:` (where
> `environ` became `some_object`, presumably through several levels of
> calls), instead of `isinstance(some_object, dict)` or
> `isinstance(some_object, collections.MutableMapping)`. The solution here is
> simply to not do that :) Pylint, among other linters, produces warnings if
> you do.
>
> Can anyone think of any other practical reasons I've overlooked? Is this
> just a horrible idea for other reasons?
>
> I appreciate any discussion!
>
> Thanks,
> Jason
>
> [0] https://github.com/gevent/gevent/compare/secure-environ
> [1] https://www.python.org/dev/peps/pep-/#specification-details
> [2]
> https://github.com/python/peps/commit/d5864f018f58a35fa787492e6763e382f98b923c#diff-ff370d50af3db062b015d1ef85935779
> [3] https://mail.python.org/pipermail/web-sig/2004-August/000518.html
> [4] https://mail.python.org/pipermail/web-sig/2004-August/000562.html
> [5] https://mail.python.org/pipermail/web-sig/2003-December/000394.html
> [7] https://mail.python.org/pipermail/web-sig/2003-December/000401.html
> [8] https://mail.python.org/pipermail/web-sig/2004-August/000565.html
>
> ___
> Web-SIG mailing list
> Web-SIG@python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe:
> https://mail.python.org/mailman/options/web-sig/alan%40xhaus.com
>
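
A minimal sketch of the approach Jason describes in the quoted message,
assuming Python 3. The class name `SecureEnviron`, the `secure_keys` tuple
and the `<MASKED>` placeholder are illustrative choices and are not claimed
to match gevent's implementation. It shows a `dict` subclass whose
`__repr__` hides sensitive values, and a `__reduce_ex__` that pickles the
object as a plain `dict` so the receiving process does not need the class.

```python
import pickle


class SecureEnviron(dict):
    """dict subclass whose repr hides the values of sensitive keys."""

    # Keys whose values should not reach a log file (illustrative default).
    secure_keys = ("HTTP_AUTHORIZATION", "HTTP_COOKIE")

    def __repr__(self):
        masked = {
            key: ("<MASKED>" if key in self.secure_keys else value)
            for key, value in self.items()
        }
        return repr(masked)

    def __reduce_ex__(self, protocol):
        # Pickle as a plain dict so unpickling works without this class.
        return (dict, (dict(self),))


environ = SecureEnviron(HTTP_COOKIE="session=secret", PATH_INFO="/")

# The strict check used by validators rejects the subclass ...
strict_ok = type(environ) is dict
# ... while the isinstance check accepts it.
lenient_ok = isinstance(environ, dict)

restored = pickle.loads(pickle.dumps(environ))
```

Item access is unchanged (`environ["HTTP_COOKIE"]` still returns the real
value); only the printed representation is masked.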

Re: [Web-SIG] WSGI 2.0 Round 2: requirements and call for interest

2016-01-06 Thread Alan Kennedy
[Cory Benfield]
> Folks, just a reminder: RFC 2616 is dead. RFC 7230 says that *newly
> defined* header fields should limit their field values to US-ASCII, but
> older header fields are a crapshoot (though it notes that “in practice,
> most” header field values use US-ASCII).
>
> Regardless, it seems to me that the correct method of communicating field
> values would have been byte strings.

I think it's worth pointing out that the original intention of specifying
iso-8859-1 encoding was the request components *would* be presented to the
application as bytes.

WSGI was designed to work on Python 2, where bytes and strings were stored
in the same datatype. In CPython's UCS-2 encoding, where every character
takes two bytes, only the lower byte would contain a value if the character
was from the iso-8859-1 character set. Moreover, encoding and decoding such
"byte strings" from iso-8859-1 would not change any values, i.e. iso-8859-1
was chosen because encoding and decoding from it was an identity transform.

The same considerations applied to Jython 2.x (which uses UTF-16) and
IronPython 2.x (also UTF-16 I think), both of which had the same
bytes/strings duality problem.

If python 2.x had had a bytes type, then that's what would have been used.

This would also have made more explicit that it is the application's job to
decode the bytes into whatever encoding it thinks is appropriate (i.e.
essentially what it has guessed, in the real world). The WSGI server's job
is to give the original bytes from the request to the WSGI application
*unchanged*.
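
The identity-transform property can be checked directly. This is a small
sketch in Python 3 terms (the original discussion concerned Python 2); the
helper name `redecode` and the sample path are made up for illustration.

```python
# Every possible byte value, as it might arrive on the wire.
raw = bytes(range(256))

# Decoding as iso-8859-1 maps byte N to code point N ...
text = raw.decode("iso-8859-1")
assert [ord(ch) for ch in text] == list(raw)

# ... so encoding again returns the original bytes unchanged.
assert text.encode("iso-8859-1") == raw


def redecode(value, encoding="utf-8"):
    """Recover the wire bytes from an iso-8859-1 'byte string' and decode
    them with the encoding the application believes is correct."""
    return value.encode("iso-8859-1").decode(encoding)


# A UTF-8 path as a server would hand it over after iso-8859-1 decoding.
wire_bytes = "/café".encode("utf-8")
environ_value = wire_bytes.decode("iso-8859-1")
decoded_path = redecode(environ_value)
```

Because no information is lost in the first step, the choice of the real
encoding stays entirely with the application.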

The concluding message in the original discussion of encodings is here, if
anyone is interested.

https://mail.python.org/pipermail/web-sig/2004-September/000860.html

Alan.


Re: [Web-SIG] REMOTE_ADDR and proxys

2014-09-29 Thread Alan Kennedy
[Alan]
>> I disagree. I think it is the role of the server/gateway to represent the
>> actual incoming HTTP request as accurately as possible.

[Robert]
> So I agree with you

OK, so we agree :-)

[Robert]
> but in a multi-tier deployment architecture:

Then why disagree? ;-)

[Robert]
> Client -> LB -> Front-end-cache -> HTTPd ->WSGI -> application, which
> 'request' do app developers need represented? They want the client
> request, which is 3 network hops away: it's entirely reasonable (and
> supported by RFC2616 and RFC7230 etc) for the internal structure of
> such a deployment to extend things in such a way that normal
> guarantees are suspended (e.g. caching, source addresses etc).

So what do you include and what do you exclude?

1. It's quite possible that the client is behind some kind of egress proxy
or firewall, which may or may not add an X-Forwarded-For header. Should this
be included?

2. What if your frontend LB is not configured to set an X-Forwarded-For
header? What if it is? What if there is differing configuration across
multiple LBs that are in your ingress path, and you get conflicting results
depending on what path the request came in?

3. What if there is a cache miss on your frontend cache? Will the caching
proxy add a header?

4. What if the proxy added a non-standard X-Forwarded-Ip header?
 - If it does, can you do reverse DNS lookup to find the host that it
reverses to?
 - If yes, in what DNS authority?

5. Is the order of X-Forwarded-For headers guaranteed? Is it
trustworthy? Will every proxy in the chain declare itself?
 - Answers: no, no, and no.

Each of the above questions has multiple answers, each of which is arguably
valid, depending on your point of view.

The problem is that HTTP proxies are just too easy to write, and every
author of a proxy will make slightly different decisions on what should be
forwarded and what should not. Every configurable proxy can and will be
configured differently, according to the requirements of the folks
operating it.

http://proxies.xhaus.com

[Robert]
> which 'request' do app developers need represented?

The request that arrives into the origin server, exactly as it arrived,
unmodified. That way they can apply their own heuristics to processing the
request, knowing that it has not been interfered with.

> They want the client request, which is 3 network hops away

In your example, it's 3 hops away. I can easily paint you a thousand
different scenarios, each of which is a different number of hops away.

[Robert]
> So it sounds like it should be the responsibility of a middleware to
> renormalize the environment?

In order for that to be the case, you have to strictly define what
"normalization" means.

I believe that it is not possible to fully specify "normalization", and
that any attempt to do so is futile.

If you want to attempt it for the specific scenarios that your particular
application has to deal with, then by all means code your version of
"normalization" into your application. Or write some middleware to do it.

But trying to make "normalization" a part of a WSGI-style specification is
impossible.
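
As an illustration of one such application-specific heuristic (and only
one of many defensible policies), the sketch below walks the hop chain
from the nearest peer outwards and stops at the first address that is not
one of the deployment's own proxies. The function name, its parameters and
the sample networks are assumptions for the example, not part of any
specification.

```python
import ipaddress


def client_address(remote_addr, forwarded_for, trusted_networks):
    """Return the first hop, nearest first, that is not a trusted proxy."""
    trusted = [ipaddress.ip_network(net) for net in trusted_networks]

    def is_trusted(addr):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            # Not a parseable address (e.g. forged junk): never trusted.
            return False
        return any(ip in net for net in trusted)

    # X-Forwarded-For lists hops left to right; the last entry was added
    # by the proxy closest to us.
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]

    candidate = remote_addr
    while is_trusted(candidate) and hops:
        candidate = hops.pop()
    return candidate


OUR_PROXIES = ["127.0.0.0/8", "10.0.0.0/8"]
behind_proxies = client_address(
    "127.0.0.1", "203.0.113.7, 10.0.0.5", OUR_PROXIES)
direct_spoofed = client_address("198.51.100.9", "1.2.3.4", OUR_PROXIES)
```

A different deployment would need a different trusted list, and possibly a
different header altogether, which is the reason the policy belongs in the
application or its middleware rather than in the specification.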

Alan.


On Mon, Sep 29, 2014 at 10:14 PM, Collin Anderson 
wrote:

> Thanks guys. So it sounds like it should be the responsibility of a
> middleware to renormalize the environment?
>
> On Wed, Sep 24, 2014 at 4:51 PM, Robert Collins wrote:
>
>> On 25 September 2014 07:16, Alan Kennedy  wrote:
>> > [Collin]
>> >> It seems to me, it is the role of the server/gateway, not the
>> >> application/framework to determine the "correct" client ip address and
>> >> correctly account for the situation of being behind a known proxy.
>> >
>> > I disagree. I think it is the role of the server/gateway to represent
>> > the actual incoming HTTP request as accurately as possible.
>>
>> So I agree with you, but in a multi-tier deployment architecture:
>>
>> Client -> LB -> Front-end-cache -> HTTPd ->WSGI -> application, which
>> 'request' do app developers need represented? They want the client
>> request, which is 3 network hops away: it's entirely reasonable (and
>> supported by RFC2616 and RFC7230 etc) for the internal structure of
>> such a deployment to extend things in such a way that normal
>> guarantees are suspended (e.g. caching, source addresses etc).
>>
>> > If the application knows about remote proxies and local reverse proxies,
>> > then it can take action accordingly.
>> >
>> > But the server should not attempt any magic: it is up to the
>> > application to interpret the request in whatever way it sees fit.
>> ...
>> > If want to 

Re: [Web-SIG] REMOTE_ADDR and proxys

2014-09-24 Thread Alan Kennedy
[Collin]
> It seems to me, it is the role of the server/gateway, not the
> application/framework to determine the "correct" client ip address and
> correctly account for the situation of being behind a known proxy.

I disagree. I think it is the role of the server/gateway to represent the
actual incoming HTTP request as accurately as possible.

If the application knows about remote proxies and local reverse proxies,
then it can take action accordingly.

But the server should not attempt any magic: it is up to the application to
interpret the request in whatever way it sees fit.

[Collin]
> Also, I am aware of the security issues of improperly handling
> X-Forwarded-For, but that's an issue no matter where it's being
> handled.

This is exactly why the server/gateway should refuse the temptation to
guess. It should leave it to the application to be smart enough to handle
all scenarios appropriately, knowing that it has access to the original
unmodified request.

If you want the magic rewriting functionality to be isolated from the
application, then it could easily be implemented as middleware.
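
A minimal sketch of such middleware, under the assumption that the only
trusted peer is a local reverse proxy and that it appends the real client
address as the last X-Forwarded-For entry. The class name, the
`trusted_peers` parameter and the `example.original_remote_addr` key are
hypothetical names chosen for this example.

```python
class ForwardedForMiddleware:
    """Rewrite REMOTE_ADDR only when the immediate peer is trusted."""

    def __init__(self, app, trusted_peers=("127.0.0.1",)):
        self.app = app
        self.trusted_peers = frozenset(trusted_peers)

    def __call__(self, environ, start_response):
        peer = environ.get("REMOTE_ADDR", "")
        forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
        if peer in self.trusted_peers and forwarded:
            # Keep what the server reported, so nothing is lost.
            environ["example.original_remote_addr"] = peer
            # Use the entry added by the nearest (trusted) proxy.
            environ["REMOTE_ADDR"] = forwarded.split(",")[-1].strip()
        return self.app(environ, start_response)


def show_remote_addr(environ, start_response):
    """Tiny application that echoes the address it was given."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["REMOTE_ADDR"].encode("ascii")]


wrapped = ForwardedForMiddleware(show_remote_addr)
```

The server still passes the request through untouched; the rewriting is an
explicit, configurable layer that a deployment can add or leave out.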

Alan.


On Wed, Sep 10, 2014 at 7:41 PM, Collin Anderson 
wrote:

> Hi All,
>
> The CGI spec says:
>
> Script authors should be aware that the REMOTE_ADDR and REMOTE_HOST
> meta-variables (see sections 4.1.8 and 4.1.9) may not identify the
> ultimate source of the request.  They identify the client for the
> immediate request to the server; that client may be a proxy, gateway,
> or other intermediary acting on behalf of the actual source client.
>
> However, if there is a reverse proxy on the server side (such as
> nginx), it seems to me, the ip address of the "immediate request to
> the server" will be "127.0.0.1" and the actual address will be in an
> "X-Forwarded-For" header.
>
> It seems to me, it is the role of the server/gateway, not the
> application/framework to determine the "correct" client ip address and
> correctly account for the situation of being behind a known proxy.
>
> Also, I am aware of the security issues of improperly handling
> X-Forwarded-For, but that's an issue no matter where it's being
> handled.
>
> So, in the case of a reverse proxy, is it ok if the WSGI server sends
> back a REMOTE_ADDR that isn't 127.0.0.1, even if the immediate
> connection to the WSGI server is local?
>
> Basically can we interpret the "server" above to be the machine rather
> than the program?
>
> Thanks,
> Collin
>


Re: [Web-SIG] Fwd: Can writing to stderr be evil for web apps?

2012-05-19 Thread Alan Kennedy
[anatoly]
> Martin expressed concerns that using logging module with stderr output
> can break web applications, such as PyPI.

Please can you specify exactly what you mean by "using logging module
with stderr output"?

Dealing with stderr is a webserver specific concern.

Consider the case where you're the author of a webserver that deals
with CGI scripts.

When you get a request for the CGI script, you start a subprocess to
run the script. You must decide what to do with the stdin, stdout and
stderr of the process.

 - CGI mandates that any content that came with the request (e.g. a
POST body) should be fed into stdin (if no other mechanism is in
place[0])
 - CGI mandates that the stdout of the process is sent back to the
client (if no other mechanism is in place[1]).
 - CGI makes no mention of stderr.
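
The three streams can be sketched from the server author's side as
follows. This is a toy illustration, not a CGI implementation: it sets no
CGI meta-variables, and the inline child script and the `run_cgi` helper
are made up for the example.

```python
import subprocess
import sys

# Stand-in for a CGI script: reads the request body from stdin, writes a
# response to stdout and a diagnostic to stderr.
CGI_SCRIPT = r'''
import sys
body = sys.stdin.buffer.read()
sys.stderr.write("read %d bytes of request body\n" % len(body))
sys.stdout.buffer.write(b"Content-Type: text/plain\r\n\r\n" + body)
'''


def run_cgi(request_body):
    """Run the script, returning (stdout, stderr) as separate byte strings."""
    completed = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        input=request_body,       # request content goes to the child's stdin
        stdout=subprocess.PIPE,   # response destined for the client
        stderr=subprocess.PIPE,   # captured separately, e.g. for a log file
        check=True,
    )
    return completed.stdout, completed.stderr


response, diagnostics = run_cgi(b"hello")
```

Keeping the two pipes separate is what prevents diagnostic output from
leaking into the response sent to the client.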

Various webservers permit configurable handling of stderr.

For example, Tomcat has a setting called "swallowOutput" which
redirects both stdout and stderr to a log file. (Obviously, Tomcat's
treatment of stdout is different for CGI)

http://tomcat.apache.org/tomcat-6.0-doc/config/context.html

WSGI has a specific mechanism for diagnostic output, wsgi.errors.

"""
wsgi.errors 

An output stream (file-like object) to which error output can be
written, for the purpose of recording program or other errors in a
standardized and possibly centralized location. This should be a "text
mode" stream; i.e., applications should use "\n" as a line ending, and
assume that it will be converted to the correct line ending by the
server/gateway.

...

For many servers, wsgi.errors will be the server's main error log.
Alternatively, this may be sys.stderr, or a log file of some sort. The
server's documentation should include an explanation of how to
configure this or where to find the recorded output. A server or
gateway may supply different error streams to different applications,
if this is desired.
"""

Lastly, note that WSGI supplies an example CGI gateway, and has this to
say about its error handling:

"""
Note that this simple example has limited error handling, because by
default an uncaught exception will be dumped to sys.stderr and logged
by the web server.
"""

http://www.python.org/dev/peps/pep-/#the-server-gateway-side

So I would say that

1. If you are writing a web application, and want it to run under any
WSGI container, and for the user to be able to control that output in
a way with which they are familiar (i.e. which is documented and may
have specific configuration options), send the output to wsgi.errors.

2. If you are writing a web server, you should either capture or
ignore stderr. If it is captured, then it is reasonable to, e.g.,
write it to a file so that the user can find it. It should never be
mixed with stdout if stdout is the mechanism by which the application
communicates with the webserver, as with CGI.
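
For the first point, a minimal sketch of an application that sends its
diagnostics to `wsgi.errors`, either directly or through the standard
`logging` module. The logger name `example.app` is a hypothetical choice,
and attaching a handler per request is done here only to keep the example
short.

```python
import io
import logging


def app(environ, start_response):
    errors = environ["wsgi.errors"]

    # Option 1: write to the server-supplied stream directly.
    errors.write("direct diagnostic message\n")

    # Option 2: point a logging handler at the same stream for the
    # duration of this request.
    logger = logging.getLogger("example.app")
    handler = logging.StreamHandler(errors)
    logger.addHandler(handler)
    try:
        logger.warning("diagnostic message via logging")
    finally:
        logger.removeHandler(handler)

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


# Stand-in for a server: supply an in-memory wsgi.errors and call the app.
captured = io.StringIO()
body = app({"wsgi.errors": captured}, lambda status, headers: None)
```

Neither option touches `sys.stderr`, so where the output ends up is
decided by the server's configuration rather than by the application.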

Alan.

[0] http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt
"""
Section 6.2 Request Message-Bodies

   As there may be a data entity attached to the request, there
   MUST be a system defined method for the script to read these
   data. Unless defined otherwise, this will be via the 'standard
   input' file descriptor.
"""

[1] http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt
"""
Section 7. Data Output from the CGI Script

   There MUST be a system defined method for the script to send
   data back to the server or client; a script MUST always return
   some data. Unless defined otherwise, this will be via the
   'standard output' file descriptor
"""


Re: [Web-SIG] WSGI for Python 3

2010-07-17 Thread Alan Kennedy
[PJ Eby]
> IOW, the bytes/string discussion on Python-dev has kind of led me to realize
> that we might just as well make the *entire* stack bytes (incoming and
> outgoing headers *and* streams), and rewrite that bit in PEP 333 about using
> str on "Python 3000" to say we go with bytes on Python 3+ for everything
> that's a str in today's WSGI.
>
> Or, to put it another way, if I knew then what I know *now*, I think I'd
> have written the PEP the other way around, such that the use of 'str' in
> WSGI would be a substitute for the future 'bytes' type, rather than viewing
> some byte strings as a forward-compatible substitute for Py3K unicode
> strings.
>
> Of course, this would be a WSGI 2 change, but IMO we're better off making a
> clean break with backward compatibility here anyway, rather than having
> conditionals.  Also, going with bytes everywhere means we don't have to
> rename SCRIPT_NAME and PATH_INFO, which in turn avoids deeper rewrites being
> required in today's apps.

+1

> (Hm.  Although actually, I suppose we *could* just borrow the time machine
> and pretend that WSGI called for "byte-strings everywhere" all along...)

+1/0

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Armin]
> Of course a server configuration variable would be a solution for many
> of these problems, but I don't like the idea of changing application
> behavior based on server configuration.

So you don't like the way that Django, Werkzeug, WebOb, etc, do it
now, even though they appear to be mostly successful, and you're happy
to cite them as such?

From the application's point of view, a framework-level configuration
variable is the same as a server-level configuration variable.

> At that point we will finally
> have successfully killed the idea of nested WSGI applications, because
> those could depend on different charsets.

Wouldn't well-written applications depend on unicode?

The server configured charset is simply an explicit statement of the
character set from which incoming requests are to be decoded, into
unicode, and no other character set.

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Armin]
> No, they know the character sets.

Hmmm, define "know" ;-)

[Armin]
> You tell them what character set you
> want to use.  For example you can specify "utf-8", and they will
> decode/encode from/to utf-8.  But there is no way for the application to
> send information to the server before they are invoked to tell the
> server what encoding they want to use.

I see this as being the same as Graham's suggested approach of a
per-server configurable charset, which is then stored in the WSGI
dictionary, so that applications that have problems, i.e. that detect
mojibake in the unicode SCRIPT_NAME or PATH_INFO, can attempt to undo
the faulty decoding by the server.
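
A sketch of what "undoing the faulty decoding" could look like, assuming
the server records the charset it used under an environ key. The key name
`example.uri_encoding` and the function `repair` are hypothetical, chosen
only for this illustration. The approach works only when the server's
charset is lossless for arbitrary bytes, as iso-8859-1 is.

```python
def repair(environ, key, wanted="utf-8"):
    """Re-decode environ[key] with `wanted`, using the charset the server
    says it applied. Falls back to the server's value on failure."""
    value = environ[key]
    declared = environ.get("example.uri_encoding")
    if declared is None:
        return value
    try:
        # Reverse the server's decoding to get the wire bytes back,
        # then decode them the way the application wants.
        return value.encode(declared).decode(wanted)
    except UnicodeError:
        return value


# The server decoded UTF-8 wire bytes as iso-8859-1, producing mojibake.
mojibake_environ = {
    "example.uri_encoding": "iso-8859-1",
    "PATH_INFO": "/café".encode("utf-8").decode("iso-8859-1"),
}
repaired = repair(mojibake_environ, "PATH_INFO")

# Here the wire bytes really were iso-8859-1, so the value is left alone.
latin1_environ = {
    "example.uri_encoding": "iso-8859-1",
    "PATH_INFO": "/caf\xe9",
}
untouched = repair(latin1_environ, "PATH_INFO")
```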

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Armin]
> Because that problem was solved a long ago in applications themselves.
> Webob, Werkzeug, Paste, Pylons, Django, you name it, all are operating
> on unicode.  And the way they do that is straightforward.

So what are we all discussing?

Those frameworks obviously have solved all of the problems of decoding
incoming request components, e.g.

1. SCRIPT_NAME
2. PATH_INFO
3. QUERY_STRING
4. Etc

from miscellaneous unknown character sets into unicode, without any
mistakes, under all possible WSGI environments, e.g.

1. Mod_wsgi
2. Modjy (java servlets)
3. IIS
4. CGI
5. FCGI
6. Etc

So why not just adopt one of those mechanisms, e.g. Django, and make
it the de-facto standard? Since they all deliver unicode, python 3 is
no longer a problem, since it permits only unicode strings.

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Alan]
>> Is there a real need out there?

[Armin]
> In python 3, yes.  Because the stdlib no longer works with bytes and the
> bytes object has few string semantics left.

Why can't we just do the same as the java servlet spec? I.E.

1. Ignore the encoding issues being discussed
2. Give the programmer (possibly mojibake) unicode strings in the WSGI
environ anyway
3. And let them solve their problems themselves, using server
configuration or bespoke middleware

[Alan]
>> Java programmers just tolerate this, although they may curse the
>> developers of the servlet spec for not having solved their specific
>> problem for them.

[Armin]
> Many Java apps are also still using latin1 only or have all kinds of
> problems with charsets.

My point exactly.

Many web developers simply never have to deal with these issues,
perhaps a majority.

The ones that do have to sort it out for themselves.

To do so, the publishers of the various containers give them
(non-standard) options to control the decoding of the incoming request
and all of its component parts: you cited the Tomcat approach above.
Other containers do it differently. Which means that i18n knowledge is
not portable between containers.

It would be nice if we could avoid such a situation with i18n and WSGI.

But I suppose I'm a little dubious that this group can out-do the
enormous java community, and the enormous financial resources that
Sun, IBM, Oracle, etc, etc, plough into it. And still failed to solve
this complex problem satisfactorily.

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[P.J. Eby]
>> Actually, latin-1 bytes encoding is the *simplest* thing that could
>> possibly work, since it works already in e.g. Jython, and is actually
>> in the spec already...  and any framework that wants unicode URIs
>> already has to decode them, so the code is already written.

[Armin]
> Except that nobody implements that

So, if nobody implements that, then why are we trying to standardise it?

Is there a real need out there?

Or are all these discussions solely driven by the need/desire to have
only unicode strings in the WSGI dictionary under python 3?

Which is a worthy goal, IMHO. Java has been there since the very
start, since java strings have always been unicode. Take a look at the
java docs for HttpServlet: no methods return bytes/bytearrays.

http://java.sun.com/products/servlet/2.5/docs/servlet-2_5-mr2/javax/servlet/http/HttpServletRequest.html

But the java servlet spec still ignores *all* of the encoding concerns
being discussed here. Which means that mistakes/mojibake must happen
all the time. And it's up to the author of the individual java web
application to solve those problems, using a mechanism appropriate for
their needs and local environment.

Java programmers just tolerate this, although they may curse the
developers of the servlet spec for not having solved their specific
problem for them.

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Ian]
> When things get messed up I recommend people use a middleware
> (paste.deploy.config.PrefixMiddleware, though I don't really care what they
> use) to fix up the request to be correct.  Pulling it from REQUEST_URI would
> be fine.

That would be unworkable under java servlet containers, since they
each take a different approach to addressing encoding issues, or fail
to deal with them entirely.

So there would probably have to be a special case for every single one of these

http://en.wikipedia.org/wiki/List_of_Servlet_containers

Each of which has a number of different ways of being configured in
relation to these issues.

I don't know if it would even be possible to write such a middleware.

And retain all of one's hair.

Alan.


Re: [Web-SIG] Request for Comments on upcoming WSGI Changes

2009-09-22 Thread Alan Kennedy
[Ian]
>> OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO and
>> introduce two equivalent variables that hold the NOT url-decoded values.
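
One commonly cited reason for wanting the values that are not url-decoded
is that decoding discards information. The short sketch below, with
made-up paths, shows two distinct request paths becoming indistinguishable
once decoded, while splitting before decoding keeps the segments apart.

```python
from urllib.parse import unquote

encoded_slash = "/files/a%2Fb"   # one segment named "a/b"
literal_slash = "/files/a/b"     # two segments, "a" and "b"

# After decoding, the application can no longer tell them apart.
same_after_decoding = unquote(encoded_slash) == unquote(literal_slash)

# Splitting the raw value first preserves the difference.
segments_encoded = [unquote(part) for part in encoded_slash.split("/")]
segments_literal = [unquote(part) for part in literal_slash.split("/")]
```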

[Graham]
> That may be fine for pure Python web servers where you control the
> split of REQUEST_URI into SCRIPT_NAME and PATH_INFO in the first place
> but don't have that luxury in Apache or via FASTCGI/SCGI/CGI etc as
> that is done by the web server. Also, as pointed out in my blog,
> because of rewrites in web server, it may be difficult to try and map
> SCRIPT_NAME and PATH_INFO back into REQUEST_URI provided to try and
> reclaim original characters. There is also the problem that often
> FASTCGI totally stuffs up SCRIPT_NAME/PATH_INFO split anyway and
> manual overrides needed to tweak them.

This applies doubly under Java servlets, where different containers
take different approaches to solve these rather hard problems. It is
worth noting that they have to do so because the java servlet spec,
even under the most recent 2.5, punts on *all* of the issues being
discussed here.

See here for how Tomcat does it. Or half does it, messily.

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

I know this is not helpful ;-)

Alan.


Re: [Web-SIG] WSGI 1 Changes [ianb's and my changes]

2009-09-18 Thread Alan Kennedy
[Rene]
>> I think you mean pre-2.2 support, not python 2.2?  iterators came
>> about in python 2.2.

[Armin]
> That might be.  That was before my time.  I'm pretty sure the first
> Python version I used was 2.3, but don't quote me on that.

As WSGI was being developed, cpython was at version 2.3.

The only reason that support for "older versions" was in the spec was
because jython was at version 2.1 at the time.

The WSGI spec was made much simpler by the use of the iterator
protocol (PEP 234), which was introduced into the language in 2.2.
So where the spec says

"Supporting Older (<2.2) Versions of Python"

It should probably have read

"Supporting Older (pre-pep-234-iterator-protocol) Versions of Python"

I don't know of any modern python implementation that doesn't support
the iterator protocol.

It's probably time to drop that section from the PEP.
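
For readers who have not met the distinction, here is a small sketch of
the two ways a response body can be made iterable. The class and function
names are invented for the example. The first relies on the older
`__getitem__` sequence protocol, which `iter()` still falls back to; the
second uses a generator, which implements the PEP 234 iterator protocol.

```python
class OldStyleBody:
    """Iterable only through __getitem__ (the pre-PEP 234 convention)."""

    def __init__(self, chunks):
        self.chunks = chunks

    def __getitem__(self, index):
        # Raising IndexError past the end is what terminates iteration.
        return self.chunks[index]


def new_style_body(chunks):
    """Generator: iterable through the PEP 234 iterator protocol."""
    for chunk in chunks:
        yield chunk


def old_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return OldStyleBody([b"hello ", b"world\n"])


def new_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return new_style_body([b"hello ", b"world\n"])


def collect(app):
    """Stand-in for a server consuming the response iterable."""
    return b"".join(app({}, lambda status, headers: None))
```

Both applications yield the same bytes to a server that simply loops over
the returned object.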

Alan.


Re: [Web-SIG] Announcing bobo

2009-06-16 Thread Alan Kennedy
[Etienne]
> If you want to start a thread for Bobo, please switch mailing-list or
> create a new thread, as all I wanted was to tell Jim my disappointment
> regarding Bobo, and I still think it's not very revolutionary.

I completely disagree; this is definitely the appropriate list for
discussing web frameworks and new approaches. There is no perfect
framework in python, or any other language. It is only with the
introduction, discussion, acceptance and assimilation of new ideas
that we all move forward together.

Jim has the longest history of all in Python web frameworks; he
created the very concept. He founded and built the entire Zope
community; I will always listen to what he has to say.

I wish you the best of luck with your own web framework, notmm

http://gthc.org/projects/notmm/0.2.12/

Which seems to have some potential, but currently lacks community support.

http://gthc.org/community/

I'm looking forward to Europython, where I know I'll be meeting some
great python folks, and hopefully some of us will get to continue our
WSGI revision discussions.

All the best,

Alan.


Re: [Web-SIG] RESTful Python email list?

2009-04-11 Thread Alan Kennedy
[Pete]
> Any interest in a dedicated email list for REST + python, a la the
> restful-json group [0]?  The group would discuss strategies for REST
> architecture built with and within Python.  WSGI 1.0 vs. 2.0 vs. 2e6 is out
> of scope. ;-)

Just a thought: is there any reason why RESTful python discussions
cannot take place on the restful-json group referred to?

Alan.


Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words

2009-04-08 Thread Alan Kennedy
[Brian]
> Here is the change that removes the use of RFC 2047 from HTTP in HTTPbis.

Grand so; all we need to do is to wait for everyone to stop using
HTTP/1.1, start using HTTP/bis, and our problems are at an end!

;-)

Alan.


Re: [Web-SIG] FW: Closing #63: RFC2047 encoded words

2009-04-08 Thread Alan Kennedy
[James]
> If you want to start a discussion about having a standard parsed-header
> object in WSGI, that's another thing, but saying that WSGI servers should
> *partially* decode the headers seems rather silly to me.

Hi James,

It's a shame that your proposal to add the twisted header parsing
library to the standard library didn't catch on years ago.

http://mail.python.org/pipermail/web-sig/2006-February/002119.html

Alan.


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-02 Thread Alan Kennedy
[Sylvain]
> Would there be any interest in asking the HTTP-BIS working group [1] what
> they think about it?
>
> Currently I couldn't find anything in their drafts suggesting they had
> decided to clarify this issue from a protocol's perspective but they might
> consider it to be relevant to their goals.
>
> - Sylvain
>
> [1] http://www.ietf.org/html.charters/httpbis-charter.html

As mentioned in an earlier post, I think their current spec avoids the
issue, by still relying on "octet-by-octet" comparison.

But I did come across this discussion on their list, which goes into
all of the issues in fine detail.

http://www.nabble.com/PROPOSAL%3A-i74%3A-Encoding-for-non-ASCII-headers-tt16274487.html#a16291951

Quote of the thread

[Roy Fielding]
> We are simply passing through the one and only defined i18n solution
> for HTTP/1.1 because it was the only solution available in 1994.
> If email clients can (and do) implement it, then so can WWW clients.
>
> People who want to fix that should start queueing for HTTP/1.2.

Alan.


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-02 Thread Alan Kennedy
[Sylvain]
> Would there be any interest in asking the HTTP-BIS working group [1] what
> they think about it?
>
> Currently I couldn't find anything in their drafts suggesting they had
> decided to clarify this issue from a protocol's perspective but they might
> consider it to be relevant to their goals.
>
> - Sylvain
>
> [1] http://www.ietf.org/html.charters/httpbis-charter.html

I checked the current version of their replacement for RFC 2616. It says

"""
2.1.3.  URI Comparison

   When comparing two URIs to decide if they match or not, a client
   SHOULD use a case-sensitive octet-by-octet comparison of the entire
   URIs
"""

Which doesn't work if the two URIs to be compared are in different encodings.
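
To make that concrete, here is a minimal sketch (Python 3, standard
library only; the path is a made-up example) showing that the same
characters percent-encode to different octets under UTF-8 and
ISO-8859-1, so an octet-by-octet comparison reports a mismatch.

```python
from urllib.parse import quote

path = "/caf\u00e9"  # hypothetical path containing one non-ASCII character

as_utf8 = quote(path, encoding="utf-8")
as_latin1 = quote(path, encoding="latin-1")

print(as_utf8)    # /caf%C3%A9
print(as_latin1)  # /caf%E9

# Same characters, different octets: a byte-wise comparison fails.
same_octets = as_utf8.encode("ascii") == as_latin1.encode("ascii")
print(same_octets)  # False
```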

I did find this page on the W3C site which at least explains the
issues, and does a survey of existing modern browsers for how they
encode URIs and IRIs.

http://www.w3.org/International/articles/idn-and-iri/

"""
Paths

The conversion process for parts of the IRI relating to the path is
already supported natively in the latest versions of IE7, Firefox,
Opera, Safari and Google Chrome.

It works in Internet Explorer 6 if the option in Tools>Internet
Options>Advanced>Always send URLs as UTF-8 is turned on. This means
that links in HTML, or addresses typed into the browser's address bar
will be correctly converted in those user agents. It doesn't work out
of the box for Firefox 2 (although you may obtain results if the IRI
and the resource name are in the same encoding), but technically-aware
users can turn on an option to support this (set
network.standard-url.encode-utf8 to true in about:config).

Whether or not the resource is found on the server, however, is a
different question. If the file system is in UTF-8, there should be no
problem. If not, and no mechanism is available to convert addresses
from UTF-8 to the appropriate encoding, the request will fail.

Files are normally exposed as UTF-8 by servers such as IIS and Apache
2 on Windows and Mac OS X. Unix and Linux users can store file names
in UTF-8, or use the mod_fileiri module mentioned earlier. Version 1
of the Apache server doesn't yet expose filenames as UTF-8.

You can run a basic check whether it works for your client and
resource using this simple test.

Note that, while the basics may work, there are other somewhat more
complicated aspects of IRI support, such as handling of bidirectional
text in Arabic or Hebrew, which may need some additional time for full
implementation.
"""

Alan.


Re: [Web-SIG] WSGI Open Space @ PyCon.

2009-04-01 Thread Alan Kennedy
[Noah]
> +1 on the iterator, although I might just like the idea and might be missing
> something important.  It seems like there are a lot of powerful things being
> developed with generators in mind, and there are some nifty things you can
> do with them like the contextlib example:
>  http://docs.python.org/library/contextlib.html#contextlib.closing

Indeed, like coroutines.

http://www.python.org/dev/peps/pep-0342/

[Robert]
>> The counter-argument was that
>> servers could use non-blocking sockets to allow apps which read() to
>> yield in the case of no immediate data rather than block indefinitely.

Ah, but the problem with that is that one can't magically suspend
methods like that and return control to the scheduler, without using
coroutines or stackless.

Who does the read() method return control to when there's no data
available (i.e. no bytes on the socket)? If wsgi.input is a simple
file-like object, then its methods must be coded to recognise, rather
than block, when the data is not yet available to fulfill the
application's expectation. How does it know how to return control to
the scheduler, instead of the application?

If the application expects to receive all of the data that it asked
for with a, say read(1024) call, it has to be prepared to accept that
it may get less than 1024 bytes, in an asynchronous situation. What
does it return to the application in the case when < 1024 bytes is
available?

>> If a file-like object were retained, it would help to publish a
>> chainable file example to help middleware re-stream files they read any
>> part of.

I don't think that re-streaming of input should be a part of the spec;
it's an application layer thing. We don't expect to re-stream the
output of an application: why re-stream the input?

If some application needs to examine the entire byte sequence for
whatever reasons, that's a special case that can be catered for with
itertools, and dedicated middleware.
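
A rough sketch of what such middleware could do, assuming (and this is
only an assumption, it is not what WSGI 1.0 specifies) that the input
arrives as an iterator of byte blocks. The helper name peek_first_block
is made up for illustration.

```python
import itertools


def peek_first_block(blocks):
    """Return (first_block, iterator that still yields every block)."""
    iterator = iter(blocks)
    try:
        first = next(iterator)
    except StopIteration:
        # Empty input: nothing to peek at, nothing to re-stream.
        return b"", iter(())
    # Put the consumed block back in front of the remaining ones.
    return first, itertools.chain([first], iterator)


body = [b"abc", b"def", b"ghi"]  # stand-in for request body blocks
first, restreamed = peek_first_block(body)
```

The middleware gets to look at the first block, and the application
downstream still sees the complete sequence.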

>> Continuing deferred issues

>>  * Lifecycle methods (start/stop/etc event API driven by the container)

I'd really like to get this one nailed: java people and .net people
expect this stuff.

Alan.


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Alan Kennedy
Hi Bill,

[Bill]
> I think the controlling reference here is RFC 3875.

I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.

RFC 2616, the HTTP 1.1 spec, punts on the question of character
encoding for the request URI.

RFC 2396, the URI spec, says

"""
   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.
"""

RFC 3987 is that spec, for Internationalized Resource Identifiers. It says

"""
An IRI is a sequence of characters from the Universal Character Set
(Unicode/ISO 10646).
"""

and

"""
1.2.  Applicability

   IRIs are designed to be compatible with recommendations for new URI
   schemes [RFC2718].  The compatibility is provided by specifying a
   well-defined and deterministic mapping from the IRI character
   sequence to the functionally equivalent URI character sequence.
   Practical use of IRIs (or IRI references) in place of URIs (or URI
   references) depends on the following conditions being met:
"""

followed by

"""
   c.  The URI corresponding to the IRI in question has to encode
   original characters into octets using UTF-8.  For new URI
   schemes, this is recommended in [RFC2718].  It can apply to a
   whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384],
   or the URN syntax [RFC2141]).  It can apply to a specific part of
   a URI, such as the fragment identifier (e.g., [XPointer]).  It
   can apply to a specific URI or part(s) thereof.  For details,
   please see section 6.4.
"""

I think the question is "are people using IRIs in the wild?" If so,
then we must decide how best to deal with the problem of
recognising iso-8859-1+rfc2047 versus utf-8, or whatever
server-configured encoding the user has chosen.

Alan.


Re: [Web-SIG] Python 3.0 and WSGI 1.0.

2009-04-01 Thread Alan Kennedy
Hi Graham,

I think yours is a good solution to the problem.

[Graham]
> In other words, leave all the existing CGI variables to come through
> as latin-1 decode

As latin-1 or rfc-2047 decoded, to unicode.

> and do anything new in 'wsgi' variable namespace,

So the server provides

"wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
"wsgi.server_decoded_PATH_INFO" == u"whatever"
"wsgi.server_decode_charset" == u"utf-8"
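
A minimal sketch of how a server might fill in those keys, assuming
Python 3 and that the CGI variables arrive as latin-1 decoded strings,
so the original octets can be recovered by re-encoding. The function
name add_server_decoded is made up; the key names are the ones
suggested above.

```python
def add_server_decoded(environ, charset="utf-8"):
    """Return a copy of environ with the proposed wsgi.server_decoded_* keys."""
    result = dict(environ)
    for name in ("SCRIPT_NAME", "PATH_INFO"):
        # Recover the raw octets, then decode with the configured charset.
        raw_octets = environ.get(name, "").encode("latin-1")
        result["wsgi.server_decoded_" + name] = raw_octets.decode(charset)
    result["wsgi.server_decode_charset"] = charset
    return result


# "/caf" followed by the UTF-8 octets C3 A9, as seen through latin-1.
environ = {"SCRIPT_NAME": "", "PATH_INFO": "/caf\u00c3\u00a9"}
decoded = add_server_decoded(environ)
```

Note that decode() raises UnicodeDecodeError when the octets are not
valid in the chosen charset; a real server would have to decide what to
do in that case.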

Just my €0,02.

Alan.


Re: [Web-SIG] thoughts on an iterator

2009-03-30 Thread Alan Kennedy
Hi all,

It was great to meet (nearly) everybody at PyCon; I look forward to
the next time.

I particularly want to thank Robert for being so meticulous about
recording and reporting the discussions; a necessary part of moving
forward, IMO.

[Robert]
> H. Graham brought up chunked requests which I don't think have much
> bearing on this issue--the server/app can't rely on the client-specified
> chunk sizes either way (or you enable a Denial of Service attack). I
> don't see much difference between the file approach and the iterator
> approach, other than moving the read chunk size from the app (or more
> likely, the cgi module) to the server. That may be what kills this
> proposal: cgi.FieldStorage expects a file pointer and I doubt we want to
> either rewrite the entire cgi module to support iterators, or re-package
> the iterator up as a file.

I recommend that any discussion of file-like vs. iterator for input
should be informed by this discussion between myself and PJE back when
the spec was being written.

http://mail.python.org/pipermail/web-sig/2004-September/000885.html

Most relevant quote

[PJE]
> Aha!  There's the problem.  The 'read()' protocol is what's wrong.  If
> 'wsgi.input' were an *iterator* instead of a file-like object, it would be
> fairly straightforward for async servers to implement "would block" reads
> as yielding empty strings.  And, servers could actually support streaming
> input via chunked encoding, because they could just yield blocks once
> they've arrived.
>
> The downside to making 'wsgi.input' an iterator is that you lose control
> over how much data to read at a time: the upstream server or middleware
> determines how much data you get.  But, it's quite possible to make a
> buffering, file-like wrapper over such an iterator, if that's what you
> really need, and your code is synchronous.  (This will slightly increase
> the coding burden for interfacing applications and frameworks that expect
> to have a readable stream for CGI input.)  For asynchronous code, you're
> just going to invoke some sort of callback with each block, and it's the
> callback's job to deal with it.
>
> What does everybody think?  If combined with a "pause iterating me until
> there's input data available" extension API, this would let the input
> stream be non-blocking, and solve the chunked-encoding input issue all in
> one change to the protocol.  Or am I missing something here?

http://mail.python.org/pipermail/web-sig/2004-September/000890.html
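
For synchronous code, the "buffering, file-like wrapper over such an
iterator" could look roughly like this. It is only a sketch: the class
name IteratorReader is made up, and it assumes the iterator yields byte
blocks, with empty blocks meaning "no data yet".

```python
class IteratorReader:
    """Buffering, file-like read() over an iterator of byte blocks."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)
        self._buffer = b""

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything that is left.
            data = self._buffer + b"".join(self._blocks)
            self._buffer = b""
            return data
        # Pull blocks until the request can be satisfied or input ends.
        while len(self._buffer) < size:
            try:
                self._buffer += next(self._blocks)
            except StopIteration:
                break
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


reader = IteratorReader([b"ab", b"", b"cdef", b"g"])
chunks = [reader.read(3), reader.read(3), reader.read(), reader.read(3)]
```

The wrapper simply keeps pulling when it sees an empty block, which is
fine for blocking code but is exactly the point where an async server
would want to hand control back to its scheduler.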

I'd also be interested in the Twisted folk's take on that discussion.

All the best,

Alan.


[Web-SIG] WSGI Open Space @ PyCon.

2009-03-27 Thread Alan Kennedy
Dear all,

For those of you at PyCon, there is a WSGI Open Space @ 5pm today (Friday).

The sub-title of the open space is "Does WSGI need revision?"

An example: Philip Jenvey (http://dunderboss.blogspot.com/) raised the
need for something akin to what Java folks call "Lifecycle methods",
so that WSGI apps can do initialization and finalization.

http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets4.html

I'm sure there are plenty of other topics that could be discussed as well.

See you @5pm.

Alan.


Re: [Web-SIG] Use both Python and Javascript in html webpages

2009-03-05 Thread Alan Kennedy
[David]
> Can we use both Python and Javascript in html webpages?   Any demo on this?

If you're willing to write rpython, PyPy can compile it to javascript
which can run in a browser.

http://codespeak.net/pypy/dist/pypy/doc/js/using.html

HTH,

Alan.


Re: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification

2008-11-18 Thread Alan Kennedy
[Graham]
> I would be for (1) errata or amendment as reality is that there is
> probably no WSGI implementation that disallows an argument to
> readline() given that certain Python code such as cgi.FieldStorage
> wouldn't work otherwise.
>
> For such a clarification on existing practice, I see no point in
> having to change wsgi.version in environ as it would just cause
> confusion.

+1

[Graham]
> I would also like to see other changes to WSGI specification but now
> is not the time, let us at least though get this obvious issue with
> API dealt with. After that we can then perhaps have a discussion of
> future of WSGI specification and whether there really is any interest
> in future versions with more significant changes.

+1

Alan.


Re: [Web-SIG] Newline values in WSGI response header values.

2008-06-12 Thread Alan Kennedy
[Graham]
> Thus, is an embedded newline in value invalid? Would it be reasonable
> for a WSGI adapter to flag it as an error?

From a security POV, it may be advisable for WSGI servers to *not*
allow newlines in HTTP response headers; newlines in response headers
may be the result of an application's failure to sanitise its inputs.

http://en.wikipedia.org/wiki/HTTP_response_splitting
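
A minimal sketch of such a check, applied to the list of (name, value)
tuples an application passes to start_response. The function name
check_response_headers is made up for illustration.

```python
def check_response_headers(headers):
    """Raise ValueError if any header name or value contains CR or LF."""
    for name, value in headers:
        for part in (name, value):
            if "\r" in part or "\n" in part:
                raise ValueError("newline in response header %r" % name)
    return headers


good = check_response_headers([("Content-Type", "text/plain")])

try:
    # An unsanitised value smuggling in a second header.
    check_response_headers([("Location", "/next\r\nSet-Cookie: x=1")])
    rejected = False
except ValueError:
    rejected = True
```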

Regards,

Alan.


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-10 Thread Alan Kennedy
[Bob]
>  simplejson would give you an error and tell you exactly where the
>  problem was,

Another good point.

Other JSON modules should follow simplejson's lead, and provide access
to the location in the document where the lexical or parse error
occurred, so that the offending document can be opened in a text
editor to determine the source of the problem, and perhaps fix it.
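
As an illustration of the kind of information meant here, the json
module in today's Python 3 standard library reports the position on its
JSONDecodeError. The sample document is made up.

```python
import json

text = '{\n  "a": 1,\n  "b": oops\n}'  # broken on the third line

try:
    json.loads(text)
    location = None
except json.JSONDecodeError as exc:
    # lineno and colno are 1-based; pos is a 0-based character offset.
    location = (exc.lineno, exc.colno, exc.pos)

print(location)
```

With line and column to hand, the offending document can be opened in
an editor at the right place.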

This should also apply to "junk" after the document object, i.e. JSON
expressions present in the document after the main document has been
successfully parsed. A strict interpretation of the spec is that such
"junk" is not permitted, and makes the JSON document broken, even
though the main object representation is valid.

Simplejson has an option for the user to control this, and jyson does
too; I don't know about the others.
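
A sketch of what such an option could look like, built on the standard
library's json.JSONDecoder.raw_decode, which returns the parsed object
together with the index where parsing stopped. The function
parse_document and its allow_trailing flag are made-up names, not the
simplejson or jyson API.

```python
import json


def parse_document(text, allow_trailing=False):
    """Parse the first JSON value; optionally tolerate junk after it."""
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    obj, end = json.JSONDecoder().raw_decode(stripped)
    rest = stripped[end:].strip()
    if rest and not allow_trailing:
        raise ValueError("junk after document: %r" % rest)
    return obj


lenient = parse_document('{"a": 1} [2]', allow_trailing=True)

try:
    parse_document('{"a": 1} [2]')
    strict_rejected = False
except ValueError:
    strict_rejected = True
```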

[Bob]
> but there isn't currently a non-strict mode and honestly
>  nobody has asked for it.

If we only need "strict mode", then why do all of our parsers have options?

Isn't "permissive mode" just a way of setting all of the parse options
to liberal, in one go?

Alan.


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-09 Thread Alan Kennedy
[Alan]
>> [hand written JSON containing a] hard-to-spot dangling comma, from all the
>>  copying and pasting. That broke his javascript library; he solved the
>>  problem by passing it through a PHP JSON codec on his local Apache. It
>>  worked, i.e. his problem disappeared, but he didn't know why (the PHP
>>  lib had eliminated the dangling comma). Which all goes to confirm,
>>  IMHO, that you should be liberal in what you consume and strict in
>>  what you produce.

[John]
>  Sounds like a case *for* strict parsing, in my opinion. PHP's loose
>  parsing made it difficult to figure out why the JSON was invalid. If
>  trailing comma handling is to try to work around copy-paste errors, -1
>  from me.

No, the PHP lib did exactly what it should, IMHO. The PHP lib was
liberal in what it consumed (a dangling comma), and strict in what it
produced (no dangling comma).

It accepted my broken document with a dangling-comma, and emitted a
strictly conformant document with the offending comma removed, which
enabled my co-worker to proceed with his job.

+1 from me.

Other opinions?

Alan.


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-04-09 Thread Alan Kennedy
[John]
> I'm interested in whether you generally use JSON to communicate with a
> JavaScript client, or another JSON library. Both the demjson and simplejson
> libraries are written with the assumption that they are to be used to
> interact with JavaScript.

Answer #1: My motive is simply to implement the JSON spec, in a
[j|p]ythonic way. If the ideal of JSON is to be realised, then the
producer of the document is not relevant: it is only the document
itself that matters.

Answer #2: I'm working (i.e. day job) with JSON at the moment: a
javascript client talking to a java server. The JS guy had a problem
last week with a sample JSON document I gave him to prototype on. I
wrote the sample by hand (it later became my freemarker template), and
so inadvertently left in a hard-to-spot dangling comma, from all the
copying and pasting. That broke his javascript library; he solved the
problem by passing it through a PHP JSON codec on his local Apache. It
worked, i.e. his problem disappeared, but he didn't know why (the PHP
lib had eliminated the dangling comma). Which all goes to confirm,
IMHO, that you should be liberal in what you consume and strict in
what you produce.

[John]
> You mentioned in an earlier e-mail that jyson supports reading arrays with
> trailing commas -- is this intentional, or accidental? Do you read them with
> Python or JavaScript semantics?

Went out of my way to accept them, with python semantics.

Javascript semantics differ. Last time I tested, FireFox and IE
interpreted "[1,2,3,]" differently as [1,2,3] and [1,2,3,null].
Although that may have changed during the meanwhilst.
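
To show what "python semantics" means for that text, here is a small
sketch using only the standard library: ast.literal_eval reads it as a
Python literal, while the strict json module rejects it.

```python
import ast
import json

text = "[1,2,3,]"

python_value = ast.literal_eval(text)  # Python literal semantics

try:
    json.loads(text)
    strict_accepts = True
except json.JSONDecodeError:
    strict_accepts = False

print(python_value)    # [1, 2, 3]
print(strict_accepts)  # False
```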

[Alan]
> > 2. To have a native-code implementation, customised for jython.

[John]
> Did you encounter any particular issues related to implementing a JSON
> library in Jython that would affect how a standard library implementation's
> API should be designed?

Jython is changing rapidly. It is evolving from a 2.2 stage ("from
__future__ import generators") to a 2.5 stage in one leap. Jython 2.5
is built with java 1.5 (1.5 is where java grew annotations and
generics). Between 2.2 and 2.5, python has grown Decimals, generator
expressions, decorators, context managers, bi-directional
generators, etc. I prefer for a pure java implementation of a JSON
codec to remain flexible in terms of the way that it maps
"fundamental" JSON types into the jython type hierarchy and
interpreter machinery[1].

I'm beginning to think that any putative JSON API should permit the
user to specify which class will be used to instantiate JSON objects.
If the users can specify their own classes, that might go a long way
to resolve issues such as "I need my javascript client to communicate
Numbers representing radians to my python server which uses Decimal
because it works better with my geo-positioning library". Standard
libraries should provide their own set of default instantiation
classes, which the user could override.
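
The json module in today's standard library has hooks of this kind. A
minimal sketch, with a made-up document, that swaps in Decimal for JSON
numbers with a fractional part and OrderedDict for JSON objects:

```python
import json
from collections import OrderedDict
from decimal import Decimal

text = '{"heading": 1.5707963267948966, "name": "north"}'

# parse_float and object_pairs_hook replace the default float and dict.
doc = json.loads(text, parse_float=Decimal, object_pairs_hook=OrderedDict)

print(type(doc).__name__, type(doc["heading"]).__name__)
```

Leaving the hooks out gives the library's default classes, which is the
"default instantiation classes" arrangement described above.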

Regards,

Alan.

[1] There is an argument that a pure java JSON parser for jython is
not worth the effort, in performance terms at least. JVM optimisation
is very sophisticated these days, and it is conceivable that pure
python (byte)code could run as fast or faster on a JVM than equivalent
java code. Think PyPy. So maybe a single well-designed pure-python
JSON module in the cpython standard library is the way to go.


Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc

2008-04-08 Thread Alan Kennedy
[Alan]
>> [2] Perhaps some pythonista from Web-SIG is most appropriate to advise
>> how JSON-RPC should move forward? After all, we're more accustomed to
>> server-side stuff than those javascript folks ;-)

[Ian]
>  Let it die?  It is more complicated than necessary, when instead you could
> just make each function a URL of its own, and POST the arguments and get
> back the response, with 500 Server Error for errors.  It's hard to spec that
> up because it's too simple.
>
>  OHM (http://pythonpaste.org/ohm/) follows this model of exposing a service.

Mmmm, very RESTful.

Access to the requested HTTP method is a fundamental for RESTful services.

I find it interesting that Java's HttpServletRequest has a
.getMethod(), but no .setMethod(). Which means that one has to
implement method overrides[1] by carrying the override value through
means other than the request object itself.

Whereas in WSGI, I can simply do: environ['REQUEST_METHOD'] =
environ['HTTP_X_HTTP_METHOD_OVERRIDE']
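
Wrapped up as middleware, that might look like the sketch below. WSGI
servers expose the X-HTTP-Method-Override request header under the
CGI-style key HTTP_X_HTTP_METHOD_OVERRIDE. The names method_override and
show_method are made up, and honouring the override only on POST is a
design choice of this sketch, not something the spec requires.

```python
def method_override(app):
    """Middleware: let a POST carry the real method in a request header."""
    def wrapper(environ, start_response):
        override = environ.get("HTTP_X_HTTP_METHOD_OVERRIDE")
        if override and environ.get("REQUEST_METHOD") == "POST":
            environ["REQUEST_METHOD"] = override.upper()
        return app(environ, start_response)
    return wrapper


def show_method(environ, start_response):
    """Tiny demo application that echoes the request method."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["REQUEST_METHOD"].encode("ascii")]


application = method_override(show_method)
```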

I've heard WSGI described as "python's servlet API". It's not that; it's better.

Regards,

Alan.

[1] http://code.google.com/apis/gdata/basics.html#Updating-an-entry


Re: [Web-SIG] [proposal] merging jsonrpc into xmlrpc

2008-04-08 Thread Alan Kennedy
[Ronny]
>>  since json-rpc and xml-rpc basically do the same
>>  and the only difference is the content-type (json is more concise),
>>  i propose to create a single xml/json-rpc module.

[Graham]
>  The problem with the JSON-RPC 1.0 specification was that it wasn't
>  always as clear as could have been.
>
>  Unfortunately the JSON-RPC 1.1 draft specification didn't necessarily
>  make things better.

>  The JSON-RPC 1.1
>  specification was also never really completed and left out details
>  such as standard error codes etc that they were proposing be
>  specified.

All valid concerns.

I think that the JSON-RPC initiative lost its way a little. They tried
to model things such as encoding and decoding an object graph, using
object references, etc, which IMHO is a step too far for the usages
JSON-RPC would get, and is more CORBA than XML-RPC.

The maintainer of the JSON-RPC.org site was looking for someone to
take it over for a while; I think someone might have taken it over
last year.

[Graham]
>  Are you
>  prepared to go and test it with a sufficient range of clients to make
>  sure Python implemented server side interops properly?

Interestingly, the reference implementation for JSON-RPC is a server
written in python[1].

http://json-rpc.org/wiki/python-json-rpc

Perhaps python's best interests in this case are better served by
letting that reference implementation drive the JSON-RPC standards
process[2]?

If that is the case, then it is counter-productive to add a competing
module to the python standard library.

Regards,

Alan.

[1] But it's a shame they didn't write it on WSGI: then their services
could have run on the Google compute cloud ;-)

[2] Perhaps some pythonista from Web-SIG is most appropriate to advise
how JSON-RPC should move forward? After all, we're more accustomed to
server-side stuff than those javascript folks ;-)


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-03-16 Thread Alan Kennedy
[Deron]
> (I just joined this list, so this response may not be threaded properly)

[Bob]
> I wasn't subscribed to the list at the time this came up, but I'm all
> for getting simplejson into the stdlib.

Well, it appears we have a quorum of JSON<->python codec writers,
since I've written a jython module that I'd like to interoperate with
cpython codecs. I think it's appropriate for any discussions of JSON
to take place on the web-sig.

I've been thinking about how to take this forward. I see two ways

Formal approach
===============

Introduce a "Standards Track" Library PEP, which is designed for the
purpose of bringing a new module through a full peer-review process
and into the python standard library. (Which means we in jython and
ironpython land should also then provide it). This would have the
following outcomes

 - Result in a single JSON implementation going into the cpython
standard library, possibly in Python 3000
 - Expose the new module to full community review/bug-tracking/modification
 - Opportunity to thrash out all of the finer points of JSON<->python
transcoding, including but not limited to
  - NaN, Infinity, etc
  - What is the most appropriate number/integer/float/double/decimal
representation
  - Structural strictness, e.g. junk after document body, dangling commas, etc.
  - BMP support
  - Byte encoding detection
  - Python 3000 support
 - Standardise the interface, de facto

However, this option is somewhat complicated by the fact that we seem
to have TWO quality cpython implementations competing for a place in
the cpython standard library. Also, I think the PEP process might be a
little cumbersome for this topic, given that the PEP process involves
commit rights to the cpython source tree (since the proposal for a new
module should be accompanied by the source code of the proposed
implementation).

Informal approach
=================

Develop and document a standard interface, and ensure that all of our
modules support it. This interface would define method, class and
exception names. Standard methods would probably "load" and "dump"
objects, possibly creating "JSONEncoder"s and "JSONDecoder"s to do the
job: "JSONException" and subclasses thereof would signify errors.
Perhaps a standard mechanism to retrieve the location of errors, e.g.
line and column, would be appropriate? Perhaps a standard set of
feature/option names could be agreed, e.g. "accept_NaN", etc.

User code written to this standard could move reasonably easily
between implementations, or indeed between platforms. This approach
has the benefits that

 - Authors are free to interpret edge cases as they see fit, and
provide options.
 - Competing implementations can continue to improve in the field
 - Changing implementations could be as simple as using a different egg
   (Although an exhaustive set of test cases covering the required
behaviour is recommended)

We could call it PAJ, Python Api for Json, or some such.

I feel the informal option is more appropriate. It could be
effectively managed on a wiki page. Or perhaps a ticketing system
(e.g. TRAC) would be good for tracking detailed discussions of JSON's
many edge cases, etc. I would be willing to start a wiki page with
details about a putative module interface.

Finally, at this stage I think speed is less of a concern; correctness
is more important for now. As Aahz is fond of quoting, "It is easier
to optimize correct code than to correct optimized code".

Thoughts?

Alan.


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-03-11 Thread Alan Kennedy
[Massimo]
> It would also be nice to have a common interface to all modules that
> do serialization. For example pickle, cPickle, marshal has dumps, so
> json should also have dumps.

Indeed, this is my primary concern also.

The reason is that I have a pure-java JSON codec for jython, that I
will either publish separately or contribute to jython itself.

If we're going to have the facility in both cpython and jython (and
probably ironpython, etc), then it would be optimal to have a
compatible API so that we have full interoperability. And given that
we in jython land are always left implementing cpython APIs (which are
not necessarily always the optimal design for jython) it would be nice
if we could agree on APIs, etc, *before* stuff goes into the standard
library.

The API for my codec is slightly different from simplejson, although
it could be made the same with a little work, including exception
signatures, etc.

But there are some things about my own design that I like. For
example, simplejson allows override of the JSON output representing
certain objects, by the use of subclasses of JSONEncoder. My design
does it differently; it simply looks for a "__json__()" callable on
every object being serialised, and if found, calls it and uses its
return value to represent the object. I have no equivalent of
simplejson's decoding extensions.
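
To illustrate the "__json__()" idea without the jython codec, here is a
sketch that approximates it with the default= hook of the json module
in today's standard library. One difference to note: the hook is only
consulted for objects the encoder cannot serialise natively, whereas
the design described above looks for "__json__()" on every object. The
names call_json_hook and Point are made up.

```python
import json


def call_json_hook(obj):
    """Use obj.__json__() as the object's JSON representation, if present."""
    hook = getattr(obj, "__json__", None)
    if callable(hook):
        return hook()
    raise TypeError("%r is not JSON serialisable" % (obj,))


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __json__(self):
        return {"x": self.x, "y": self.y}


encoded = json.dumps({"origin": Point(0, 0)}, default=call_json_hook)
print(encoded)  # {"origin": {"x": 0, "y": 0}}
```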

Another difference is the set of options. Simplejson has options to
control parsing and generation, and so does mine. But the sets of
options are different, e.g. simplejson has no option to permit/reject
dangling commas (e.g. "[1,2,3,]")*, whereas mine has no support for
accepting NaN, infinity, etc, etc.
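
As a sketch of what a NaN option might look like on the parsing side,
using the parse_constant hook of the json module in today's standard
library. The wrapper loads and its accept_nan flag are made-up names.

```python
import json


def reject_constant(name):
    # Called by the decoder with 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError("non-standard JSON constant: " + name)


def loads(text, accept_nan=False):
    """Parse JSON text; accept_nan is a hypothetical option name."""
    if accept_nan:
        return json.loads(text)
    return json.loads(text, parse_constant=reject_constant)


lenient = loads("[NaN]", accept_nan=True)

try:
    loads("[NaN]")
    strict_rejected = False
except ValueError:
    strict_rejected = True
```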

On the encoding side, I simply make the assumption that all character
transcoding has happened before the JSON text reaches the JSON parser.
(I think this is a reasonable assumption, given that byte streams are
always associated with file storage, network transmission, etc, and
only the programmer has access to the relevant encoding information).
But given that RFC 4627 specifies how to guess encoding of JSON byte
streams, I'll probably change that policy.
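
The guess in RFC 4627 (section 3) relies on the first two characters of
a JSON text being ASCII, so the pattern of null octets in the first
four octets gives the encoding away. A minimal sketch, with a made-up
function name:

```python
def guess_json_encoding(data):
    """Guess the encoding of JSON octets from the null pattern (RFC 4627)."""
    head = data[:4]
    if len(head) == 4:
        nulls = tuple(octet == 0 for octet in head)
        if nulls == (True, True, True, False):    # 00 00 00 xx
            return "utf-32-be"
        if nulls == (False, True, True, True):    # xx 00 00 00
            return "utf-32-le"
        if nulls == (True, False, True, False):   # 00 xx 00 xx
            return "utf-16-be"
        if nulls == (False, True, False, True):   # xx 00 xx 00
            return "utf-16-le"
    return "utf-8"                                # xx xx xx xx


sample = "[1, 2]".encode("utf-16-le")
guessed = guess_json_encoding(sample)
```

The sketch ignores byte order marks, which RFC 4627 does not discuss.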

Lastly, another area of potential cooperation is testing: I have over
100 unit-tests, with fairly extensive coverage. I think that test
coverage is very important in the case of JSON; you can never have too
many tests.

So, what is the best way to go about agreeing on the best API?

1. Discussion on web-sig?
2. Discussion on stdlib-sig?
3. Collaborative authoring/discussion on a WIKI page?
4. 

Regards,

Alan.

* Which can mean different things to different software. Some
javascript interpreters interpret it as a 4 element list (inferring
the last object between the comma and the closing square bracket as a
null), others as a 3 element list. Python obviously interprets it as
a 3-element list. So the general internet maxim "be liberal in what
you accept and strict in what you produce" applies. My API gives control
of this strictness/relaxedness to the user.
___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Time a for JSON parser in the standard library?

2008-03-11 Thread Alan Kennedy
[Graham]
>  The problem areas were, different interpretations of what could be
>  supplied in an error response. Whether an integer, string or arbitrary
>  object could be supplied as the id attribute in a request. Finally,
>  some JavaScript clients would only work with a server side
>  implementation which provided introspection methods as they would
>  dynamically create a JavaScript proxy object based on a call of the
>  introspection methods.

These are JSON-RPC concerns, and nothing to do with JSON text de/serialization.

I do believe we're only discussing JSON<->python objects
transformation, in this thread at least.

>  Unfortunately the JSON 1.1 draft specification didn't necessarily make
>  things better.

There is no JSON 1.1 spec; but there is a JSON-RPC 1.1 spec.

http://json-rpc.org/wiki/specification

>  Thus my question is, what version of the JSON specification are you
>  intending to support.

The one specified in RFC 4627

http://www.ietf.org/rfc/rfc4627.txt

Regards,

Alan.


[Web-SIG] Time a for JSON parser in the standard library?

2008-03-10 Thread Alan Kennedy
Dear all,

Given that

1. Python comes with "batteries included"

2. There is a standard library re-org happening because of Py3K

3. JSON is now a very commonly used format on the web

Is it time there was a JSON codec included in the python standard library?

(If XML is already supported, I see no reason why JSON shouldn't be)

Or is it best to make users who want to use JSON go and research all
of the different options available to them?

Choosing a Python JSON Translator
http://blog.hill-street.net/?p=7

Just a thought.

Regards,

Alan.


Re: [Web-SIG] WSGI, Python 3 and Unicode

2007-12-07 Thread Alan Kennedy
[Alan]
>> The restriction to iso-8859-1 is really a distraction; iso-8859-1 is
>> used simply as an identity encoding that also enforces that all
>> "bytes" in the string have a value from 0x00 to 0xff, so that they are
>> suitable for byte-oriented IO. So, in output terms at least, WSGI *is*
>> a byte-oriented protocol. The problem is that python-the-language
>> didn't have support for bytes at the time WSGI was designed.

[Thomas]
> If you're talking about the "output stream", then yes, it's all about
> bytes (or should be).

Indeed, I was only talking about output, specifically the response body.

> But at the status and headers level, HTTP/1.1 is
> fundamentally ISO-8859-1-encoded.

Agreed.

That is why the WSGI spec also states

"""
Note also that strings passed to start_response() as a status or as
response headers must follow RFC 2616 with respect to encoding. That
is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME
encoding.
"""

So in order to use non-ISO-8859-1 characters in response status
strings or headers, you must use RFC 2047.

As confirmed by the links you posted, this is a HTTP restriction, not
a WSGI restriction.
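
As an illustration of the RFC 2047 route, here is a sketch using the standard `email.header` module. The `reason` value is invented for the example, and whether a given HTTP client actually decodes such encoded-words is a separate question that this does not address.

```python
from email.header import Header, decode_header

reason = "IFN-\u03b3 unavailable"            # contains a non-ISO-8859-1 character
encoded = Header(reason, "utf-8").encode()   # RFC 2047 encoded-word, plain ASCII

# The encoded form now satisfies the ISO-8859-1 requirement.
wire_bytes = encoded.encode("iso-8859-1")

# A recipient that understands RFC 2047 can recover the original text.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
```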

Regards,

Alan.


Re: [Web-SIG] WSGI, Python 3 and Unicode

2007-12-07 Thread Alan Kennedy
[Phillip]
>> WSGI already copes, actually.  Note that Jython and IronPython have
>> this issue today, and see:
>>
>> http://www.python.org/dev/peps/pep-0333/#unicode-issues

[James]
> It would seem very odd, however, for WSGI/python3 to use strings-
> restricted-to-0xFF for network I/O while everywhere else in python3 is
> going to use bytes for the same purpose.

I think it's worth pointing out the reason for the current restriction
to iso-8859-1 is *because* python did not have a bytes type at the
time the WSGI spec was drawn up. IIRC, the bytes type had not yet even
been proposed for Py3K. Cpython effectively held all byte sequences as
strings, a paradigm which is (still) followed by jython (not sure
about ironpython).

The restriction to iso-8859-1 is really a distraction; iso-8859-1 is
used simply as an identity encoding that also enforces that all
"bytes" in the string have a value from 0x00 to 0xff, so that they are
suitable for byte-oriented IO. So, in output terms at least, WSGI *is*
a byte-oriented protocol. The problem is that python-the-language
didn't have support for bytes at the time WSGI was designed.
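
The "identity encoding" property can be checked directly. This sketch assumes an interpreter with a separate bytes type (Python 3), which postdates the discussion above; the variable names are arbitrary.

```python
# Every possible byte value maps to exactly one iso-8859-1 character and back.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("iso-8859-1")
round_tripped = as_text.encode("iso-8859-1")

# The code points of the decoded text are the original byte values.
code_points = [ord(ch) for ch in as_text]

# Anything above 0xff cannot be expressed, which is what enforces the range.
try:
    "\u03b3".encode("iso-8859-1")
    out_of_range_rejected = False
except UnicodeError:
    out_of_range_rejected = True
```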

[James]
> You'd have to modify your app
> to call write(unicodetext.encode('utf-8').decode('latin-1')) or so

Did you mean: write(unicodetext.encode('utf-8').encode('latin-1'))?

Either way, the second encode is not required;
write(unicodetext.encode('utf-8')) is sufficient, since it will
generate a byte-sequence(string) which will (actually "should": see
(*) note below) pass the following test.

try:
    wsgi_response_data.encode('iso-8859-1')
except UnicodeError:
    # Illegal WSGI response data!
    raise

On a side note, it's worth noting that Philip Jenvey's excellent
rework of the jython IO subsystem to use java.nio is fundamentally
byte oriented.

http://www.nabble.com/fileno-support-is-not-in-jython.-Reason--t4750734.html
http://fisheye3.cenqua.com/browse/jython/trunk/jython/src/org/python/core/io

Because it is based on the new IO design for Python 3K, as described in PEP 3116

http://www.python.org/dev/peps/pep-3116/

Regards,

Alan.

[*] Although I notice that cpython 2.5, for a reason I don't fully
understand, fails this particular encoding sequence. (Maybe it's to do
with the possibility that the result of an encode operation is no
longer an encodable string?)

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> response = u"interferon-gamma (IFN-\u03b3) responses in cattle"
>>> response.encode('utf-8').encode('latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position
22: ordinal not in range(128)
>>>

Meaning that to enforce the WSGI iso-8859-1 convention on cpython 2.5,
you would have to carry out this rigmarole

>>> response.encode('utf-8').decode('latin-1').encode('latin-1')
'interferon-gamma (IFN-\xce\xb3) responses in cattle'
>>>

Perhaps this behaviour is an artifact of the cpython implementation?

Whereas jython passes it just fine (and correctly, IMHO)

Jython 2.2.1 on java1.4.2_15
Type "copyright", "credits" or "license" for more information.
>>> response = u"interferon-gamma (IFN-\u03b3) responses in cattle"
>>> response.encode('utf-8')
'interferon-gamma (IFN-\xCE\xB3) responses in cattle'
>>> response.encode('utf-8').encode('latin-1')
'interferon-gamma (IFN-\xCE\xB3) responses in cattle'
>>>
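
For comparison, a sketch of the same convention under Python 3 semantics, where text and bytes are separate types. The cpython 2.5 failure above comes from calling encode() on a byte string, which first triggers an implicit ASCII decode; with distinct types that step does not exist. The helper names are made up for this example.

```python
def to_wsgi_text(text, charset="utf-8"):
    """Encode text to bytes, then reinterpret each byte as an iso-8859-1 character."""
    return text.encode(charset).decode("iso-8859-1")

def is_valid_wsgi_text(value):
    """True if every character fits in a single byte (0x00 to 0xff)."""
    try:
        value.encode("iso-8859-1")
    except UnicodeError:
        return False
    return True

response = "interferon-gamma (IFN-\u03b3) responses in cattle"
wire = to_wsgi_text(response)
```

Here `wire.encode("iso-8859-1")` yields the same bytes as `response.encode("utf-8")`, so nothing is lost in the detour.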


[Web-SIG] Modjy and jython 2.2.

2007-09-05 Thread Alan Kennedy
Dear all,

Now that jython 2.2 has been released (hooray!)

http://www.jython.org/Project/download.html

it's time for a quick update on the status of modjy, the jython 
WSGI/J2EE gateway.

http://www.xhaus.com/modjy/

Previous versions of modjy were based on jython 2.1, which didn't have 
support for the iterator protocol. However, the new jython 2.2 has full 
iterator and generator support, and so is capable of full WSGI support 
(round of applause for the hard work of the jython-dev team).

In a testament to the stability of jython and the clean design of WSGI, 
the modjy code has not changed; the original jython 2.1 version of modjy 
works seamlessly with jython 2.2, unmodified.

Still, I am making an interim release, for two purposes

1. To fix a longstanding bug in the implementation
2. To explicitly mention jython 2.2 in the documentation

I'm off on vacation soon, and wanted to make this small "publicity 
release" before I go.

When I return, I will be making the following modifications

1. Adding a full test suite, based on MockRunner, the mock Java Servlet 
framework.
2. Improving J2EE resource handling
3. Improving import handling
4. Various small improvements and documentation updates.

All the best,

Alan.


Re: [Web-SIG] Web Site Process Bus

2007-06-26 Thread Alan Kennedy
[Graham Dumpleton]
 > First comment is about WSGI applications somehow themselves using
 > SIGTERM etc as triggers for things they want to do. For Apache at
 > least, allowing any part of a hosted Python application to register
 > its own signal handlers is a big no no. This is because Apache itself
 > uses a whole range of signals to manage such tasks as shutting down
 > sub processes or signaling worker and/or listener threads within a
 > process that it's time to wake up or shut down. If a WSGI application
 > starts registering signal handlers it can as a result stop Apache from
 > even being able to process requests. In mod_wsgi I have had to
 > specifically take steps to prevent applications breaking things in
 > this way by replacing signal.signal() on creation of an interpreter.
 > Instead I log a warning that the signal registration has been ignored
 > and otherwise do nothing. This was simply the safest thing to do.
 >
 > Thus I believe a clear statement should be made that UNIX signals are
 > off limits to WSGI applications or components.

From a jython POV, I agree with this statement; signals don't even
exist on java/jython (although some JVMs have non-standard extensions 
for signals).

Thus, any "standard" involving signals would not be implementable on 
jython, and I guess ironpython too.
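
The "log a warning and otherwise do nothing" approach Graham describes could look roughly like the sketch below. This is not mod_wsgi's actual code; the function names and the logger name are assumptions for illustration.

```python
import logging
import signal

def make_ignoring_signal(log):
    """Return a stand-in for signal.signal() that registers nothing."""
    def ignoring_signal(signum, handler):
        log.warning("Ignoring attempt to register a handler for signal %d", signum)
        return None
    return ignoring_signal

# A host could install the stand-in once per interpreter, e.g.
#   signal.signal = make_ignoring_signal(logging.getLogger("wsgi.host"))
# Here it is only called directly, so the real signal.signal stays untouched.
before = signal.getsignal(signal.SIGTERM)
fake_signal = make_ignoring_signal(logging.getLogger("wsgi.host"))
result = fake_signal(signal.SIGTERM, lambda signum, frame: None)
after = signal.getsignal(signal.SIGTERM)
```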

[Graham Dumpleton]
 > Anyway, just wanted to make it absolutely clear that I don't believe a
 > hosted WSGI application and associated framework has any business
 > taking direct interest in low level UNIX signals.

Agreed.

Regards,

Alan.


Re: [Web-SIG] Direct use of sys.stdout, sys.stderr and sys.stdin in WSGI application.

2007-03-22 Thread Alan Kennedy
[Alan Kennedy]
>>Strictly speaking, WSGI requires python 2.2,
>>because of iterators.

[Phillip J. Eby]
> Actually, it doesn't.  The pre-2.2 iterator protocol is to be used in such
> cases:
>
> http://www.python.org/dev/peps/pep-0333/#supporting-older-2-2-versions-of-python

Dang! I knew I couldn't say anything on web-sig without being contradicted ;-)

I am familiar with that section. I'm sure you remember writing this in
the credits section: "Alan Kennedy, whose courageous attempts to
implement WSGI-on-Jython (well before the spec was finalized) helped
to shape the "supporting older versions of Python" section".

But if the users want their "modern" python applications to be
portable everywhere on WSGI, e.g. returning (iterable) files as output,
or generators, then they should really stick with 2.2+.

But you are, of course, right about the pre-2.2 iterator protocol. I
wrote modjy for jython 2.1 according to the PEP guidelines, and have
had user reports that it works without modification on jython 2.2+.
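
For readers unfamiliar with the pre-2.2 iterator protocol: an object is iterable if it supports `__getitem__` with integer indexes and raises IndexError when exhausted. A minimal sketch follows; the class and application names are invented, and it uses a bytes body as later interpreters expect.

```python
class LegacyBody:
    """Iterable via the old sequence protocol: __getitem__ plus IndexError."""
    def __init__(self, chunks):
        self._chunks = chunks

    def __getitem__(self, index):
        # The list raises IndexError past the end, which ends the iteration.
        return self._chunks[index]

def legacy_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return LegacyBody([b"Hello, ", b"world\n"])

# A server just loops over the return value; it never needs __iter__.
body = b"".join(legacy_app({}, lambda status, headers: None))
```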

Regards,

Alan.


Re: [Web-SIG] Direct use of sys.stdout, sys.stderr and sys.stdin in WSGI application.

2007-03-22 Thread Alan Kennedy
Graham,

I thought I'd reply, so that we'd get replies from everyone else to
tell me I'm wrong.

All your points are good common-sense stuff. I think that all of your
policies on stdin, stdout, and stderr are good, and are appropriate
for a WSGI environment running inside an Apache server.

Some small points.

> . one could actually write to sys.stdout directly as
> well since that is where the WSGI adapter writes it to anyway.

I think it's a good idea to redirect stdout, and to document in your
server/gateway documentation that you are doing so. I also think this
is a server specific issue.

> Anyway, it may seem good practice for a WSGI adapter to still prevent
> use of sys.stdin unless configured explicitly to allow it and even
> then it might only allow it if the server is running in a mode whereby
> it would work.

This should be a server-specific feature, that is documented.

> Finally, sys.stderr also presents problems of its own. Although
> wsgi.errors is provided with the request environment, this can't be
> used at global scope within a module when importing and also shouldn't
> be used beyond the life time of the specific request. Thus, there
> isn't a way to log stuff outside of a request and ensure it gets to
> the server log. One could try and mandate use of 'logging' module, but
> this isn't available in old versions of Python.

I don't think you need to worry about versions of python that don't
have the logging module. Strictly speaking, WSGI requires python 2.2,
because of iterators. So I think it's extremely unlikely that people
will be running WSGI apps on pre-2.2 VMs.

> What you need is for sys.stderr to be underlayed with thread
> specific log objects each with its own buffering mechanism that
> ensures that only complete lines of text get sent to the actual log
> file.

This is a server/gateway implementation detail.
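
One way a server or gateway might do this is sketched below: a file-like object that keeps a per-thread buffer and only forwards complete lines to the real log. The class name and design are assumptions, not taken from any existing adapter.

```python
import io
import threading

class LineBufferedLog:
    """File-like object that forwards only complete lines, buffered per thread."""
    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()      # serialises writes to the real log
        self._local = threading.local()    # holds each thread's partial line

    def write(self, text):
        pending = getattr(self._local, "pending", "") + text
        complete, newline, rest = pending.rpartition("\n")
        self._local.pending = rest
        if newline:
            with self._lock:
                self._target.write(complete + "\n")

    def flush(self):
        # Emit any partial line, terminated, then flush the real log.
        pending = getattr(self._local, "pending", "")
        self._local.pending = ""
        with self._lock:
            if pending:
                self._target.write(pending + "\n")
            self._target.flush()

sink = io.StringIO()
log = LineBufferedLog(sink)
log.write("ab")
after_partial = sink.getvalue()
log.write("c\nde")
after_newline = sink.getvalue()
log.flush()
after_flush = sink.getvalue()
```

A gateway would assign such an object to sys.stderr, with the real server log as the target.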

> Yes one could simply ignore the whole issue, but I feel that a good
> quality WSGI adapter/server should address these issues and either
> lock things down as appropriate to protect users from themselves or
> ensure that using them results in a sensible outcome.

Given how much talk there is of the WSGI "environment", I think it's
good to raise these issues.

Regards,

Alan.


Re: [Web-SIG] Relationship between SCRIPT_NAME and PATH_INFO.

2007-01-28 Thread Alan Kennedy
[Graham Dumpleton]
> Should a WSGI adapter for a web server which allows a mount point to
> have a trailing slash specifically flag as a configuration error an
> attempt to use such a mount point given that it appears to be
> incompatible with WSGI?

OK, I'll have a go.

I think the question boils down to the following:

Assume an application mount point of "/application".

If a request is received for

/application

Then it will (and should) be redirected to the URL

/application/

Is that new URL to be interpreted as

SCRIPT_NAME: /application
PATH_INFO:   /

or interpreted as

SCRIPT_NAME: /application/
PATH_INFO:

I think that the WSGI interpretation is the first interpretation, and
the correct one, because it gives correct results when deriving
relative URLs for resources contained within the application.
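
A small sketch of why the first interpretation composes cleanly. The host name, the `split_request_path` helper and the resource names are made up for the example.

```python
from urllib.parse import urljoin

def split_request_path(mount_point, request_path):
    """Split a request path into (SCRIPT_NAME, PATH_INFO), keeping SCRIPT_NAME free of a trailing slash."""
    script_name = mount_point.rstrip("/")
    if request_path != script_name and not request_path.startswith(script_name + "/"):
        raise ValueError("request path is outside the mount point")
    return script_name, request_path[len(script_name):]

script_name, path_info = split_request_path("/application", "/application/")

# The request URL is SCRIPT_NAME followed by PATH_INFO.
request_url = "http://example.com" + script_name + path_info
stylesheet = urljoin(request_url, "style.css")

# With a trailing slash in SCRIPT_NAME, naive concatenation doubles the slash.
doubled = "/application/" + "/page"
```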

Is that addressing the question?

[Graham Dumpleton]
> It therefore seems that the idea of the mount point for an
> application having a trailing slash may be incompatible
> with WSGI. Can this be considered to be the case or is there
> some other way one is meant to deal with this?

I don't know about "incompatible", although it obviously creates the
double-slash problem with computed URLs.

Perhaps the Apache "policy" on this issue is influenced by its origins
as a http server for serving hierarchies of directories and files from
a filesystem?

When it comes to CGI though, Apache does the right thing and passes

SCRIPT_NAME: /application
PATH_INFO:   /

to CGI scripts.

I don't know if this provides any insight into whether or not mounting
applications with a trailing slash is an error.

Does that help at all?

Alan.


Re: [Web-SIG] WSGI input filter that changes content length.

2007-01-15 Thread Alan Kennedy
[Graham]
> Hmmm, maybe I should have phrased my question a bit differently as to be
> honest I am not actually interested in doing on the fly decompression and
> only used it as an example. I really only want to know about how the
> content length is supposed to be dealt with. I didn't want to explain the
> actual context for the question as didn't want to let on yet to what I am up
> to, so used an example which I thought would illustrate the problem.

Point taken. But I think gzip encoding is a good example to illustrate
the issues.

[Graham]
> If I leave the
> content length header as is and any application does a
> read(content_length) and decompression or some other input filter
> actually results in more data than that being available, the application
> will not get it all as it has only asked to read the original length
> before decompression.

So obviously the Content-Length header cannot be left unmodified if
some transformation is in place that is altering the length of the
content.

There are two choices for how the wrapping should happen.

1. The ungzipping filter reads the entirety of the (possibly huge)
input, decompresses it, and makes it available in wsgi.input. The
Content-Length header is rewritten to reflect the length of the
decompressed content. The client has a valid Content-Length value, but
the server has had to buffer a potentially large input stream in order
to be able to provide that.

2. The ungzipping filter wraps the compressed stream, and decompresses
on demand and on-the-fly. In this case, it *must* delete the old
Content-Length header, which is now invalid. It cannot provide a new
value for Content-Length, since the final uncompressed length of the
input stream cannot be known.
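
The two choices can be sketched as follows, using the standard gzip module on a later interpreter. The function names are hypothetical, and removing HTTP_CONTENT_ENCODING after decoding is an assumption of this sketch rather than something stated above.

```python
import gzip
import io

def decompress_buffered(environ):
    """Choice 1: read everything, decompress, and rewrite CONTENT_LENGTH."""
    raw = environ["wsgi.input"].read(int(environ.get("CONTENT_LENGTH") or 0))
    body = gzip.decompress(raw)
    environ["wsgi.input"] = io.BytesIO(body)
    environ["CONTENT_LENGTH"] = str(len(body))
    environ.pop("HTTP_CONTENT_ENCODING", None)

def decompress_streaming(environ):
    """Choice 2: decompress on demand and delete the now-invalid CONTENT_LENGTH."""
    environ["wsgi.input"] = gzip.GzipFile(fileobj=environ["wsgi.input"], mode="rb")
    environ.pop("CONTENT_LENGTH", None)
    environ.pop("HTTP_CONTENT_ENCODING", None)

def make_environ(payload):
    # Build a minimal fake request environment carrying a gzipped body.
    compressed = gzip.compress(payload)
    return {
        "wsgi.input": io.BytesIO(compressed),
        "CONTENT_LENGTH": str(len(compressed)),
        "HTTP_CONTENT_ENCODING": "gzip",
    }
```

The first trades memory for a trustworthy length; the second keeps memory flat but leaves the application to read until end-of-file.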

[Graham]
> The PEP says that an application though should not attempt to read more
> data than has been specified by the content length. If it is common
> practice that applications take this literally and always get data from
> the input by using read(content_length) then there is a requirement that
> the content length header must exist. Thus, if the input filter does zap
> the content length header and remove it then an application which does
> that will not work.

Then I suppose that that application is not a fully-compliant WSGI application.

Scenario 2 outlined above is a perfectly valid scenario that can
happen, so an application that cannot deal with that scenario is not
robust.

> Thus the question probably is, what is accepted practice or what does
> the PEP dictate as to how applications should use read()?

AFAICT, the PEP is not prescriptive about the use of the
wsgi.input.read() method.

However, given that you have found it necessary to raise the question,
perhaps it should be added to the WSGI PEP that absence of a
Content-Length header does NOT imply absence of content.

[Graham]
> So, is it okay to remove the content length header when there is actually
> data and I know it wouldn't actually be correct,

I would say it's compulsory to remove the header: it contains an
incorrect value, and if the application uses that value, it will get
unexpected data or an exception, and rightly so.

[Graham]
> or does that result in a
> situation that is seen as violating the PEP or even if acceptable would break
> existing WSGI applications.

I would say that leaving an incorrect value in place should be a
violation of the PEP.

> Or in short, is it mandatory that content length header must exist if there is
> non zero length data in input? I know the PEP says that the content length
> may be empty or absent, but am concerned that applications would assume
> it has value of 0 if empty or absent.

No, the Content-Length header is optional, and any applications that
operate otherwise are non-compliant.
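
An application that copes with both cases might read its input along these lines. This is a sketch under the assumption that the input stream signals end-of-file when Content-Length is absent, as the wrapped stream in scenario 2 would; a stream backed directly by a socket may not, so this is not a general recipe. The `read_body` name is invented.

```python
import io

def read_body(environ, chunk_size=8192):
    """Read the request body whether or not CONTENT_LENGTH is present."""
    stream = environ["wsgi.input"]
    length = environ.get("CONTENT_LENGTH")
    if length:
        # A length was supplied: read exactly that many bytes.
        return stream.read(int(length))
    # No length: read in chunks until the stream reports end-of-file.
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```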

[Alan]
>> 6. Exactly the same principles should apply to decoding incoming
>> Transfer-Encoding: chunked.

[Graham]
> My understanding is that content encoding is different to transfer encoding,
> ie., is not hop by hop in this sense and that the same statements don't apply.

Hop-by-hop header means that the attribute described in the header is
not an inherent attribute of the content being transferred, but is
solely used in one stage of a multi-hop communication.

If my browser is using a proxy, which relays requests on to a server,
the proxy may decide to use Transfer-Encoding to communicate with the
server. Thus the Transfer-Encoding only applies to the proxy->server
"hop". If the server receives such a Transfer-Encoding, it *must*
decode the content according to that Transfer-Encoding before making
it available to the application.
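
For illustration, decoding a chunked body on the server side could be sketched like this. It is deliberately minimal: chunk extensions are ignored, trailers are skipped, and malformed input is not handled. The function name is an assumption.

```python
import io

def decode_chunked(stream):
    """Decode a Transfer-Encoding: chunked body from a binary stream."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += stream.read(size)
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip any trailer lines up to the terminating blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

decoded = decode_chunked(io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```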

[Graham]
> Wait till you see what I am about to come out with if I can sort this
> issue out. :-)

Now I'm intrigued :-)

Regards,

Alan.


Re: [Web-SIG] WSGI input filter that changes content length.

2007-01-15 Thread Alan Kennedy
[Graham Dumpleton]
> How does one implement in WSGI an input filter that manipulates the request
> body in such a way that the effective content length would be changed?

> The problem I am trying to address here is how one might implement using
> WSGI a decompression filter for the body of a request. Ie., where
> "Content-Encoding: gzip" has been specified.

> So, how is one meant to deal with this in WSGI?

The usual approach to modifying something in the WSGI
environment, in this case the wsgi.input file-like object, is to wrap
it or replace it with an object that behaves as desired.

In this case, the approach I would take would be to wrap the
wsgi.input object with a gzip.GzipFile object, which should only read
the input stream data on demand. The code would look like this

import gzip
wsgi_env['wsgi.input'] = gzip.GzipFile(fileobj=wsgi_env['wsgi.input'])

Notes.

1. The application should be completely unaware that it is dealing
with a compressed stream: it simply reads from wsgi.input, unaware
that reading from what it thinks is the input stream is actually causing
cascading reads down a series of file-like objects.

2. The GzipFile object will decompress on the fly, meaning that it
will only read from the wrapped input stream when it needs input.
Which means that if the application does not read data from
wsgi.input, then no data will be read from the client connection.

3. The GzipFile should not be responsible for enforcement of the
incoming Content-Length boundary. Instead, this should be enforced by
the original server-provided file-like input stream that it wraps. So
if the application attempts to read past Content-Length bytes, the
server-provided input stream "is allowed to simulate an end-of-file
condition". Which would cause the GzipFile to return an EOF to the
application, or possibly an exception.

4. Because of the on-the-fly nature of the GzipFile decompression, it
would not be possible to provide a meaningful Content-Length value to
the application. To do so would require buffering and decompressing
the entire input data stream. But the application should still be able
to operate without knowing Content-Length.

5. The wrapping can NOT be done in middleware. PEP 333, Section "Other
HTTP Features" has this to say: "WSGI applications must not generate
any "hop-by-hop" headers [4], attempt to use HTTP features that would
require them to generate such headers, or rely on the content of any
incoming "hop-by-hop" headers in the environ dictionary. WSGI servers
must handle any supported inbound "hop-by-hop" headers on their own,
such as by decoding any inbound Transfer-Encoding, including chunked
encoding if applicable." So the wrapping and replacement of wsgi.input
should happen in the server or gateway, NOT in middleware.

6. Exactly the same principles should apply to decoding incoming
Transfer-Encoding: chunked.
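
Note 3 can be sketched as a pair of wrappers: a server-side stream that simulates end-of-file at the Content-Length boundary, with the GzipFile layered on top. The `LimitedStream` class is hypothetical and is written for a later interpreter with a bytes type; it only implements read(), which is all this example exercises.

```python
import gzip
import io

class LimitedStream:
    """Wrap a binary stream and report end-of-file after `limit` bytes."""
    def __init__(self, stream, limit):
        self._stream = stream
        self._remaining = limit

    def read(self, size=-1):
        # Never hand out more than what is left of the request body.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

compressed = gzip.compress(b"payload")
# Simulate a connection where another request follows this body.
connection = io.BytesIO(compressed + b"NEXT REQUEST")
limited = LimitedStream(connection, len(compressed))
decoded = gzip.GzipFile(fileobj=limited, mode="rb").read()
left_on_connection = connection.read()
```

The decompressor sees a clean end-of-file and never touches the bytes belonging to the next request.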

HTH,

Alan.

P.S. Thanks for all your great work on mod_python Graham!


Re: [Web-SIG] [Fwd: Summer of Code preparation]

2006-04-19 Thread Alan Kennedy
[Peter Hunt]
>> I think an interesting project would be complete integration of the
>> client and server via AJAX. That is, whenever a DHTML event handler
>> needs to be called on the client-side, the document state is serialized
>> and it is sent along with the DHTML event information to the server,
>> informing it that an event occurred.

[Matt Goodall]
> Invoking something server-side every time there's some (interesting)
> event in the browser will almost certainly perform badly due to network
> latency and possibly put unnecessary load on the server.

I was going to refrain from this conversation, but now find the
following point relevant:

How long before we end up reinventing X-windows-style transmission of
UI events across the network, i.e. by sending all browser events over
HTTP to the server?

It's worth noting that, in the early days of X-windows, people said it
was far too heavyweight, and would saturate networks and quickly
become unusable. But those people reckoned without advances in network
technology, and the X-windows people claimed that they were
specifically designing for network technologies from several years in
the future, by which time their software technology would be mature
and ready to take advantage of the newer and higher bandwidths. And
they were pretty much right: having used X-windows over corporate WANs
since the early 1990s, I think it works pretty well.

But the X-windows people weren't designing for Internet scale: how
many connections should a server be able to handle?

> Serializing and sending document state will only make it slower.

Agreed: serialising and transmitting whole documents is taking it too far ;-)

Regards,

Alan.


Re: [Web-SIG] [Fwd: Summer of Code preparation]

2006-04-18 Thread Alan Kennedy
[Titus Brown]
> I'm thinking of proposing a project to build a JavaScript interpreter
> interface for Python; the goal (for me) is to get twill/mechanize to
> understand JavaScript.  I think the project has wider applications,
> but I'm not sure what people actually want to do with JavaScript.
> I could imagine server-side parsing of javascript, and/or integration of
> javascript and python code.  Thoughts?

Have you looked at WebCleaner? WebCleaner is a filtering HTTP proxy,
written in python.

http://webcleaner.sourceforge.net/

WebCleaner uses the Mozilla SpiderMonkey javascript engine to execute
JS from web pages: From the webcleaner front page

"""
Another feature is the JavaScript filtering: JavaScript data is
executed in the integrated Spidermonkey JavaScript engine which is
also used by the Mozilla browser suite. This eliminates all JavaScript
obfuscation, popups, and document.write() stuff, but the other
JavaScript functions still work as usual.
"""

Perhaps webcleaner has code that already does what you need? Although
the GPL licensing might be problematic.

Regards,

Alan.


Re: [Web-SIG] Standalone WSGI form framework.

2006-03-16 Thread Alan Kennedy
[Alan Kennedy]
> But I'm tired of hacking on it to make it do what I want: I'd much
> prefer to start afresh with my own design than to continue to use
> Quixote: it's just too limiting.

[Titus Brown]
> I think you mistook my question for a criticism ;).  Rewrite or no, I'm
> mostly interested in what you meant by "WSGI oriented" and what that
> would mean specifically in the context of the Quixote forms lib.

No criticism detected ;-)

By WSGI oriented, I mean that I don't have to mock request objects: I
can just use a dictionary to mock a WSGI request: I've found that
testing approach exceedingly straightforward to work with. Also, I've
had problems in the past with Quixote not handling response encodings
correctly. And its HTML escaping mechanism is excessively PTL
oriented: I ended up making too many changes to Quixote, which made me
question why I was using it in the first place.
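
As an illustration of what "just use a dictionary" means in practice, here is a sketch of a test helper built on the standard wsgiref.util.setup_testing_defaults. The `greeting_app` and `call_app` names are invented for the example.

```python
from wsgiref.util import setup_testing_defaults

def greeting_app(environ, start_response):
    """Tiny example application: greets whoever is named in the query string."""
    name = environ.get("QUERY_STRING") or "world"
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [("Hello, %s\n" % name).encode("utf-8")]

def call_app(app, **overrides):
    """Call a WSGI app with a plain dict as the request; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fills in the required CGI and wsgi.* keys
    environ.update(overrides)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

status, headers, body = call_app(greeting_app, QUERY_STRING="Alan")
```

No request class has to be mocked; the test only builds a dict and inspects what comes back.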

Regards,

Alan.


Re: [Web-SIG] Standalone WSGI form framework.

2006-03-16 Thread Alan Kennedy
[Alan Kennedy]
> I'm looking for a framework-independent form library. I'm using the
> Quixote forms library at the moment, inside my own framework, but
> would ideally like something more WSGI oriented, so that it is easier
> to mock and unittest.

[Titus Brown]
> I'm confused by this -- this could mean that you want to separate the
> quixote forms lib from the Quixote 'request' object, I guess.  What
> else?

Hi Titus,

I realise that I can rewrite the Quixote form lib to achieve what I
want, but at the cost of a fairly significant effort.

As it is, I've rewritten the rendering, to work with Kid and ElementTree.

But I'm tired of hacking on it to make it do what I want: I'd much
prefer to start afresh with my own design than to continue to use
Quixote: it's just too limiting.

Regards,

Alan.


Re: [Web-SIG] Standalone WSGI form framework.

2006-03-16 Thread Alan Kennedy
[Alan Kennedy]
> I'm looking for a framework-independent form library. I'm using the
> Quixote forms library at the moment, inside my own framework, but
> would ideally like something more WSGI oriented, so that it is easier
> to mock and unittest.

[Daniel Miller]
> Have you looked at Ian Bicking's FormEncode? I'm not sure if it
> meets all your requirements, but it seems like a good base to start
> with (most of the hard stuff has already been done).

Thanks Daniel.

Indeed, it not only appears that FormEncode is the closest thing to
what I need, it also seems to be the only show in town, i.e. the only
framework-independent form library.

[Alan Kennedy]
> If anyone is familiar with the Java Spring Framework, it's got pretty
> much everything I need, but is overly complex, and is written in Java

[Daniel Miller]
> I wrote an app using Spring and I have to say it's the best web
> framework I've ever used in terms of completeness and flexibility,
> but it's written in Java...

Agreed. I find its interface-based design very simple and powerful.
But, IMHO, the actual implementations of the classes that implement
the interfaces are excessively complex and rigidly structured.

[Daniel Miller]
> I actually wrote a few simple classes on top of CherryPy that exposes
> the Spring webmvc Controller interface as well as the
> SimpleFormController class (those are the two main building blocks
> I found most useful in Spring's WebMVC). My SimpleFormController
> implementation uses FormEncode for validation. I'd be willing to
> share the code if you're interested.

I'd be very interested to see that, and potentially use it, if you're
willing ...

[Daniel Miller]
> I think "the one true web framework" could be made for Python if
> someone took the best ideas from Spring WebMVC and made a few
> component-ized building blocks on top of which complex and widely
> varied applications could be built.

Completely agreed. The term "meta-framework" is most appropriate, I
think. If we could agree on a set of interfaces, then everyone would
be free to contribute implementations of their own components.

For example, I like the idea of Routes URL-mapping library: it's
precisely the kind of task that is simple enough in concept, but yet
complex enough to require a dedicated (and thoroughly tested) library.

Most of the popular web frameworks make the fundamental mistake of
picking a single URL->object mapping mechanism, and making you
shoehorn all your requirements into it. IIRC, Django, Turbogears,
Pylons, all make this mistake.

However, if URL->object mapping were controlled by an interface, then
we'd be free to choose from multiple implementations, e.g.
routes-style, quixote-style, zope-style, etc, etc.
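
A minimal sketch of what "URL->object mapping controlled by an interface" could look like. The names URLResolver, PatternResolver, TraversalResolver and dispatch are hypothetical, invented for illustration; they are not taken from Routes, Quixote or Zope.

```python
import re
from typing import Callable, Optional, Protocol

Handler = Callable[[dict], str]


class URLResolver(Protocol):
    """Hypothetical interface: map a URL path to a handler, or None."""

    def resolve(self, path: str) -> Optional[Handler]: ...


class PatternResolver:
    """Routes-style: match the whole path against regular expressions."""

    def __init__(self):
        self._routes = []  # list of (compiled regex, handler) pairs

    def connect(self, pattern, handler):
        self._routes.append((re.compile(pattern), handler))

    def resolve(self, path):
        for regex, handler in self._routes:
            if regex.fullmatch(path):
                return handler
        return None


class TraversalResolver:
    """Traversal-style: walk the path segments through nested dicts."""

    def __init__(self, root):
        self._root = root

    def resolve(self, path):
        node = self._root
        for segment in filter(None, path.split("/")):
            if not isinstance(node, dict) or segment not in node:
                return None
            node = node[segment]
        return node if callable(node) else None


def dispatch(resolver: URLResolver, path: str) -> str:
    """The framework only sees the interface, never the mapping style."""
    handler = resolver.resolve(path)
    return handler({"path": path}) if handler else "404 Not Found"
```

Because dispatch() depends only on resolve(), either implementation can be swapped in without touching the rest of the application.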

> However, to make this possible we'd most likely need a standard
> request object (or at least an interface definition).

ISTM that WSGI eliminates the need for that. Is there any specific
thing you have in mind that WSGI doesn't cover?

Regards,

Alan.


[Web-SIG] Standalone WSGI form framework.

2006-03-15 Thread Alan Kennedy
Greetings All.

I'm looking for a framework-independent form library. I'm using the
Quixote forms library at the moment, inside my own framework, but
would ideally like something more WSGI oriented, so that it is easier
to mock and unittest.

My ideal form framework should do the following

1. Parsing of submitted POST requests, etc
2. Binding of incoming form variables to the attributes of a target
python data object
3. Customisable validation, with management of validation error messages.
4. Generate unique (hierarchical) field names for sub-attributes of
the data object to be edited, which are javascript-identifier-safe,
i.e. can be used as the names of HTML form input elements.
5. Handle multipart/form-data
6. Nice-to-have: transparently handle multi-page forms, e.g. hub forms, etc.

It should NOT

1. Attempt to generate HTML, or be tied to a specific templating mechanism
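
A rough sketch of requirements 1 to 4 above (parsing, binding, validation, hierarchical field names), with no HTML generation. Everything here is an assumption made for illustration: the Customer and Address classes, the "__" separator (chosen because it keeps names valid as JavaScript identifiers), and the field_names and bind helpers.

```python
from dataclasses import dataclass, field, is_dataclass
from urllib.parse import parse_qs

SEP = "__"  # assumed separator; keeps names valid JavaScript identifiers


@dataclass
class Address:
    city: str = ""
    postcode: str = ""


@dataclass
class Customer:
    name: str = ""
    address: Address = field(default_factory=Address)


def field_names(obj, prefix=""):
    """Yield a hierarchical form field name for every leaf attribute."""
    for attr, value in vars(obj).items():
        name = prefix + attr
        if is_dataclass(value):
            yield from field_names(value, name + SEP)
        else:
            yield name


def bind(obj, body, validators=None):
    """Bind a urlencoded POST body onto obj; return {field: error}."""
    errors = {}
    submitted = parse_qs(body, keep_blank_values=True)
    for name in field_names(obj):
        if name not in submitted:
            continue
        value = submitted[name][0]
        check = (validators or {}).get(name)
        message = check(value) if check else None
        if message:
            errors[name] = message  # leave the attribute untouched
            continue
        *parents, leaf = name.split(SEP)
        target = obj
        for parent in parents:
            target = getattr(target, parent)
        setattr(target, leaf, value)
    return errors
```

The field names come from the data object itself, so the same names can be handed to whatever templating mechanism renders the form.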

If anyone is familiar with the Java Spring Framework, it's got pretty
much everything I need, but is overly complex, and is written in Java
:-(

TIA,

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-19 Thread Alan Kennedy
[Alan Kennedy]
 >>>Maybe we need a PEP

[Bill Janssen]
 >>Great idea!  That's exactly what I thought when I organized this SIG a
 >>couple of years ago.

[Guido van Rossum]
 > At first I was going to respond "+1". But the fact that a couple of
 > years haven't led to much suggests that it's unlikely to be fruitful;
 > there just are too many diverging ideas on what is right. (Which makes
 > sense since it's a huge and fast developing field.)

Having considered the area for a couple of days, I think you're right: 
the generic concept "web", as in web-sig, covers far too much ground, 
and there are too many schools of thought.

 > So unless someone (Alan Kennedy?) actually puts forward a PEP and gets
 > it through a review of the major players on web-sig, I'm skeptical.

But there is a subset which I think is achievable, namely http support, 
which IMO is the subset that most needs a rework. And now that we have a 
nice web standard, WSGI, it would be nice to make use of it to refactor 
the current http support. The following are important omissions in the 
current stdlib.

  - Asynchronous http client/server support (use asyncore? twisted?)
  - SSL support in threaded http servers
  - Asynchronous SSL support
  - Simple client file upload support
  - HTTP header parsing support, e.g. language codes, quality lists, etc
  - Simple object publishing framework?

Addressing all of the above would be a significant piece of work. And 
IMHO, it is only achievable by staying focussed on http and NOT 
addressing requirements such as

  - Content processing, e.g. html tidy, html parsing, css parsing
  - Foreign script language parsing or execution
  - Page templating API

I think it would be a good idea to address these concerns in separate PEPs.

[Guido van Rossum]
 > I certainly don't want this potential effort to keep us from adding
 > the low-hanging fruit (wsgiref, with perhaps some tweaks as PJE can
 > manage based on recent feedback here) to the 2.5 stdlib.

Completely agreed. Any web-related PEPs are going to take a long time, 
and are unlikely to be ready in time for 2.5.

Regards,

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-16 Thread Alan Kennedy
[Guido Van Rossum]
> Actually BaseHTTPServer.py and friends use a deprecated naming scheme
> -- just as StringIO, UserDict and many other fine standard library
> modules.
> If you read PEP 8, the current best practice is for module names to be
> all-lowercase and *different* from the class name.

[Clark C Evans]
> I propose we add wsgiref, but look at other implementations and
> steal what ever you can from them.  This is not a huge chunk of
> code -- no reason why you can't have the best combination of
> features and correctness.

[Jean Paul Calderone]
> HTTPS is orthogonal.  Besides, how would you support it in the stdlib?
> It's currently not possible to write an SSL server in Python without a
> third-party library.  Maybe someone would be interested in rectifying
> /that/? :)

[Ian Bicking]
> I've used this several times (well, not wsgiref's implementation, but
> paste.response.HeaderDict).  rfc822 is heavier than this dictionary-like
> object, and apparently is also deprecated.

[Alan Kennedy]
> While we're on the subject, can we find a better home for the HTTP
> status codes->messages mapping?

Folks,

Thinking about this some more, it's beginning to sound to me like the
server-side web support in the standard library needs a proper review
and possible rework: it's slowly decohering/kipplizing.

Maybe we need a PEP, so that we can all discuss the subject
(rationally ;-) and sort out all of the issues before we go ahead and
commit anything?

Just a thought. Feel free to disregard.

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-16 Thread Alan Kennedy
[Ian Bicking]
> Anyway, I'm +1 on the object [wsgiref's wsgi header manipulation class]
> going somewhere.  I don't know if the
> parent package has to be named "wsgi" -- and "wsgiref" seems even
> stranger to me, as anything in the standard library isn't a "reference
> implementation" anymore, but an actual implementation.  I personally
> like a package name like "web".  Everyone will know what that means
> (though it would start with most of the web related modules not in it,
> which is a problem).

While we're on the subject, can we find a better home for the HTTP
status codes->messages mapping?

Integer status codes.
http://mail.python.org/pipermail/web-sig/2004-September/000764.html

Adding status code constants to httplib
http://mail.python.org/pipermail/web-sig/2004-September/000842.html
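
For readers on current Python: later Python 3 releases expose such a mapping as http.HTTPStatus. A minimal sketch of building a WSGI status string from an integer code follows; status_line is a hypothetical helper name, not part of any library.

```python
from http import HTTPStatus


def status_line(code: int) -> str:
    """Build a WSGI status string such as '404 Not Found'."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase}"
```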

Regards,

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-14 Thread Alan Kennedy
[Robert Brewer]
> Look at the right code and see if your gut feeling changes. ;)

I looked at

http://svn.cherrypy.org/trunk/cherrypy/_cphttpserver.py

As indicated by Ian in this message

http://mail.python.org/pipermail/web-sig/2006-February/002074.html

Sorry if that was the wrong one to look at, I'm not at all familiar with 
CherryPy.

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-14 Thread Alan Kennedy
[Alan Kennedy]
>>Priority #1: Make the requisite server a single standalone module.

[Guido van Rossum]
> Huh? What makes you think this?

My bad :-(

Two things made me think like that

1. BaseHttpServer -> BaseHttpServer.py
SimpleHttpServer -> SimpleHttpServer.py
WSGIHttpServer -> WSGIHttpServer.py

2. The comment was more aimed at the CherryPy entry, which imports a 
fair amount of CherryPy support code.

i'll-get-me-coat-ly'yrs,

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-14 Thread Alan Kennedy
[Guido van Rossum]
>>>>Let's make it so. I propose to add wsgiref to the standard library and
>>>>nothing more.

[Blake Winton]
>>>Will you be maintaining this?  ;)

[Guido van Rossum]
>>I'd expect we could twist Phillip's arm to maintain it; he's not
>>expecting much maintenance.

[Phillip J. Eby]
> Yes, and yes.

Whew! :-)

Phillip: Hope you don't mind me taking the liberty of rearranging your code?

And before we go finalising anything, please let's give the other 
contenders a chance to come up with something competitive.

Alan.



Re: [Web-SIG] WSGI in standard library

2006-02-14 Thread Alan Kennedy

[Alan Kennedy]
>> 3. If I had to pick one of the 3 you suggested, I'd pick the
>> last one, i.e. PJE's, because it fulfills exactly the criteria
>> I listed

[Robert Brewer]
> I have to disagree (having examined/unraveled it quite a bit recently,
> to remove modpython_gateway's dependency on it).

[Ian Bicking]
> I think it also tries to enforce a lot of the details of WSGI, and thus
> guide a WSGI implementor into creating a compliant server.


Well, I'm sure we all want our favourite server in the stdlib ;-)

But a few things have to happen first.

Priority #1: Make the requisite server a single standalone module.

Anticipating PJE's willingness to have WSGIRef included in the stdlib, 
I've taken the liberty of putting it all into one big file. And I think 
it looks pretty damn good: fully WSGI compliant, with code to represent 
every single aspect of the spec. Take a look for yourself: the file is 
attached. If the attachment doesn't make it to the list, I'll upload it 
somewhere.


But that doesn't mean the decision's over. It means that the bar has 
been raised. Anyone else who wants their module to be a contender has to 
get it all into the one file, i.e. eliminating all framework 
dependencies, etc.


Here's a few comments I put together about the three contenders that 
have been proposed so far. They're just my own comments from reading the 
code: feel free to treat them as the ravings of a madman if you so wish.


1. CherryPy server - 407 lines (non-code lines: ~80)

 - Depends on cherrypy, cherrypy._cputil, cherrypy.lib.httptools
 - Depends on cherrypy.config
 - Implements HTTP header length limit checking
 - Implements HTTP body length limit checking
 - Uses own logging handler
 - Subclasses SocketServer.BaseServer, not BaseHTTPServer.HTTPServer
   - Therefore does low-level socket mucking-about
 - Provides 2 server implementations
   - CherryHTTPServer
   - PooledThreadServer
 - Explicitly checks for KeyboardInterrupt exceptions
 - PooledThreadServer has clean shutdown through Queue.Queue messaging
 - Does not detect hop-by-hop headers
 - No demo application

My gut feeling: too complex, works too hard to be "production-ready", at 
the expense of readability.


2. Paste Server - 450 lines

 - Supports 100 continue responses
 - No imports from outside stdlib
 - Provides HTTPS/SSL server, with fallback if no SSL
 - Supports socket timeout
 - Demo application is (imported) paste.wsgilib.dump_environ
 - Does not detect hop-by-hop headers

My gut feeling: Ignores many parts of the WSGI spec (sendfile, strict 
error checking), supports unnecessary stuff for stdlib, i.e. Continue 
support, HTTPS.


3. WSGIRef_onefile.py - 660 lines

 - No imports from outside stdlib
 - Detects hop-by-hop headers
 - Has WSGI sendfile support
 - Has dedicated class to manage WSGI headers list as dictionary
 - Has builtin demo app

My gut feeling: WSGIRef is the sweet spot in terms of simplicity vs. 
usability. Covers all aspects of WSGI (which is what it was designed 
for, IIRC ;-)
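
A short sketch of three of the features listed for WSGIRef (hop-by-hop detection, the header-list wrapper, and the file wrapper behind sendfile support). It uses the module layout under which wsgiref ships in the Python 3 standard library (wsgiref.util and wsgiref.headers), which is an assumption relative to the one-file version discussed here; the sample header values and the 4-byte block size are arbitrary.

```python
import io
from wsgiref.headers import Headers
from wsgiref.util import FileWrapper, is_hop_by_hop

# Hop-by-hop headers belong to the server, not the application.
connection_is_hop = is_hop_by_hop("Connection")
content_type_is_hop = is_hop_by_hop("Content-Type")

# Headers wraps a WSGI header list in a dictionary-like interface.
headers = Headers([("Content-Type", "text/plain")])
headers["Content-Length"] = "11"

# FileWrapper turns a file-like object into an iterable of blocks.
chunks = list(FileWrapper(io.BytesIO(b"hello world"), 4))
```

Header lookup through Headers is case-insensitive, so headers["content-type"] and headers["Content-Type"] return the same value.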


The ball's in yizzir court now..

Alan.

"""BaseHTTPServer that implements the Python WSGI protocol (PEP 333, rev 1.21)

This is both an example of how WSGI can be implemented, and a basis for running
simple web applications on a local machine, such as might be done when testing
or debugging an application.  It has not been reviewed for security issues,
however, and we strongly recommend that you use a "real" web server for
production use.

For example usage, see the 'if __name__=="__main__"' block at the end of the
module.  See also the BaseHTTPServer module docs for other API information.
"""

from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import urllib, sys, os, mimetools, types, time

__version__ = "0.1"
__all__ = ['WSGIServer','WSGIRequestHandler','demo_app']

server_version = "WSGIServer/" + __version__
sys_version = "Python/" + sys.version.split()[0]
software_version = server_version + ' ' + sys_version

hop_by_hop_headers = {
    'connection':1,
    'keep-alive':1,
    'proxy-authenticate':1,
    'proxy-authorization':1,
    'te':1,
    'trailers':1,
    'transfer-encoding':1,
    'upgrade':1
}

def is_hop_by_hop(header_name):
    """Return true if 'header_name' is an HTTP/1.1 "Hop-by-Hop" header"""
    return hop_by_hop_headers.has_key(header_name.lower())

class FileWrapper:
    """Wrapper to convert file-like objects to iterables"""

    def __init__(self, filelike, blocksize=8192):
        self.filelike = filelike
        self.blocksize = blocksize
        if hasattr(filelike,'close'):
            self.close = filelike.close

    def __g

Re: [Web-SIG] WSGI in standard library

2006-02-14 Thread Alan Kennedy
[Ian Bicking]
> Note that the scope of a WSGI server is very very limited.  It is quite 
> distinct from an XMLRPC server from that perspective -- an XMLRPC server 
> actually *does* something.  A WSGI server does nothing but delegate.

and

> I'm not set on "production" quality code, but I think the general 
> sentiment against that is entirely premature.  The implementations 
> brought up -- CherryPy's 
> (http://svn.cherrypy.org/trunk/cherrypy/_cphttpserver.py) and Paste's 
> (http://svn.pythonpaste.org/Paste/trunk/paste/httpserver.py) and 
> wsgiref's 
> (http://cvs.eby-sarna.com/wsgiref/src/wsgiref/simple_server.py?rev=1.2&view=markup)
>  
> are all pretty short.  It would be better to discuss the particulars. Is 
> there a code path in one or more of these servers which you think is 
> unneeded and problematic?

A few points.

1. My opinion is not relevant to whether/which WSGI server goes into the 
standard library. What's required is for someone to propose to 
python-dev that a particular WSGI server should go into the standard 
library. I imagine that the response on python-dev to the proposer is 
going to be along the lines of "Will you be maintaining this?" If/when 
python-dev is happy, then it'll go into the distribution.

2. What's wrong with leaving the current situation as-is, i.e. the 
available WSGI implementations are listed on the WSGI Moin page

http://wiki.python.org/moin/WSGIImplementations

3. If I had to pick one of the 3 you suggested, I'd pick the last one, 
i.e. PJE's, because it fulfills exactly the criteria I listed

  - It's pretty much the simplest possible implementation, meaning it's 
easiest to understand.
  - It's based on the existing *HttpServer hierarchy
  - It's got a big notice at the top saying """This is both an example 
of how WSGI can be implemented, and a basis for running simple web 
applications on a local machine, such as might be done when testing or 
debugging an application.  It has not been reviewed for security issues, 
however, and we strongly recommend that you use a "real" web server for 
production use."""

Regards,

Alan.


Re: [Web-SIG] Bowing out (was Re: A trivial template API counter-proposal)

2006-02-12 Thread Alan Kennedy
[Alan Kennedy]
>> Looking at this in an MVC context ...

[Phillip J. Eby]
> As soon as you start talking about what templates should or should not 
> do (as opposed to what they *already* do), you've stopped writing an 
> inclusive spec and have wandered off into evangelizing a particular 
> framework philosophy.

Sorry if my message seemed unreasonable. My approach to such matters is 
to attempt to start from best design practice, keeping a keen focus on 
the best way to do things in the future, relegating poorly-architected 
legacy systems, e.g. active page systems, to being a secondary concern.

Also, my take on active page systems is that they could easily be 
encompassed by an MVC model. The View is the active page, the Model is 
the namespace in which the active page is rendered and the Controller is 
the thing that does the rendering.

[Phillip J. Eby]
 > At this point it has become clear to me that even if I spent my days
 > and nights writing a compelling spec of what I'm proposing and then
 > trying to sell it to the Web SIG, it would be at best a 50/50 chance
 > of getting through, and in the process it appears that I'd be burning
 > through every bit of goodwill I might have previously possessed here.

 > .. I'd rather save whatever karma I
 > have left here for something with a better chance of success.

I'm sorry to hear that.

[Phillip J. Eby]
> Good luck with the spec.

Well, I'm currently designing and implementing a View and ViewResolver 
in Spring for a customer, so I'll be keeping a note of requirements as I 
go, and will attempt to come up with a generic design which is suitable 
for a templating standard. But it will be a few weeks before I can 
spec that, document it and start doing sample implementations which I 
can open source.

Regards,

Alan.



Re: [Web-SIG] WSGI in standard library

2006-02-12 Thread Alan Kennedy
[Graham Dumpleton]
> Anyway, not that it matters, but the security fix was not the only thing
> in those releases.

Still, I think my point stands that internet-facing servers in the 
standard library are currently the only source of security advisories 
in python.

http://www.python.org/security/

How sure are we that any proposed production WSGI server in the standard 
library will not become a source of further holes, especially if it 
tries to cover all the bases of a true production server, i.e. security, 
flexibility, efficiency, full http1.1 compliance, etc?

Regards,

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-12 Thread Alan Kennedy
[Alan Kennedy]
>>Instead, I think the right approach is to continue with the existing 
>>approach: put the most basic possible WSGI server in the standard 
>>library, for educational purposes only, and a warning that it shouldn't 
>>really be used for production purposes.

[Bill Janssen]
> I strongly disagree with this thinking.  Non-production code shouldn't
> go into the stdlib; instead, Alan's proposed module should go onto
> some pedagogical website somewhere with appropriate tutorial
> documentation.

I still disagree ;-)

IMO, the primary reason for not including production servers in the 
standard library is that servers need to be maintained much more 
fastidiously than the standard library, and need to be released on a 
timescale that is independent of python releases.

Note the security hole uncovered in the standard library xml-rpc lib 
last year.

PSF-2005-001 - SimpleXMLRPCServer.py allows unrestricted traversal
http://www.python.org/security/PSF-2005-001/

This particular security hole is the very reason why the Python Security 
response team had to be founded, and required point-releases of the 
entire python distribution to fix, i.e. python 2.3.5 and python 2.4.1 
were released simply to fix this bug.

There are two primary areas of the python distro that can result in such 
significant security holes.

1. Crypto libraries. Fortunately, the Timbot has been carefully watching 
over us, and ensuring the excellence of the python crypto libraries (as 
witnessed by the appearance of Ron Rivest on python-dev (!) last December):

http://mail.python.org/pipermail/python-dev/2005-December/058850.html

2. Internet-exposed servers. No matter how careful developers are, it is 
very difficult to avoid designing security holes into such servers. 
Therefore, IMHO, it is wrong to include such servers into the standard 
distribution. Instead, production-ready servers should be independent of 
the standard distribution, have their own development teams, have 
independent release-cycles, etc, etc: think Twisted, mod_python, etc.

So, I still think that only basic educational/playpen servers 
should go in the standard library, with an indication that the user 
should pick a server from outside the distro if they need to 
do serious server work.

Maybe if there were no "production-ready" servers in the standard 
library, there would be no need for a "Python Security Response Team".

Just my €0,02.

Regards,

Alan.



Re: [Web-SIG] A trivial template API counter-proposal

2006-02-05 Thread Alan Kennedy
[Phillip J. Eby]
 > Developing WSGI was not easy, either, as I'm sure you recall. You and I
 > certainly argued a bit about iterators and file-like objects and such,
 > and it took a while before I understood all of your use cases and we
 > were able to resolve them in the spec.  If you had given up on
 > convincing me then, or if I had given up on your use cases as "too
 > complex", the spec would have suffered for it.

And I am indeed most grateful that you took the time to understand my 
tiresome ramblings on the subject: WSGI is indeed a most excellent spec: 
well done! :-)

[Alan Kennedy]
 >> I can understand why the web-sig has fallen into the trap of tying a
 >> templating API to its nice web standard, WSGI: all web applications 
 >> must generate output. But web apps need to generate a wide range of
 >> media types, e.g. image/*, application/msword, etc, etc, etc.

[Phillip J. Eby]
 > And in many frameworks, it is the *template* that decides what media
 > type it is generating - and it may not even be outputting text or
 > unicode.  Again, this is something that would be neglected by a
 > text-only spec.

Ah, now there I have a problem! IMHO, templates should generate only a 
single media type. Whatever code is managing resource-delivery to the 
browser should decide which template to use, and set the media type 
accordingly, outside of the template.

Let me explain in terms of an actual use case.

I used to work for an e-learning company (widelearning.com), which 
delivered multimedia financial training materials. As much as possible, 
the content was delivered as video and Macromedia Flash, with fallback 
to simple image/* and text/html if the multimedia plugins were not 
available.

This was done through two primary mechanisms:

1. Through plugin detection, i.e. running script in the browser to 
detect certain plugins, e.g. Flash.

2. Through user profiles, i.e. where the user selected their media 
preference, which was stored in a database.

In both scenarios, entirely different templating engines were used.

For text/*html, we used XSLT and JSP (ugh ;-)

For Flash, we used a bespoke templating system, akin to Macromedia 
Generator (something like jgenerator: http://www.jzox.com). This was a 
templating engine that took a binary template as input, "cooked" the 
template with reference to a user data namespace, and generated a binary 
output stream representing a personalised Flash "movie".

Neither rendering engine had any knowledge that other media types could 
potentially be returned to the user. Before any templates were rendered, 
a decision was made as to what media type was suitable to service the 
request, maximising the capabilities of the user's browser, and the 
relevant rendering engine invoked, with the relevant template. This 
"separation of concerns" greatly simplified our development and QA process.

IMO, permitting templates to select the media type is akin to the old 
problem of dealing with exceptions in various templating languages which 
intermingle code and presentation, e.g. JSP, ASP, PHP, etc. If a JSP 
caused an exception halfway through page-rendering, it was too late to 
do anything meaningful about it: the first half of the rendered page had 
already been transmitted to the user. What should really have happened 
is that the page should not have been transmitted to the user until the 
template was completely successfully rendered. That way, if an exception 
occurred, a suitable error page could be returned to the user, and the 
half-cooked template response discarded.

Similarly, if a template is permitted to set HTTP headers, then it might 
discover too late that it is generating a media type that is unsuitable 
for the client.
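
A minimal sketch of the "render completely, then transmit" idea, written as a WSGI application. string.Template stands in for a real templating engine, and the "app.model" environ key is a made-up convention for passing the user namespace; neither is prescribed by WSGI.

```python
from string import Template

PAGE = Template("<h1>Hello, $name</h1>")


def application(environ, start_response):
    """Render the whole page in memory before sending any headers."""
    try:
        model = environ.get("app.model", {})  # hypothetical environ key
        body = PAGE.substitute(model).encode("utf-8")
        status, media_type = "200 OK", "text/html; charset=utf-8"
    except Exception:
        # Nothing has been sent yet, so a clean error page is still possible.
        body = b"Sorry, something went wrong."
        status, media_type = "500 Internal Server Error", "text/plain"
    start_response(status, [("Content-Type", media_type),
                            ("Content-Length", str(len(body)))])
    return [body]
```

The cost is that the full response is held in memory, which is the trade-off being argued for here: a half-cooked page is never sent.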

IMHO, some functionality in the HTTP application should decide the media 
type to be returned, call the relevant templating engine and set the 
relevant HTTP headers.
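
The decide-first flow can be sketched as follows. The renderer functions, the RENDERERS table and the has_flash_plugin flag are all illustrative stand-ins (the Flash "renderer" just emits placeholder bytes); the point is only that each renderer produces exactly one media type and the application picks between them.

```python
from html import escape


def render_html(model):
    """Text renderer; knows nothing about other media types."""
    return ("<p>Welcome, %s</p>" % escape(model["user"])).encode("utf-8")


def render_flash(model):
    """Stand-in for a binary templating engine; the bytes are illustrative."""
    return b"FWS" + model["user"].encode("utf-8")


RENDERERS = {
    "application/x-shockwave-flash": render_flash,
    "text/html": render_html,
}


def select_media_type(has_flash_plugin, profile_preference=None):
    """Decide the media type before any template is rendered."""
    if profile_preference in RENDERERS:
        return profile_preference
    if has_flash_plugin:
        return "application/x-shockwave-flash"
    return "text/html"


def respond(model, has_flash_plugin, profile_preference=None):
    """Application-level step: choose the view, render, report the type."""
    media_type = select_media_type(has_flash_plugin, profile_preference)
    body = RENDERERS[media_type](model)
    return media_type, body
```

The caller would use the returned media type to set Content-Type; the templates themselves never touch HTTP headers.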

Looking at this in an MVC context, the application is responsible for 
populating the Model (user namespace), and selecting which View 
(template<->media-type) is suitable for return to the user. Templates 
should not vary media types. HTTP headers do need to be set for 
different templates/media-types. But that should be the responsibility 
of the HTTP application, not the template, which should be unaware of 
the application context in which it is running, except for the contents 
of the Model/user-namespace.

Regards,

Alan.


Re: [Web-SIG] A trivial template API counter-proposal

2006-02-05 Thread Alan Kennedy
[Guido van Rossum]
 > I see. But doesn't this still tie the templates API rather strongly to
 > a web API? What if I want to use a template engine to generate
 > spam^H^H^H^Hpersonalized emails? Or static HTML pages that are written
 > to the filesystem and then served by a classic Apache setup? Or source
 > code (program generators are fun toys!)?
 >
 > ISTM that a template that does anything besides producing the
 > *content* of a download would be a rather extreme exception -- e.g.
 > the decision to generate a redirect feels like *application* logic,
 > while templates ought to limit themselves to *presentation* logic, and
 > for that, an API that produces a string or writes to a stream seems to
 > make more sense. What am I missing? Are you proposing that I should
 > fake out the wsgi API and capture the strings passed to write() and
 > the sequence returned?

At last! A voice of sanity!

I've been dismayed over the last few days trying to follow the direction 
of this thread: it appears to me that something very simple has now 
become very complex.

Templating is about taking a pattern of bytes (the template), somehow 
mixing it with user data (the user context), and generating a series of 
bytes (the output). Full stop.

In relation to text, this means taking a textual template, with embedded 
code/processing-instructions/whatever, "cooking" it in a user namespace, 
and delivering a final piece of output.

With text, the only major concern is with character encoding. And if I 
were designing a templating API, I'd make everything unicode-only, 
leaving the user responsible for transcoding to their desired encoding 
at serialisation time.
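
As a rough illustration of "template plus user context gives output", with unicode throughout and encoding deferred to serialisation time. string.Template is only a convenient stand-in for a real engine, and cook and serialise are hypothetical names.

```python
from string import Template


def cook(template_text: str, context: dict) -> str:
    """Mix a text template with user data; str in, str out."""
    return Template(template_text).substitute(context)


def serialise(text: str, encoding: str = "utf-8") -> bytes:
    """Transcode only at the edge, when the output is written or sent."""
    return text.encode(encoding)
```

Nothing in cook() knows about HTTP, files or email; the caller decides where the bytes from serialise() go.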

I can understand why the web-sig has fallen into the trap of tying a 
templating API to its nice web standard, WSGI: all web applications must 
generate output. But web apps need to generate a wide range of media 
types, e.g. image/*, application/msword, etc, etc, etc.

This topic started with Buffet, the de-facto standard templating API for 
CherryPy. Buffet is just about textual templating, which is a good 
thing. That's why it's very simple, and is thus actually being used.

Perhaps web-sig is the wrong place to discuss a textual templating API. 
Maybe xml-sig would be a better place, or a new text-sig should be formed?

In relation to Guido's point above about usage scenarios for this API: 
I'm quite interested because I have a jython implementation of ZPT/TAL 
that I'll be open-sourcing in the coming weeks, and which I intend to 
make compatible with whatever API is produced by this current discussion.

I used that TAL implementation to generate the documentation for various 
things, usually just a flat set of HTML files in a directory: not an HTTP 
request in sight. Theoretically, I can envision a situation where I 
might want to swap TAL implementations for that offline generation 
process, meaning that it would be helpful to have a standardised API for 
controlling template cooking.

Why should I have to use WSGI in that scenario?
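
A sketch of that offline scenario with a swappable engine. TemplateEngine is a hypothetical engine-neutral interface, and DollarEngine and BraceEngine are toy stand-ins for two interchangeable implementations (they are not TAL); generate_site is an invented helper.

```python
from pathlib import Path
from string import Template
from typing import Mapping, Protocol


class TemplateEngine(Protocol):
    """Hypothetical engine-neutral API: no request, no response, no WSGI."""

    def render(self, source: str, context: Mapping[str, object]) -> str: ...


class DollarEngine:
    """Stand-in engine built on string.Template ($name placeholders)."""

    def render(self, source, context):
        return Template(source).substitute(context)


class BraceEngine:
    """A second stand-in, using str.format ({name} placeholders)."""

    def render(self, source, context):
        return source.format(**context)


def generate_site(engine, pages, out_dir):
    """Write one HTML file per page; return the paths written."""
    out_dir = Path(out_dir)
    written = []
    for filename, (source, context) in pages.items():
        path = out_dir / filename
        path.write_text(engine.render(source, context), encoding="utf-8")
        written.append(path)
    return written
```

Swapping engines means passing a different object to generate_site(); no web machinery is involved at any point.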

Just my €0,02.

Alan.


Re: [Web-SIG] WSGI in standard library

2006-02-05 Thread Alan Kennedy
[Peter Hunt]
> I think CherryPy's WSGI server should go in: it's stable, and the 
> best-performing WSGI HTTP server out there.

I disagree.

I think that if a WSGI server is to go into the standard library, it 
should be the most basic one possible, e.g. one that builds on the 
*HttpServer.py hierarchy already there, and one that makes it as easy as 
possible for coders to understand how WSGI works.

HTTP servers can be complex beasts. Security is a major consideration, 
robustness and stability being next. Performance is also a major 
concern, with flexibility and ease-of-use being important as well.

That's too many concerns to balance against each other for a python 
library module.

Instead, I think the right approach is to continue with the existing 
approach: put the most basic possible WSGI server in the standard 
library, for educational purposes only, and a warning that it shouldn't 
really be used for production purposes.
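
For illustration, a minimal sketch of how small such an educational server can be. It assumes the `wsgiref` package that ships with later Python standard library releases, which is not something this message names; `hello_app` and `call_app` are hypothetical names, and port 8000 is arbitrary.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Smallest useful WSGI application: one status, two headers, one body chunk."""
    body = b"Hello, WSGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app):
    """Invoke a WSGI app by hand, the way a server would, and collect the result."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required CGI and wsgi.* keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app(hello_app))
    # Educational use only, not a production server.
    with make_server("localhost", 8000, hello_app) as httpd:
        httpd.handle_request()  # serve a single request, then exit
```

Reading `call_app` is probably the quickest way to see what a server owes an application: an environ dictionary, a `start_response` callable, and iteration over the returned body.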

The following quote is from the docstring of the CGIHTTPServer module

"""
In all cases, the implementation is intentionally naive -- all
requests are executed synchronously.

SECURITY WARNING: DON'T USE THIS CODE UNLESS YOU ARE INSIDE A FIREWALL
-- it may execute arbitrary Python code or external programs.
"""

And that's a good thing. If I really want to use python CGI, then I 
should find a robust HTTP server which supports it, e.g. Apache.

The same reasoning should apply to WSGI, IMHO.

Just another €0,02.

Regards,

Alan.



Re: [Web-SIG] Standardized template API

2006-01-31 Thread Alan Kennedy
[Clark C. Evans]
> I'd stick with the notion of a "template_name" that is neither the
> template file nor the template body.  Then you'd want a template factory
> method that takes the name and produces the template body (compiled if
> necessary).  

I agree.

If you're looking for an existing model (in java), the Spring framework 
has "View" objects (i.e. the V in MVC) and "View Resolver" objects. The 
latter resolve logical template names to actual templates, compiled if 
necessary.

View Interface
http://static.springframework.org/spring/docs/1.2.x/api/org/springframework/web/servlet/View.html

ViewResolver Interface
http://static.springframework.org/spring/docs/1.2.x/api/org/springframework/web/servlet/ViewResolver.html

> This way your template could be stored
> in-memory, on-disk, or in a database, or even remotely using an HTTP
> cache.  The actual storage mechanism for the template source code should
> not be part of this interface.

A very important requirement IMHO.
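
As a rough sketch of the name-to-template idea in Python: the class and function names below are hypothetical, and `string.Template` merely stands in for a real engine's compiled form. It is modelled loosely on the resolver concept and is not Spring's API or any proposed standard.

```python
from string import Template


class DictTemplateResolver:
    """Resolves logical template names against an in-memory mapping.

    A resolver backed by disk, a database or HTTP would expose the same
    resolve() method, so callers never learn where the source lives.
    """

    def __init__(self, sources):
        self._sources = dict(sources)
        self._compiled = {}  # cache of name -> compiled template

    def resolve(self, template_name):
        if template_name not in self._compiled:
            source = self._sources[template_name]  # KeyError if unknown
            self._compiled[template_name] = Template(source)
        return self._compiled[template_name]


def render(resolver, template_name, context):
    """Look up a template by logical name and render it with the given context."""
    return resolver.resolve(template_name).substitute(context)


if __name__ == "__main__":
    resolver = DictTemplateResolver({"greeting": "Hello, $name!"})
    print(render(resolver, "greeting", {"name": "web-sig"}))
```

The caller only ever handles the logical name "greeting"; replacing the dictionary with another storage mechanism would not change `render` at all.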

Regards,

Alan.



Re: [Web-SIG] Communicating authenticated user information

2006-01-22 Thread Alan Kennedy
[Alan Kennedy]
>> I agree about not sending this information back to the user: it's
>> unnecessary and potentially dangerous.

[Phillip J. Eby]
> Yep, it would be really dangerous to let me know who I just logged in to 
> an application as.  I might find out who I really am! ;)

Very droll ;-)

What if other information, such as meta-information about the auth 
directory or database in which the credentials were looked up, was also 
communicated through X-headers, e.g. server connection details?

Happy for that to go back to the user too?

If X-headers are to be used in WSGI, I think there should be something 
in the spec about whether or not they should be transmitted to the user.

Alan.


Re: [Web-SIG] Communicating authenticated user information

2006-01-22 Thread Alan Kennedy
[Jim Fulton]
 >>>Is Zope the only WSGI application that performs authentication
 >>>itself?

[Phillip J. Eby]
 >>I think Zope is the only WSGI application that cares about
 >> communicating this information back to the web server's logs.  :)

[Jim Fulton]
 > I hope that's not true.  Certainly, if anyone else is doing
 > authentication in their applications or middleware, they
 > *should* care about getting information into the access logs.

Well, Apache records auth info in logs as well, and it seems like a 
perfectly reasonable thing for a server to do.

http://httpd.apache.org/docs/2.0/logs.html#accesslog

[Phillip J. Eby]
 >> Perhaps an "X-Authenticated-User: foo" header could be added
 >> in a future spec version?  (And as an optional feature in the
 >> current PEP.)

[Jim Fulton]
 > Perhaps. Note that it should be clear that this is solely for use
 > in the access log.  There should be no assumption that this is
 > a principal id or a login name.  It is really just a label for the
 > log.  To make this clearer, I'd use something like:
 > "X-Access-User-Label: foo".

Sending X-headers seems hacky, and results in unnecessary information 
being transmitted back to the user (possibly revealing sensitive 
information, or opening security holes?)

I think that the communication mechanism for auth information is 
possibly best served by a simple convention between auth middleware 
authors. Perhaps servers that are aware that auth middleware is in use 
can put a callable into the WSGI environment, which auth middleware 
calls when it has auth'ed the user?
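
A minimal sketch of that convention in Python, to show how little is needed. The environ key `example_server.report_auth_user` and both class names are hypothetical; nothing here is defined by the PEP or by any existing server.

```python
AUTH_CALLBACK_KEY = "example_server.report_auth_user"  # hypothetical environ key


class LoggingServerShim:
    """Stands in for a server: offers a callback and logs the reported label."""

    def __init__(self, app):
        self.app = app
        self.access_log = []

    def __call__(self, environ, start_response):
        seen = {"user": "-"}  # "-" mirrors the usual "no user" log placeholder

        def report_user(label):
            seen["user"] = label

        environ[AUTH_CALLBACK_KEY] = report_user
        result = self.app(environ, start_response)
        # The middleware below reports before returning, so the label is set here.
        self.access_log.append((environ.get("PATH_INFO", ""), seen["user"]))
        return result


class AuthMiddleware:
    """Authenticates, then tells the server (if it is listening) who the user is."""

    def __init__(self, app, authenticate):
        self.app = app
        self.authenticate = authenticate  # callable: environ -> label or None

    def __call__(self, environ, start_response):
        label = self.authenticate(environ)
        report = environ.get(AUTH_CALLBACK_KEY)
        if label is not None and report is not None:
            report(label)
        return self.app(environ, start_response)
```

The label reaches the server's log without ever appearing in the response headers, and middleware running under a server that does not provide the callable simply skips the call.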

[Phillip J. Eby]
 > This seems a simpler way to incorporate the feature than adding
 > an extension API to environ.

[Jim Fulton]
 > Why is that?  Isn't the env meant for communication between
 > the WSGI layers?  I'm not sure I'd want to send this information
 > back to the browser.

I think an API could be very simple, and optional for servers that know 
they won't be logging auth information.

I agree about not sending this information back to the user: it's 
unnecessary and potentially dangerous.

Regards,

Alan Kennedy.