Re: [Twisted-web] HTTP-AUTH for web2 / Kudos on web2's operation

glyph Fri, 18 Nov 2005 17:03:27 -0800

On Fri, 18 Nov 2005 16:47:02 -0500, "Clark C. Evans" <[EMAIL PROTECTED]> wrote:

Thank you for taking time to discuss this more.


Thanks.  I am trying to be more involved in Twisted's direction.  I've been 
auditing some code lately (not necessarily in web2 ;-)), and finding some good 
stuff and some unpleasant surprises.  The surprises stem from people trying to 
preserve my original design constraints without really understanding what they 
were for or what's going on, so I'm trying to be a bit more forceful and 
direct, rather than hanging back and not saying anything when I don't have time 
to code it myself.

I think I disagree that
twisted core currently is, or should be an object publishing system.


I'm sorry that you disagree, Clark, but that's the design :).  We thought for a 
while about taking the resource model out of twisted.web2 (we even had some 
real-life, face-to-face meetings about it!  with someone taking minutes and 
everything!), but if you take it out, there isn't any way to integrate with 
cred, and thereby the rest of Twisted.

Object publishing will remain the design.  You seem to have some misconceptions 
about what that means, though...

By an "object publishing system", I mean a system where every object
in the system is a Resource, and hence has a *unique* URL.


Lucky for you, that's not what *I* mean when I say "object publishing system".  
URLs are insufficiently descriptive anyway; things like the current time, cookies, and 
even random numbers can influence what object is present at a particular URL.  As you 
observe, sometimes it's dynamically calculated.

For starters, some objects in the system (such as a Session) by
default do not have a URL, and thus, by definition, are not
Published Objects (aka Resources).  But the current implementation
of web2 goes even further; it is possible for two distinct "Resource"
objects to have been accessed by the same URL (see web2.static.File,
which dynamically creates a Resource object for children).


As I've mentioned in a few previous posts, under guard, the SessionManager returns a 
Resource which corresponds to the current session.  Generally the path portion of that 
URL will simply be "/" for whatever server you're on.

Then, an IResource is defined as a _kind_ of request handler that eats
exactly one path segment


Only being able to eat exactly one path segment is a design error; there is no 
reason for any interface to support only one segment at a time, cf. 
http://divmod.org/trac/wiki/PotatoProgramming
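
To make the multi-segment point concrete, here is a minimal sketch in plain 
Python.  It is a toy model, not the twisted.web2 API: it only assumes that 
locateChild(request, segments) returns a (resource, remaining_segments) pair.  
The Archive, Leaf and traverse names are made up for illustration, and empty 
path segments are simply dropped to keep it short.

```python
class Leaf:
    """Hypothetical terminal resource that renders a fixed body."""

    def __init__(self, body):
        self.body = body

    def locateChild(self, request, segments):
        # Nothing lives below a leaf.
        return None, ()

    def render(self, request):
        return self.body


class Archive:
    """Hypothetical resource that eats year/month/day in one call."""

    def locateChild(self, request, segments):
        if len(segments) >= 3:
            year, month, day = segments[:3]
            # Return the child plus whatever part of the path is unconsumed.
            body = "entries for %s-%s-%s" % (year, month, day)
            return Leaf(body), segments[3:]
        return None, ()


def traverse(root, request, path):
    """Walk from root until no segments remain; None means not found."""
    resource = root
    segments = tuple(s for s in path.split("/") if s)
    while segments and resource is not None:
        resource, segments = resource.locateChild(request, segments)
    return resource
```

The traversal loop does not care how many segments each step consumes; it just 
keeps going until the remaining tuple is empty.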

from the request; and it breaks handleRequest
into two cases:  (a) one that returns another IResource aka locateChild(),
or (b) one that returns a Response, aka render(). However, an IResource
is a very special kind of IRequestHandler -- one that respects the
uniqueness constraints of an object publishing system.


Your "IRequestHandler" abstraction breaks all kinds of useful patterns for 
cooperation between different chunks of web code, as far as I can tell.  In any case, 
concretely speaking, IRequestHandler sounds exactly like IResource minus the ability to 
distinguish between different handlers for different paths.  (And yes, the HTTP specs 
deal explicitly with path segments, the URI specs deal with them, and browsers implement 
HTML specifically to deal with them.  They're not some imagined thing on the part of 
twisted web.)

In this logic, an IAuthenticator is _not_ a resource, but rather an
IRequestHandler that does a bunch of checks; but otherwise largely
passes-through the request onto the next processing stage.


The equivalent IAuthenticator in the IResource model simply consumes no 
segments and defers to another resource.  All you're talking about is removing 
the ability to consume segments from the base API, making top level resources 
radically different from and incompatible with IResource objects which 
implement the bulk of existing, useful functionality in t.web and nevow.
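
As a rough illustration of "consumes no segments and defers", here is a 
sketch in plain Python.  It is not real twisted.web2 or cred code: the 
request is modelled as a plain dict, and Page, Forbidden, Authenticator and 
traverse are hypothetical names.  The only assumption carried over is that 
locateChild returns a (resource, remaining_segments) pair.

```python
class Page:
    """Hypothetical ordinary resource: one segment per child lookup."""

    def __init__(self, body, children=None):
        self.body = body
        self.children = children or {}

    def locateChild(self, request, segments):
        return self.children.get(segments[0]), segments[1:]

    def render(self, request):
        return self.body


class Forbidden:
    """Hypothetical error resource that swallows the rest of the path."""

    def locateChild(self, request, segments):
        return self, ()

    def render(self, request):
        return "403 Forbidden"


class Authenticator:
    """Checks the request, consumes zero segments, defers to `wrapped`."""

    def __init__(self, wrapped, allowed_users):
        self.wrapped = wrapped
        self.allowed_users = allowed_users

    def _allowed(self, request):
        return request.get("user") in self.allowed_users

    def locateChild(self, request, segments):
        if self._allowed(request):
            # Segments are handed on untouched: nothing is consumed here.
            return self.wrapped, segments
        return Forbidden(), ()

    def render(self, request):
        # Reached only when the path ends at this wrapper itself.
        if self._allowed(request):
            return self.wrapped.render(request)
        return Forbidden().render(request)


def traverse(root, request, path):
    """Walk from root until no segments remain; None means not found."""
    resource = root
    segments = tuple(s for s in path.split("/") if s)
    while segments and resource is not None:
        resource, segments = resource.locateChild(request, segments)
    return resource
```

Because the wrapper speaks the same interface as the thing it wraps, the Page 
objects underneath need no knowledge that a check happened.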

In logical terms, the ISessionManager should associate each IRequest
with an ISession; you can then adapt(request,ISession) to obtain the
given session.  Whether the IRequest interface provides a short-cut for
this is really an implementation detail; but one with clear value.


Out of curiosity, what methods do you think ISession provides?

As far as the functionality is concerned: SessionManager as a Resource would 
simply consume zero segments from locateChild, as I said above.

In summary, I think you're confusing arbitrary objects in the system
with Resources; and I think the web2 module is already overly-complicated
since it is addressing a higher level of abstraction than what is
absolutely required.   In my application, I do not have Resources
via the definition of an object publishing system -- nor do I want
to be burdened with this distinction.  I have my own URL processing
and I don't find the web2 concept of "segments" helpful.


At this point we may have to agree to disagree.  I don't find your URL 
processing helpful, either, and I feel that the Resource API has proven itself 
over the course of half a decade of my own, and many others', web work by now.  
Being able to consume multiple segments at once is an important feature, but 
it's been around in Nevow for quite some time now.

You can definitely implement your traversal scheme with the mechanisms provided 
in twisted.web2 at multiple levels, either on top of the base HTTP 
implementation or as a Resource, and if it works for you, please, be my guest.  
However, the point of twisted.* is not to address the absolute minimum required 
basis for your application.  Its purpose is to provide an integration 
framework, in the spirit of the various specifications that it implements.  
Your innovations might be neat, but, for example, /x=y/ setting variables is 
definitely outside the spirit of what the HTTP spec says.

In the context of nevow/web/web2, Resources seem to work to do this for quite a 
few people.

Following are specific comments related to the above...

(snip)

Each IRequest object has a member variable, 'peer', which is a mapping
from interfaces, such as IFoo onto the object that implements that
interface.  So, request.peer[ISession] will give me the session
associated with that request.  The appropriate __conform__ logic can
also be implemented so that adapt(peer,ISession) works.


That's the way that Nevow's session handling works and I think it's worked out 
pretty poorly.  It leads to the same kind of confusion as the context.  I would 
prefer to avoid repeating that mistake.

| I've gone through that message now and more thoroughly understood what is
| going on.  Those stages are interesting, but I don't think that any of
| them belong in twisted.web2.  Twisted's model of web interoperability is,
| and has always been, object publishing.  We aren't going to change that
| to a stage-based or filter-based scheme.

Assume for a moment that IRequestHandler is the basis for web2,
and that IResource layers on "object publishing" semantics.  Further
assume that the 'peer' attribute on each request maps interfaces
onto objects associated with that interface.


Now I'm assuming that I've somehow allowed two massive changes into Twisted for 
a benefit that I can't understand at all...

there is no reason why
I should be forced to layer my IStage on top of an IResource; my
stages are not resources.


In fact there are lots of good reasons.  The main one is that by layering 
IStage on top of IResource, you can defer back to other IResources easily, and 
it is clear to the resources what portion of the path they should be handling.  
Another is that someone else might want to have your Stage only apply to 
resources below a particular tree, let's say /cceapp/.
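
The subtree case can be sketched as follows.  Again this is a toy model in 
plain Python, not twisted.web2: Static, Stage and traverse are hypothetical 
names, the request is a plain dict, and the "work" the stage does is just 
setting a flag on it.

```python
class Static:
    """Hypothetical ordinary resource: one segment per child lookup."""

    def __init__(self, body, children=None):
        self.body = body
        self.children = children or {}

    def locateChild(self, request, segments):
        return self.children.get(segments[0]), segments[1:]

    def render(self, request):
        return self.body


class Stage:
    """Zero-segment wrapper; it only runs for paths that pass through it."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def locateChild(self, request, segments):
        request["staged"] = True
        # `segments` is exactly the part of the path below the mount point.
        return self.wrapped, segments

    def render(self, request):
        request["staged"] = True
        return self.wrapped.render(request)


def traverse(root, request, path):
    """Walk from root until no segments remain; None means not found."""
    resource = root
    segments = tuple(s for s in path.split("/") if s)
    while segments and resource is not None:
        resource, segments = resource.locateChild(request, segments)
    return resource


# Mount the stage at /cceapp/ only; /other/ never sees it.
root = Static("root", {
    "cceapp": Stage(Static("app", {"report": Static("report")})),
    "other": Static("other"),
})
```

Where the stage applies is decided purely by where it sits in the tree, so 
nobody has to teach it about URL prefixes.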

Are all objects resources?  If not, what must an object have to be
a resource?  If the answer is "implements IResource", then I ask
you, is a Session a resource?  If so, what does its locateChild
look like?


nevow/guard.py lines 289 to 326. ;-)

Actually, that's slightly wrong.  The 'session' is a user-provided resource, 
whose locateChild does whatever they want.  The locateChild I'm referring to 
there does session management.

| Depending on session management policy
| the anonymous resource may or may not be shared between anonymous
| sessions.  It may *wrap* a resource which is common to all users, but the
| cred way of looking at an object is that each user has a distinct object
| they communicate with, which determines their view of the world.

Ok.  That's good, an Avatar; but is an Avatar an IResource?


"avatar" is a general term which means "implementation of protocol-specific 
interface which represents a session with a user, or the special anonymous 'user'".  In 
web-land it is generally an IResource, but as I said I am open to other suggestions, provided they 
come along with some *significant* benefit.

| (snip Resources should be self-contained)

Here is where we part ways.  This view of the processing model
is an unnecessary restriction and should not be placed upon web2.


Please name one way that using the convention of 'stages' being simply 
Resources which consume zero segments is 'restrictive'.

| >   (a) An Avatar is an "auto-generated" resource perhaps constructed
| >       from the SessionManager resource?
|
| That's the way guard works and should continue to work, yes.

An avatar is not a resource; if it is, what is its URL?  What does it
look like (to phrase it with your definition)?


Its URL is "/"+(implicit modifications by cookies and server's interpretation 
of cookies)

Assuming a 'peers' collection; you only need to access the peers
that your RequestHandler (or IResource) knows or cares about.


I've worked with a couple of systems that worked that way, and that's generally 
not what happens.  People notice that 'peers' (as you're calling them) are 
handily available in some context they're working in, and start using them.  
Then they can't figure out how to write test cases for their own code because 
they don't know how that contextual information got set up.  Also, their model 
objects are totally broken without lots of implicit context from the 
web-rendering code path.  See also Zope's now-abandoned implicit acquisition 
for why this is bad.
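
A small sketch of the testing problem, in plain Python.  None of these names 
come from Twisted or Nevow: ISession here is just a marker class used as a 
dictionary key, Request is a stand-in with a 'peer' mapping as described 
above, and the two greeting functions are invented for illustration.

```python
class ISession:
    """Hypothetical marker used only as a key in request.peer."""


class Request:
    """Stand-in request carrying an interface -> object mapping."""

    def __init__(self):
        self.peer = {}


def greeting_implicit(request):
    # Works only if some earlier code happened to populate the mapping.
    return "hello, " + request.peer[ISession]["user"]


def greeting_explicit(user):
    # Everything this function needs arrives through its arguments.
    return "hello, " + user
```

A test for greeting_explicit just passes a string.  A test for 
greeting_implicit has to know which upstream code fills in request.peer and 
reproduce that setup first; a bare Request() fails with a KeyError.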

| getSession is designed to bridge requests automatically from within the
| HTTP server's framework code, by setting cookies and such.  Session
| management is a task that should be accomplished by a resource object
| which can be independently tested, not by the server code.

No disagreement here.


Great!  I was waiting for one of those :).

| The proposed interface is something that would probably be *used* by a
| session-manager resource, and might even represent the session, but its
| purpose is simply to provide some per-request data that can be shared
| between resources processing the same request, without resorting to
| random attributes on the request, and with some way to link to the
| resource that provided that data.

It is not necessary to link data associated with a request with
the 'Resource' that provided the data.


I disagree.  The only reason to avoid providing this kind of information is if 
performance requirements dictate that it is too expensive.

| I suppose this doesn't make much difference.  I want it to be the
| resource because the accompanying URL should point to it, but I suppose
| that might be unnecessarily restrictive; at least the URL will point at
| the thing that set it.

Well, if you want to _expose_ a URL to the user for them to view
their session; then, it is indeed a Resource.  However, not all
sessions need to be Resources, no?


I don't understand what you mean by "session".  Broadly, your session is simply 
everything you could possibly access with the credentials and client-side state you have currently 
provided.  Perhaps you are talking about some smaller in-memory object which is an implementation 
detail of the session manager hooking your "session" in the broad sense to your web 
browser; in the guard sense these implementation details are hidden entirely from the user or the 
application programmer, and the visible session-object abstraction is the resource that the user is 
viewing.  Session-specific data can be attributes of 'self' on that resource, because presumably it 
affects the view, and then that same resource can access those attributes of 'self' and pass them 
to its children or render them in renderHTTP.
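
Roughly, the shape I have in mind looks like the sketch below.  It is plain 
Python rather than real guard or cred code: UserRoot, Inbox and traverse are 
hypothetical names, and the only assumptions are that locateChild returns a 
(resource, remaining_segments) pair and that renderHTTP produces the output.

```python
class Inbox:
    """Hypothetical child that receives session data from its parent."""

    def __init__(self, username):
        self.username = username

    def locateChild(self, request, segments):
        return None, ()

    def renderHTTP(self, request):
        return "inbox of %s" % self.username


class UserRoot:
    """Hypothetical per-user resource living at "/" for one session."""

    def __init__(self, username):
        # Session-specific data is an ordinary attribute of self.
        self.username = username

    def locateChild(self, request, segments):
        if segments[0] == "inbox":
            # Pass the state down explicitly rather than via the request.
            return Inbox(self.username), segments[1:]
        return None, ()

    def renderHTTP(self, request):
        return "home of %s" % self.username


def traverse(root, request, path):
    """Walk from root until no segments remain; None means not found."""
    resource = root
    segments = tuple(s for s in path.split("/") if s)
    while segments and resource is not None:
        resource, segments = resource.locateChild(request, segments)
    return resource
```

Two users hitting the same path get two different UserRoot instances, which 
is the "URL plus cookies" point: the path is "/" in both cases, but the 
object behind it differs.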

[snip more stuff about 'peer', I think I already addressed that enough times]

Does this top-most "resource" have a URL?  If not, then it
isn't a resource.  *poke*


Yes, the top-most resource is /.

| error-reporting behavior with Nevow

Ouch.  Is this good?


Noooo... it is exactly the thing I am trying to get away from.

(other case)

Wow.  Is this good?


Better than the Nevow case, at least.  I'm not a big fan of per-request state; 
I think this should be handled sparingly.

No way am I adding a /foo/ to my path to reflect that 'foo' logged-in;
or perhaps I didn't understand.


Nope, /foo/ is just some random application component that lives at that URL, 
which has child resources that depend on some implicit state it provides.


_______________________________________________
Twisted-web mailing list
[email protected]
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
