-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris and I have been wrangling this afternoon over the way the Pyramid
security machinery interacts with SQL-based user / group stores.  As a
result, I did a little investigation, and wanted to share the results.

Rationale
- ---------

The current "stock" pyramid authentiation policies take a callback which
is used from two points in the framework / application:

- - The 'authenticated_userid' API calls the callback to determine
   whether a user with authenticated credentials still exists in the
  persistent store;  if the callback returns None, the user
  is no longer present.  Note that the pyramid framework itself does not
  call 'authenticated_userid', but most apps will end up doing so.  The
  API could also be called by stuff like view or route predicates.

- - The 'effective_principals' API calls the callback to compute the set
  of groups to which the user belongs.  Here, in addition to returning
  None for no-longer-in-the-building users, the callback is expected to
  return a sequence of "groups" (typically strings, but could be other
  stuff, depending on the application).  These groups are stitched into
  a sequence of princiapls, including also the userid and some special
  pseudo-users.

For developers who use SQL to store their user and group data, this
design pretty much mandates that four SQL requests get made for each
protected / authenticated request:  the framework will call the callback
once via 'effective_principals' (for any "protected" resource), while
the application will often call 'authenticated_userid' (maybe more than
once!)

Since the callback doesn't know which API called it, it has to do both
parts of the job, which in the examples documented to date requires
going over the wire for two SQL queries.

Ideally, the policy object would cache the result of the query(ies) on
the request.  Howver, the policy object cannot safely cache the results
returned from the callback on the request, because the contract allows
the callback to return arbitrary objects, some of which might themselves
hold references to the request (hmm, I guess it could at least cache the
fact that the callback returned None safely).   The callback might be
able to arrange such caching itself, if it knew that the results were safe.


Proposed Mitigation
- -------------------

To help mitigate that without changing the design of the API, I'd like
to propose that we document the following example SQL query, which
allows making a single query to answer both questions:

This query is made against a fairly typicial user-gruops schema, which I
bashed togehter using sqlite338::

- ------------------------- %< --------------------------------
sqlite> .dump
BEGIN TRANSACTION;
CREATE TABLE users(id integer primary key,
                   login varchar[256] unique,
                   password varchar[256]);
INSERT INTO "users" VALUES(1, 'phred', 'w1lm4');
INSERT INTO "users" VALUES(2, 'bharney', 'b3tty');
INSERT INTO "users" VALUES(3, 'pebbyls', 'd1n0');
INSERT INTO "users" VALUES(4, 'dhino', 'b0n3s');
INSERT INTO "users" VALUES(5, 'bambam', 'r0cks');
CREATE TABLE groups(id integer primary key,
                    name varchar[256] unique not null);
INSERT INTO "groups" VALUES(1, 'flintstones');
INSERT INTO "groups" VALUES(2, 'rubbles');
INSERT INTO "groups" VALUES(3, 'males');
INSERT INTO "groups" VALUES(4, 'females');
INSERT INTO "groups" VALUES(5, 'humans');
CREATE TABLE user_groups(user_id integer not null,
                         group_id integer not null,
                         primary key(user_id, group_id),
                         foreign key(user_id) references users,
                         foreign key(group_id) references groups);
INSERT INTO "user_groups" VALUES(1, 1);
INSERT INTO "user_groups" VALUES(2, 2);
INSERT INTO "user_groups" VALUES(1, 3);
INSERT INTO "user_groups" VALUES(1, 5);
INSERT INTO "user_groups" VALUES(2, 3);
INSERT INTO "user_groups" VALUES(2, 5);
INSERT INTO "user_groups" VALUES(3, 1);
INSERT INTO "user_groups" VALUES(3, 4);
INSERT INTO "user_groups" VALUES(3, 5);
INSERT INTO "user_groups" VALUES(5, 2);
INSERT INTO "user_groups" VALUES(5, 3);
INSERT INTO "user_groups" VALUES(5, 5);
COMMIT;
- ------------------------- %< --------------------------------

The query itself relies on left outer joins to ensure that valid user
records show up in the result set, even if they have no groups::

- ------------------------- %< --------------------------------
sqlite> select g.name
   ...>   from users u
   ...>    left outer join user_groups ug on ug.user_id = u.id
   ...>    left outer join groups g on ug.group_id = g.id
   ...>   where u.id = 1;
flintstones
males
humans
sqlite> select g.name
   ...>   from users u
   ...>    left outer join user_groups ug on ug.user_id = u.id
   ...>    left outer join groups g on ug.group_id = g.id
   ...>   where u.id = 4;

sqlite> select g.name
   ...>   from users u
   ...>    left outer join user_groups ug on ug.user_id = u.id
   ...>    left outer join groups g on ug.group_id = g.id
   ...>   where u.id =13;
sqlite>
- ------------------------- %< --------------------------------

The callback would issue this query, and return None if the result set
is empty.  If the result set has a single record, and the 'g.name' value
is Null, then the callback returns an empty sequence.  Otherwise, the
callback returns the sequence of group names in the result set.
Something like::

- ------------------------- %< --------------------------------
SQL = """\
select g.name
  from users u
    left outer join user_groups ug on ug.user_id = u.id
    left outer join groups g on ug.group_id = g.id
  where u.id = %
"""
def callback(request, userid):
    dbconn = request.registry.settings['dbconn']
    groups = [x.name for x in dbconn.execute(SQL, userid).fetchall()]
    if groups:
        if groups == [None]:
            return ()
        return groups
    return None
- ------------------------- %< --------------------------------

I realize that many folks expect their ORM to do the SQL heavy lifting
for them;  unfortunately, I don't know any ORM well enough to suggest
how to get this same effect.  Note that this approach also sacrifices
the convenience in the name of efficiency:  a callback implemented this
way doesn't make pre-populated "user object" available (or at least not
easily).


Summary
- -------

Cutting the number of SQL queries by half helps some.  Doing more
requires either changing the contract for the callback (mandating that
it return only items which can safely be cached on the request) or else
making the callback responsible for such caching.


Tres.
- -- 
===================================================================
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0mW18ACgkQ+gerLs4ltQ54xwCgoTY+ajtC0EF6u8Bo/Wt+WYnV
BG4AnR6AVK+2SAhf/3zkUPKZ/91Avc7M
=quiW
-----END PGP SIGNATURE-----
_______________________________________________
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev

Reply via email to