On 11/03/2014 08:50 AM, John Dennis wrote:
> I had a bunch of notes but I can't find them at the moment
> so I'm going from memory here (always a bit risky).

I found my notes. Attached is a reStructuredText document that
summarizes the issues I found with the current Keystone mapping, the
basic requirements for mapping based on my prior experience, and my
reasons for selecting the approach I did. I hope this explains the
motivations and rationale and puts things into context.




-- 
John
Federation Mapping
==================

Introduction
------------

With federated authentication a trusted external IdP (Identity
Provider) authenticates an entity and provides attributes associated
with the authenticated entity such as full name, organization,
location, group membership, roles, etc. to the local authorization
system. The remote identity attributes usually do not correlate
directly with the local identity attributes, so they must be mapped
or transformed to be consistent with the local identity attributes.

Problems with existing contributed OpenStack federation mapping
---------------------------------------------------------------

A federated mapping implementation was contributed to OpenStack which
is based on static rules. However that mapping system lacks basic
features required in real world deployments. Examples of the
deficiencies are:


* Replacement substitutions are specified by an index, e.g. {0}, making
  them difficult to use.

  * If you edit a rule all the indices shift and you have to carefully
    adjust multiple locations; this is tedious and error prone.

  * Numeric replacements are not friendly; it's difficult to remember
    which index maps to which value. Likewise it's difficult to read a
    rule with indexed substitutions and understand what is being
    replaced, because a number provides no context for the reader.

* Impossible to specify how strings containing lists of values are to
  be split.

* Any string containing a colon is automatically split regardless of
  the semantics of the string.

* Impossible to perform tests and perform conditional logic. For
  example you cannot test if the user is a member of a group and if so
  add a role based on the group membership. You cannot test if a list
  is empty, has a certain number of members, if a value starts with a
  prefix or ends with a suffix, etc.

* No mechanism to handle case sensitivity.

* Cannot split direct map value into multiple values, e.g. user@domain
  cannot be returned as ['user', 'domain'], i.e. {0}, {1}. This
  requires more robust regular expression support than is provided.

* Cannot operate on substrings or do replacements. Common operations
  such as replacing hyphens with underscores or stripping off prefixes
  or suffixes are impossible.

* Direct map values cannot be reassigned by later rules. For example,
  email might be in the assertion, so it would be a direct map, but if
  it is absent one cannot synthesize it from other values in the
  assertion, i.e. user + @ + domain.

* Difficult to build values by string interpolation or concatenation.

* The local array elements are dicts whose keys can contain 'user' or
  'group' or both, but you can't have more than one user or group in an
  element, and elements that define a user more than once produce an
  error.

* Logical OR requires cut-and-paste copies of many rules with only a
  minor difference in each rule; the rules quickly become unreadable
  and their logic difficult to ascertain.

* ``not_any_of`` does not work when regex is True, enabling regex
  effectively changes the condition to ``any_one_of``. (This is a bug
  that can be fixed)
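
The readability problem with indexed substitutions can be illustrated
with Python's own ``str.format``, which accepts both positional and
named fields. This is only an analogy, not the Keystone mapping
implementation; the names ``user`` and ``domain`` are made up for
illustration.

```python
# Analogy only: positional vs. named placeholders with str.format.
values = ["jdoe", "example.com"]

# Positional: the reader must know what index 0 and 1 refer to, and
# inserting a new value at the front of the list shifts every index.
by_index = "{0}@{1}".format(*values)

# Named: the template documents itself and is unaffected by ordering.
by_name = "{user}@{domain}".format(user="jdoe", domain="example.com")

print(by_index)
print(by_name)
```

Both templates yield the same string, but only the named form tells
the reader what is being substituted.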

Examples of real world tasks current mapping cannot handle
----------------------------------------------------------

I've worked with RADIUS for years. RADIUS is often configured to
operate in a federated mode where the attributes supplied to the
RADIUS server have to be manipulated to match local conventions and
policies. Below are some of the most common issues I've seen admins
have to tackle.

* Split the realm from the username. Assign the username and realm
  independently in the result. This is by far the most common issue
  admins raise.

* Strip prefixes and suffixes and/or take a prefix or suffix and map
  it to a new independent value. Believe it or not many organizations
  embed group/role information in their usernames. Usernames such as
  "johndoe_staff" where the username is "johndoe" and role is "staff"
  are depressingly common.

* Behave differently depending on the realm, the IdP, a DNS name or a
  network address.

* Test for membership in a collection. Examples are whitelists,
  blacklists, groups, etc. The result of the membership test modifies
  how the user is ultimately mapped and the privileges they receive.

* Search for and/or extract substrings, which usually demands regular
  expression support. The result of the regular expression search may
  alter the flow of control.

* Filter certain characters.

* Convert to lower case or upper case.

* Test for empty or absent values.

* Build compound values from a series of conditions.
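
A few of the tasks above can be sketched in plain Python to show how
small they are once basic string and regular expression functions are
available. The helper names (``split_realm``, ``split_suffix``,
``normalize``) and the choice of ``@`` and ``_`` as separators are
assumptions made for this illustration, not part of any existing
mapper.

```python
import re


def split_realm(name):
    """Split 'user@realm' into (user, realm); realm is None if absent."""
    user, sep, realm = name.partition("@")
    return user, (realm if sep else None)


def split_suffix(name, sep="_"):
    """Split 'johndoe_staff' into ('johndoe', 'staff') at the last sep."""
    base, found, suffix = name.rpartition(sep)
    return (base, suffix) if found else (name, None)


def normalize(value):
    """Lower-case, turn hyphens into underscores, drop other characters."""
    value = value.lower().replace("-", "_")
    return re.sub(r"[^a-z0-9_]", "", value)


print(split_realm("jdoe@example.com"))
print(split_suffix("johndoe_staff"))
print(normalize("John-Doe!"))
```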


Mapping requirements
--------------------

The type of data transformations required by real world exchanges with
foreign identity systems demands the ability to make comparisons,
perform tests, assign values to variables, and call basic
transformation and regular expression functions.

Simple lookups on source values which are then copied over to
destination values are not sufficient.

Ideal Federation Mapping
------------------------

Federation mapping is simply transforming one set of attributes into
another set of attributes. JSON notation provides a rich yet simple
way to express a wide range of attributes and values.

Federation mapping should be based on a simple input/output filter
model. The mapper receives a JSON document containing the validated
assertions from an external IdP. The mapper examines the contents of
the JSON assertion and returns a JSON document of values mapped to the
local authorization environment or an error indicator if the mapping
cannot be performed.
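
The input/output filter model can be sketched as a single function that
takes a JSON document and returns a JSON document or signals an error.
The attribute names (``username``, ``groups``), the shape of the output
and the ``MappingError`` exception are hypothetical; the text above
does not define a schema.

```python
import json


class MappingError(Exception):
    """Signals that the assertion cannot be mapped (hypothetical name)."""


def map_assertion(assertion_json):
    """Take a JSON assertion document, return a JSON mapping document."""
    assertion = json.loads(assertion_json)

    # Error indicator: refuse to map an assertion without a username.
    if not assertion.get("username"):
        raise MappingError("assertion has no username")

    mapped = {
        "user": {"name": assertion["username"]},
        "groups": sorted(assertion.get("groups", [])),
    }
    return json.dumps(mapped, sort_keys=True)


print(map_assertion('{"username": "jdoe", "groups": ["staff", "dev"]}'))
```

The caller never sees the mapper's internals; it only hands over one
document and receives another, which is what makes the mapper easy to
replace or to test in isolation.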

Suggested solutions
-------------------

The two most viable ways to address the current deficiencies are:

* `Embedded Scripting Language`_
* `Enhanced Rule Mapping`_

Embedded Scripting Language
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Transforming one JSON document into another JSON document is trivial
for modern scripting languages. Rather than defining the mapping via a
set of rules, a script is executed whose input is a JSON document of
assertion values and whose output is a JSON document of mapped
values. The script has full access to all the features of a modern
programming language, so there are virtually no limitations on what
the script can do, and the implementation can be clear and
straightforward (easy to read and understand). Usually a script will
be much simpler than enumerating a complex set of static rules. Scripts
are easier to debug than static rules. Most administrators have the
ability to program in a scripting language. Learning how to use a
rule based mapping system is often just as difficult as learning
script fundamentals, but unlike script languages rule systems are not
general purpose. There is more value in learning a script language
than a one-off rule system.
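
As a rough illustration, a mapping script with conditional logic might
look like the following Python sketch. The attribute names (``user``,
``domain``, ``groups``, ``email``), the ``admins`` group and the
``admin`` role are invented for the example; a real deployment would
use whatever its IdP actually supplies.

```python
def transform(assertion):
    """Map a dict of assertion values to a dict of local values."""
    mapped = {"user": assertion["user"].lower(), "roles": []}

    # Conditional logic: group membership grants a role.
    if "admins" in assertion.get("groups", []):
        mapped["roles"].append("admin")

    # Use the asserted email if present, otherwise synthesize it
    # from other values in the assertion.
    email = assertion.get("email")
    if not email and assertion.get("domain"):
        email = "{}@{}".format(mapped["user"], assertion["domain"])
    mapped["email"] = email

    return mapped


print(transform({"user": "JDoe", "domain": "example.com",
                 "groups": ["admins"]}))
```

Each step here (a test, a case conversion, a synthesized value)
corresponds to something the static rules cannot currently express.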

Script based transformations could be implemented in one of these ways:

1. Embed the script interpreter into the running process.

2. Fork a script interpreter for each script evaluation.

3. Run a separate service implemented in the scripting language and
   pass the JSON documents via inter-process communication.

Option 1 (embedded) offers performance and ease of
implementation. Many modern script interpreters can easily be embedded
into programs with varying degrees of data exchange
mechanisms. ECMAScript (i.e. JavaScript), Python and Perl are obvious
candidates [1]_.

However the embedded option has one potential problem which deserves
serious consideration. For security reasons (and general robustness
reasons) you do not want the script to be able to access internal data
or execute operations in the context of a privileged
process. Running scripts provided by general users should always be
prohibited lest a security breach occur. However in this context the
mapping scripts are provided by trusted administrators with high
levels of privilege. We should be able to assume these scripts are
safe and not nefarious. The same administrators usually have enough
permission to do real damage regardless of the script loading issue,
so discounting the use of embedded scripts provided by administrators
does not have merit. Note that options 2 (fork) and 3 (service) do
not expose the primary process to the same potential security
breaches.


Option 2 (forking) is too heavyweight and severely hampers throughput.

Option 3 (service) raises deployment issues, for instance the need for
a mechanism that starts and stops the service, monitors its health, and
locks down both the communication and resources such that nothing is
leaked and nothing can be executed which shouldn't be. Some deployments
are concerned with the number of processes and services which are run.

Option 1 (embedded) seems the most viable.


Enhanced Rule Mapping
^^^^^^^^^^^^^^^^^^^^^

We've established that real world attribute mapping requires the
ability to invoke basic transformations, perform tests, execute
conditional logic and build compound values from discrete values. Such
operations require the use of intermediate variables.

All of these requirements come for free in a scripting language. But
what if we don't want to use a script to perform a transformation and
prefer the rule based approach? Our goals in this case are:

* Provide the **minimal** set of functions that cover the **maximum**
  number of real world use cases.

* Keep it **simple**! Simplicity aids users and makes the
  implementation easier to produce.

* Do **not** design a *language*. Any design that wanders off towards
  being a language is better replaced by an actual embedded language
  which is already fully implemented, debugged and familiar to users.

* Make everything self-consistent, no special case
  exceptions. Consistency aids both users and implementors.

* Work as in the ideal case: accept a JSON assertion and emit a
  JSON mapping.


--------------------------------------------------------------------------------

.. [1] At the time of this writing this mapper must be implemented in
       both Python and Java.  The best supported embedded interpreters
       in Java today are ECMAScript (JavaScript) and Python (via
       Jython). Python also has support for loading an ECMAScript
       interpreter. Alternatively the script language could be Python,
       in which case the script would simply be evaluated in the
       context of the running process (however there is no effective
       sandboxing when you eval a Python script inside Python).
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
