Re: [SAtalk] LDAP Storage instead of SQL Storage

Kristian Koehntopp Wed, 18 Jun 2003 00:06:17 -0700

On Tue, Jun 17, 2003 at 04:01:10PM -0700, Justin Mason wrote:
> One idea we've been thinking of for 2.70 or 3.0, was to use
> DBI for specifying database locations; in other words, just
> this.
> 
> DBI uses URI-style strings to specify formats, access methods,
> etc. along with the db names; if there's a DBI/DBD module to
> access LDAP, that would be best.  And it looks like there is:


Don't. LDAP is not SQL compatible. You can map LDAP to SQL
partially, to make LDAP stores available to a relational system,
but you cannot work with LDAP through any SQL interface and
expect performance, maintainability or even full functionality
(*1).


> But I'm not sure *writable* dbs like Bayes or AWL would ever
> be useful in LDAP; afaik, LDAP is very much read-frequently,
> write-seldom.

True. It may work, though, in any setup that does not replicate or
where your replication connections perform adequately. I would
still discourage it.

Kristian

(*1) I have a rant (a set of slides in German, even, for a lecture
     I held) on comparing and mapping SQL and LDAP. The bottom
line is that you do not want it unless you absolutely need to.

LDAP is a data store that is not even in 1st normal form when
seen from a relational POV (in the relational model, multivalued
attributes are arrays, not scalar values). Also, data storage
within the LDAP tree is not in 2nd or 3rd normal form, even if
you disregard multivalued attributes.
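One common way to force directory data into first normal form is to flatten every value into its own (dn, attribute, value) row, an entity-attribute-value layout. This is a minimal sketch, assuming entries are plain Python dicts mapping attribute names to lists of values; the function name and the sample entry are made up for illustration.

```python
# Entries are modelled as {dn: {attribute: [values, ...]}}; every LDAP
# attribute is potentially multivalued, so values are always lists.
DIRECTORY = {
    "cn=Alice,ou=People,dc=example,dc=com": {
        "cn": ["Alice"],
        "mail": ["alice@example.com", "a.smith@example.com"],
    },
}


def to_first_normal_form(directory):
    """Flatten entries into (dn, attribute, value) tuples, one scalar per row."""
    rows = []
    for dn, entry in directory.items():
        for attr, values in entry.items():
            for value in values:
                rows.append((dn, attr, value))
    return rows


rows = to_first_normal_form(DIRECTORY)
```

The result is in 1NF, but every consumer now has to reassemble entries from rows, and per-attribute typing is lost.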

Also, you cannot fix that, because LDAP only implements a very
limited subset of the relational algebra's operations. Relational
algebra defines five operations: projection (selection of
columns), selection (selection of rows), join (cross products
over tables), renaming (needed for self joins) and aggregation
(grouping operations).
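To make the five operations concrete, here is a small Python sketch over lists of dicts standing in for tables. The function names and the sample table are hypothetical; join is written as a selection over the cross product (a theta join), and the self join shows why renaming is needed.

```python
from collections import Counter


def select(rows, predicate):
    """Selection: keep the rows for which predicate holds."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Projection: keep only the named columns."""
    return [{c: r[c] for c in columns} for r in rows]


def rename(rows, mapping):
    """Renaming: give columns new names, e.g. before a self join."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def join(left, right, predicate):
    """Join: selection over the cross product of two tables."""
    return [{**left_row, **right_row}
            for left_row in left for right_row in right
            if predicate(left_row, right_row)]


def count_by(rows, column):
    """Aggregation: count the rows in each group of equal column values."""
    counts = Counter(r[column] for r in rows)
    return [{column: k, "count": n} for k, n in counts.items()]


people = [
    {"id": 1, "name": "alice", "manager": None},
    {"id": 2, "name": "bob", "manager": 1},
    {"id": 3, "name": "carol", "manager": 1},
]

# Self join: without renaming, the columns of both copies would collide.
bosses = rename(people, {"id": "boss_id", "name": "boss_name", "manager": "boss_manager"})
reports = project(join(people, bosses, lambda p, b: p["manager"] == b["boss_id"]),
                  ["name", "boss_name"])
team_sizes = count_by(select(people, lambda r: r["manager"] is not None), "manager")
```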

Of these, LDAP implements only selection fully, using filter
expressions. Projection is also available, but only in a limited
way (there are no synthetic columns/attributes in LDAP). Joins
are completely unavailable, forcing you to use precomputed joins:
DNs stored in (often multivalued) attributes refer to other
entries within the tree, and the client has to implement the
dereferencing operation manually. Renaming is not supported at
all in LDAP. Aggregation operations are unavailable as well,
forcing you to count and group manually.
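What "dereferencing manually" and "counting manually" mean for a client can be sketched as follows. This assumes an in-memory dict as a stand-in for the directory; `lookup` is a hypothetical placeholder for the base-scope search a real client would issue per referenced DN, and the sample data is invented.

```python
from collections import Counter

ALICE = "cn=Alice,ou=People,dc=example,dc=com"
DIRECTORY = {
    ALICE: {"cn": ["Alice"], "ou": ["Sales"]},
    "cn=Bob,ou=People,dc=example,dc=com": {
        "cn": ["Bob"], "ou": ["Sales"], "manager": [ALICE]},
    "cn=Carol,ou=People,dc=example,dc=com": {
        "cn": ["Carol"], "ou": ["Sales", "Support"], "manager": [ALICE]},
}


def lookup(directory, dn):
    """Stand-in for a base-scope search on dn: one extra round trip per call."""
    return directory.get(dn)


def manager_names(directory):
    """Client-side join: follow each DN stored in 'manager' by hand."""
    result = {}
    for entry in directory.values():
        for ref in entry.get("manager", []):
            target = lookup(directory, ref)
            if target is not None:  # the reference may be dangling
                result[entry["cn"][0]] = target["cn"][0]
    return result


def count_by(directory, attribute):
    """Client-side aggregation: the server offers no COUNT or GROUP BY."""
    counts = Counter()
    for entry in directory.values():
        counts.update(entry.get(attribute, []))
    return dict(counts)


managers = manager_names(DIRECTORY)
per_ou = count_by(DIRECTORY, "ou")
```

Note that Carol, having a multivalued `ou`, is counted in two groups; the client has to decide whether that is what it wants.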

The missing join forces one to use DNs to refer to other nodes
within the tree. For example, LDAP's groupOfUniqueNames object
class stores the DNs of its members in a multivalued attribute
(uniqueMember); these refer either to other groupOfUniqueNames
entries or to actual members (often Persons or subclasses
thereof). Thus, DNs play the role that primary keys play in
relational databases.

Unfortunately, unlike true primary keys, DNs are not opaque.
Instead, they are an ordered sequence of RDNs, which are also
attributes of the data. Thus, if you change the values of
attributes that are also used as RDNs, the DN of your object
changes and all references to your object are invalidated
(changing primary keys is frowned upon in the relational model
for exactly this reason). Since you cannot choose or define your
DN freely, as you can with primary keys in relational databases,
you cannot get rid of this problem completely.
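The effect of a non-opaque DN can be shown in a few lines. This sketch assumes a dict as the directory, splits DNs naively on the first comma (no escaped commas) and compares them case-sensitively, which real servers do not do; `modify_rdn` is a hypothetical helper that mimics a modrdn with the old RDN value deleted.

```python
DIRECTORY = {
    "cn=Alice Smith,ou=People,dc=example,dc=com": {"cn": ["Alice Smith"]},
    "cn=Bob,ou=People,dc=example,dc=com": {
        "cn": ["Bob"],
        "manager": ["cn=Alice Smith,ou=People,dc=example,dc=com"],
    },
}


def modify_rdn(directory, old_dn, new_cn):
    """Change the cn that forms the RDN; the entry necessarily gets a new DN."""
    entry = directory.pop(old_dn)
    parent = old_dn.split(",", 1)[1]  # naive: ignores escaped commas
    entry["cn"] = [new_cn]            # old RDN value is replaced, not kept
    new_dn = f"cn={new_cn},{parent}"
    directory[new_dn] = entry
    return new_dn


def dangling_references(directory, attribute):
    """(referrer, target) pairs whose target DN no longer exists."""
    return [(dn, ref)
            for dn, entry in directory.items()
            for ref in entry.get(attribute, [])
            if ref not in directory]


new_dn = modify_rdn(DIRECTORY, "cn=Alice Smith,ou=People,dc=example,dc=com", "Alice Jones")
broken = dangling_references(DIRECTORY, "manager")
```

A data change (a name change) has turned into an identity change, and Bob's `manager` value now points nowhere.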

Some server vendors, such as Netscape/iPlanet/SunOne, offer
server plugins that try to "fix" this: If you change a DN, the
plugin traverses your _entire_ data store and changes all
references to that DN (causing a replication storm, of course).
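Conceptually, such a plugin has to do something like the following after every rename. This is not any vendor's actual implementation, just a sketch assuming a dict as the directory and a hypothetical, hand-picked list of DN-valued attributes; the returned list is the set of entries whose modification then has to be replicated.

```python
OLD = "cn=Alice Smith,ou=People,dc=example,dc=com"
NEW = "cn=Alice Jones,ou=People,dc=example,dc=com"
DIRECTORY = {
    NEW: {"cn": ["Alice Jones"]},
    "cn=Bob,ou=People,dc=example,dc=com": {"manager": [OLD]},
    "cn=staff,ou=Groups,dc=example,dc=com": {
        "uniqueMember": [OLD, "cn=Bob,ou=People,dc=example,dc=com"]},
    "cn=Carol,ou=People,dc=example,dc=com": {"cn": ["Carol"]},
}


def fix_references(directory, old_dn, new_dn,
                   dn_attributes=("manager", "member", "uniqueMember")):
    """Rewrite every reference to old_dn; return the DNs of modified entries."""
    modified = []
    for dn, entry in directory.items():  # visits every entry in the store
        touched = False
        for attribute in dn_attributes:
            values = entry.get(attribute, [])
            if old_dn in values:
                entry[attribute] = [new_dn if v == old_dn else v for v in values]
                touched = True
        if touched:
            modified.append(dn)  # each one is a write that must be replicated
    return modified


changed = fix_references(DIRECTORY, OLD, NEW)
```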

You can partially rid yourself of the problem of changing DNs by
defining an attribute "pk", which you use to enumerate all
objects within your store, and making it the RDN. Unfortunately,
you still cannot define flat structures, because LDAP defines
replication only over subtrees. That is, if you want to
partition your data store, you have to structure it as a tree
(introducing new attributes as RDNs) in order to have subtrees
to replicate.
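A sketch of that layout, under stated assumptions: `make_dn` and `replica_contents` are hypothetical helpers, the "region" level exists only to create replicable subtrees, and subtree membership is a naive case-insensitive DN suffix match. It shows both halves of the trade-off: a cn change no longer touches the DN, but the attribute promoted to an RDN for partitioning brings the problem back.

```python
SUFFIX = "dc=example,dc=com"


def make_dn(entry):
    """Synthetic pk as the RDN, under an ou level added only for replication."""
    return f"pk={entry['pk']},ou={entry['region']},{SUFFIX}"


def replica_contents(entries, subtree_base):
    """DNs a replica of subtree_base would receive (naive suffix match)."""
    suffix = "," + subtree_base.lower()
    return sorted(make_dn(e) for e in entries if make_dn(e).lower().endswith(suffix))


alice = {"pk": 1, "cn": "Alice Smith", "region": "Europe"}
bob = {"pk": 2, "cn": "Bob", "region": "America"}

dn_before = make_dn(alice)
alice["cn"] = "Alice Jones"      # cn is no longer part of the DN,
dn_after_rename = make_dn(alice)  # so this change is harmless
europe = replica_contents([alice, bob], "ou=Europe," + SUFFIX)
alice["region"] = "America"      # but region is now an RDN,
dn_after_move = make_dn(alice)    # so moving the entry changes its DN again
```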

In relational databases, there is no need to do this: you can
select any column (in fact, often any predicate) to partition a
table into a set of tables, then replicate any subset of those
tables. The partitioning column (or predicate) does not need to
redefine the primary key; the two concerns are orthogonal.
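For contrast, predicate-based partitioning in SQL, sketched with Python's built-in sqlite3 module. The table, column names and partition predicates are made up; `CREATE TABLE ... AS SELECT` is just one simple way to materialize partitions, and rows with a NULL region would land in neither partition here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (pk INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "alice", "Europe"), (2, "bob", "America"), (3, "carol", "Europe")])

# Any predicate can define a partition; the primary key is left untouched.
# The predicates are fixed constants here, so formatting them into SQL is safe.
PARTITIONS = {
    "users_europe": "region = 'Europe'",
    "users_rest": "region <> 'Europe'",
}
for table, predicate in PARTITIONS.items():
    conn.execute(f"CREATE TABLE {table} AS SELECT * FROM users WHERE {predicate}")

europe = conn.execute("SELECT pk, name FROM users_europe ORDER BY pk").fetchall()
rest = conn.execute("SELECT pk, name FROM users_rest ORDER BY pk").fetchall()
```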

LDAP in fact maps much better to XML trees than it maps to SQL,
with nodes being elements and attributes being, well, attributes
- XML does not define multivalued attributes, though. Thus, the
LDAP query language can be compared to XPath or XQuery, which
exposes even more flaws (well, opportunities for improvement) in
the definition of LDAP. For example, XPath defines a number of
axes along which a search can be performed - LDAP also defines
axes, but only three (called scopes: base, one, sub). Compare
this to the rich variety of axes available in XPath.
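The three LDAP scopes correspond roughly to three XPath axes: base to `self::`, one to `child::`, and sub to `descendant-or-self::`. A minimal sketch of that correspondence over DN strings, assuming naive DN parsing (split on every comma, case-insensitive comparison, no escaping or multi-valued RDNs); `in_scope` is a hypothetical helper, not an LDAP API.

```python
def dn_components(dn):
    """Naive DN parsing: split on commas, normalize case and whitespace."""
    return [part.strip().lower() for part in dn.split(",")]


def in_scope(dn, base, scope):
    """LDAP search scopes expressed as the XPath axes they correspond to."""
    d, b = dn_components(dn), dn_components(base)
    if len(d) < len(b) or d[len(d) - len(b):] != b:
        return False
    depth = len(d) - len(b)
    if scope == "base":
        return depth == 0   # XPath self::
    if scope == "one":
        return depth == 1   # XPath child::
    if scope == "sub":
        return True         # XPath descendant-or-self::
    raise ValueError(f"unknown scope: {scope}")


ENTRIES = [
    "dc=example,dc=com",
    "ou=People,dc=example,dc=com",
    "cn=Alice,ou=People,dc=example,dc=com",
    "ou=Groups,dc=example,dc=com",
]
base = "ou=People,dc=example,dc=com"
results = {s: [dn for dn in ENTRIES if in_scope(dn, base, s)] for s in ("base", "one", "sub")}
```

Axes such as `parent::`, `ancestor::` or `following-sibling::` have no scope equivalent at all; a client has to emulate them by manipulating DNs and issuing further searches.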

Also, the XPath query language allows you to form expressions
that take tree topology into consideration ("find me all <b/>
elements directly below <h1/> elements", "find me the title
attributes of all <answer/> elements anywhere below the
<chapter/> element that has a title attribute that contains the
literal text "about this FAQ""). There is no provision at all for
such queries within the LDAP query language.

The end result is that LDAP is a very limited data store that
maps to nothing properly. If you have to implement different
storage methods within an application, you cannot select one
storage method, map all others onto it and expect it to work.
Instead, you have to model the storage within your application,
in the terms of your application, and then implement instances
of that storage model using each selected storage natively. In
my experience from projects, this is generally less painful and
produces much more robust code.
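A minimal sketch of that approach, assuming a per-user preferences lookup as the application-level operation. All class names, the `userpref` table layout and the multivalued `pref` attribute are hypothetical, and the directory backend uses an in-memory dict as a stand-in for an LDAP server rather than a real client library.

```python
import sqlite3
from abc import ABC, abstractmethod


class UserPrefStore(ABC):
    """Storage model in application terms (hypothetical interface)."""

    @abstractmethod
    def get_prefs(self, username: str) -> dict:
        """Return the user's preferences as {name: value}."""


class SqlUserPrefStore(UserPrefStore):
    """Native SQL implementation over an assumed userpref table."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_prefs(self, username):
        cur = self.conn.execute(
            "SELECT preference, value FROM userpref WHERE username = ?", (username,))
        return dict(cur.fetchall())


class DirectoryUserPrefStore(UserPrefStore):
    """Native directory-style implementation: one entry per user whose
    multivalued 'pref' attribute holds 'name value' strings. The dict of
    entries is an in-memory stand-in for an LDAP server."""

    def __init__(self, entries: dict, base_dn: str):
        self.entries = entries
        self.base_dn = base_dn

    def get_prefs(self, username):
        entry = self.entries.get(f"uid={username},{self.base_dn}", {})
        prefs = {}
        for raw in entry.get("pref", []):
            name, _, value = raw.partition(" ")
            prefs[name] = value
        return prefs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)")
conn.executemany("INSERT INTO userpref VALUES (?, ?, ?)",
                 [("jm", "required_hits", "5.0"), ("jm", "use_bayes", "1")])
directory = {"uid=jm,ou=People,dc=example,dc=com":
             {"pref": ["required_hits 5.0", "use_bayes 1"]}}

stores = [SqlUserPrefStore(conn),
          DirectoryUserPrefStore(directory, "ou=People,dc=example,dc=com")]
results = [store.get_prefs("jm") for store in stores]
```

Each backend uses its store's native shape (rows in one case, a single entry with a multivalued attribute in the other) instead of emulating one store on top of the other.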


I am sorry for posting such a long and somewhat off-topic rant to
this list, but this is stuff that cost me about a year or so to
learn, the hard way, in several projects involving SQL and LDAP
data stores. Being more of a database person than an LDAP
person, I tried to map LDAP to a relational model, just as you
propose to do, and it blew up in several ugly ways. An analysis
of the reasons resulted in the above rant as the short version
of an explanation.

LDAP is a very limited and specialized data store that is in
many ways inadequate, even for the things that it was "designed"
(*2) for. Don't force it to do things it cannot be made to do.


(*2) Rant the second. LDAP was not designed at all. LDAP is the
     descendant of the parts of X.500 that could be salvaged.
X.500 is a 1980s directory model that was designed by a
committee, and not even a committee of computer scientists, but
a working group composed of telecommunications company
executives and politicians working as part of the CCITT/ITU.
Like most CCITT/ITU/OSI inventions, it is lacking. Read the
soapboxes in Marshall T. Rose's "The Simple Book" if you want to
know more about the ways in which CCITT/ITU/OSI standards suck
compared to RFCs, including the abomination that is ASN.1 and
BER/DER.



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
