[Repoze-dev] BFG and semi-structured databases (a rant)

Luciano Ramalho Sun, 02 May 2010 14:03:29 -0700

BACKGROUND

I was attracted to Zope in 1998 because it freed us from the
clumsiness of the first normal form.


Many in the Zope community can also boast membership of the NoSQL "old-guard".

The ZODB is great, but it, and all other OODBs, have a serious
problem: the data is tied too closely to the application.

Objects in a ZODB Data.fs instance cannot be retrieved unless the
classes that define them are in memory and in perfect sync. But the
classes are not in the same storage, so that is never guaranteed to
work. In an RDBMS the schema is part of the database, so schema and
data can never be out of sync.
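
To make the coupling concrete, here is a minimal sketch using only the
standard library's pickle module, which is the serialization format the
ZODB builds on. It does not use the ZODB itself; the module name
"lost_app_models" and the Customer class are hypothetical stand-ins for
application code.

```python
import pickle
import sys
import types


class Customer:
    def __init__(self, name):
        self.name = name


# Pretend the class lives in an application module (hypothetical name).
Customer.__module__ = "lost_app_models"
Customer.__qualname__ = "Customer"
app = types.ModuleType("lost_app_models")
app.Customer = Customer
sys.modules["lost_app_models"] = app

# The pickle records only a reference to the class plus the instance state.
blob = pickle.dumps(Customer("Ana"))

# Simulate restoring the data on a machine without the application code.
del sys.modules["lost_app_models"]


def try_load(data):
    """Return the unpickled object, or the exception raised while loading."""
    try:
        return pickle.loads(data)
    except (ImportError, AttributeError) as exc:
        return exc


result = try_load(blob)
print(type(result).__name__, result)
```

The bytes are intact, but without the defining module the load fails
with an import error instead of returning the object.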

We lived with this for more than a decade, but then I learned: the
data is always more valuable than the application, so we need to be
able to get to it without the original software.

The Python community has always been smart about finding the middle
ground. For databases, between the extremes of RDBs and OODBs, I
believe the middle ground is semi-structured databases, as
exemplified by CouchDB, MongoDB, Google Datastore (sort of) and a host
of others that are gaining momentum, features, and use cases.

In these, we don't store serialized objects, but just the data to
reconstruct the objects. But the data is not completely dismembered in
some normalized form.
In a semi-structured database the data graph can follow very closely
the original object graph, which makes retrieval easier for the
programmer and more efficient for the database. And the schema is
self-describing, which means if you have a database backup, then you
are able to get to the data even if you don't have the software that
put it there.
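
As an illustration, here is a sketch of a record stored as a JSON
document, the format CouchDB uses. The order record and its field
names are invented for the example; no database is involved, only the
standard json module.

```python
import json

# The stored document mirrors the object graph: a customer with nested
# contact data and a list of line items, one of which has an extra field.
order = {
    "customer": {
        "name": "Ana",
        "emails": ["ana@example.com", "ana@example.org"],
    },
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1, "gift_note": "Happy birthday"},
    ],
}

stored = json.dumps(order)

# Any JSON reader can recover the data and its field names; the classes
# that originally produced the document are not needed.
recovered = json.loads(stored)
print(sorted(recovered))
print(recovered["items"][1]["gift_note"])
```

The field names travel with the values, which is what lets a backup be
read without the software that wrote it.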

SEMI-STRUCTURED DATABASE MODEL

Here is a useful definition:

"""
The semi-structured data model is designed as an evolution of the
relational data model that allows the representation of data with a
flexible structure. Some items may have missing attributes, others may
have extra attributes, some items may have two or more occurrences of
the same attribute. The type of an attribute is also flexible: it may
be an atomic value, or it may be another record or collection.
Moreover, collections may be heterogeneous, i.e., they may contain
items with different structures. The semi-structured data model is a
self-describing data model, in which the data values and the schema
components co-exist. [1]
"""

"Self-describing" is key. It is also interesting to note that the
text above seems like the description of the Python data model in
general.

Here is a formal definition, from the same source:

"""
A semi-structured data instance is a rooted, directed graph in which
the edges carry labels representing schema components, and leaf nodes
(i.e., nodes without any outgoing edges) are labeled with data values
(integers, reals, strings, etc.).
"""

[1] M.T. Özsu and L. Liu, Encyclopedia of database systems : Springer, 2009.
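
The formal definition maps directly onto nested Python dicts and lists.
The sketch below treats dict keys and list indexes as edge labels and
yields every leaf together with the labels on the path from the root.
The function name leaf_paths and the sample record are made up for the
illustration.

```python
def leaf_paths(node, path=()):
    """Yield (edge_labels, value) for every leaf reachable from node.

    Dict keys and list indexes play the role of edge labels; anything
    that is not a dict or a list is treated as a leaf data value.
    Empty dicts and lists have no outgoing edges and yield nothing.
    """
    if isinstance(node, dict):
        for label, child in node.items():
            yield from leaf_paths(child, path + (label,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaf_paths(child, path + (index,))
    else:
        yield path, node


record = {
    "name": "Ana",
    "phones": ["555-0100", "555-0101"],  # repeated attribute
    "address": {"city": "Recife"},       # attribute that is a record
}

for labels, value in leaf_paths(record):
    print(labels, value)
```

Note that the schema (the labels) is recovered from the data alone.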


THE OPPORTUNITY

I believe BFG, with its battle-tested traversal machinery, is uniquely
well positioned to take advantage of the wider adoption of
semi-structured databases.
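
A rough sketch of why the fit looks natural: traversal resolves a URL
path by calling __getitem__ on each object in turn, and a nested
document already has that shape. The code below is plain Python that
mimics the idea; it does not import or reproduce BFG's actual
traversal code, and the Node class, the traverse function, and the
sample site document are all hypothetical.

```python
class Node:
    """Wrap a nested dict so each level can be reached with __getitem__."""

    def __init__(self, data, name="", parent=None):
        self.data = data
        self.__name__ = name      # label of the edge leading here
        self.__parent__ = parent  # node one step closer to the root

    def __getitem__(self, key):
        child = self.data[key]  # raises KeyError for unknown labels
        if isinstance(child, dict):
            return Node(child, key, self)
        # Leaf values are data on the current node, not traversable nodes.
        raise KeyError(key)


def traverse(root, path):
    """Walk path segments from root; return (context, unused segments)."""
    context = root
    segments = [s for s in path.split("/") if s]
    for i, segment in enumerate(segments):
        try:
            context = context[segment]
        except KeyError:
            return context, segments[i:]
    return context, []


site = {"blog": {"2010": {"title": "A rant"}}}
context, rest = traverse(Node(site), "/blog/2010/edit")
print(context.__name__, context.data["title"], rest)
```

Here the path stops resolving at "edit", which is left over for the
caller to interpret, for example as the name of a view.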

A missing piece is a generic API for semi-structured data, to fill the
role that SQLAlchemy plays in the BFG ecosystem. Does anyone know
whether a good candidate for this already exists?

I am very happy that I took the time to visit Paul, Chris and Tres on
my last trip to Washington, DC. Thanks for the hospitality, the book,
and the inspiring ideas, guys.

I am excited about BFG and looking forward to the rest of this story.

Cheers,

Luciano
_______________________________________________
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev
