[sqlalchemy] ORM ramblings 2 - and announcing some o2r wrapper

samo Tue, 16 Jan 2007 13:28:05 -0800


ORM ramblings 2 - and announcing some o2r wrapper


howdy.
We needed an abstract-enough persistency and hence here is a wrapper
for SA to transparently bridge the O and R in ORM, and make the mapping
process as hidden as possible. That is, manually
control/request/specify the parameters of the O2R conversion and let
the wrapper do the rest.
The result wrapper would be open-sourced - if there is interest; don't
know where yet, do give suggestions.

A side goal is to prevent the DBMS/SQL-isation of the application - as
any framework/ ideology tends to grab the whole system into a
'there-must-be-only-One' way if allowed to, and in this case, the main
thing is persistency which is the DB, so SQL and related thinking tends
to creep everywhere - even in the way user interaction is done. (think
of it as hiding the implementation details for easier/correct
maintenance - e.g. why in "C" one would use 'typedef int Money; ...
Money amt;' instead of plain 'int amt;' )

And another goal is to have full systematical set of test cases to
cover the obj2db-Mapping/wrapping as such. SAlchemy tests are not
enough to me, they tend to be more like "lets just try this feature",
and not applied extensively in combinations of similar features/
subcases/ border-cases.

Initial idea was to even have different DB-drivers - SAlchemy being
just a one way to save data persistently. But this is too much
eventual, and too far away.

i know the impossibility of the completely automated solution - O and R
are two _different_ ways of representing reality and they do not match
(nor they cover whole reality). (see below the links for some article
on the theme).
Still, these paradigms cover more than enough common space, so there
should be a way to convert/bridge from one to another - assuming that
the application problem/field lies mostly inside the common shared
place/intersection, and not at the borders.

Of course, there come the trade-of's - maybe one cannot anymore do just
whatever one fancies; probably there will be some specific ways to do
certain things faster/shorter/smaller/better... But i am targetting the
bulk of other routine stuff, repeated over-and-over-and-over - could be
somewhat slower but easier.

In short, what is a (human) decision, _should_ be specified manualy,
and what is automatable, must be automated - and at proper level in the
chain.

Anyway, enough talking.

So far this is an abstraction - or extension/translator - of the orm/
layer that is in the SQLalchemy.
Maybe it looks a bit like activemapper but targets to cover/hide much
more of ORM job, and push the semantix somewhat away from the pure
relational.

---------

Here are the usage-areas of the wrapper (%-done):
=== layout of the data into tables: inheritance and
composition/referencing
  ---- inheritance (80%): declarative control of the way object-type
hierarchy is disassembled into (selec)tables and then assembled back
again.
The user specifies only type of decomposition into tables on a
per-class basis; the wrapper does all the rest - tables, columns, keys,
polymorhism, joins, mappers, etc. - and finaly presents for each class,
a triplet interface to query/split it's type-space:
query_ALL_instances, query_BASEonly_instances, query_SUB_instances
(q1=q2+q3)


Examples: mixed inheritance, both inside node / between nodes
( / is concrete table; // is table-inheritance; * is single table )
<pre>
      A
     / \\
    B   D
   / \\
  C   E

A.type = (A,D)
B.type = (B,E)

A.only = A.select( A.type==A)
B.only = B.select( B.type==B)
C.only = C
D.only = A.join(D)
E.only = B.join(E)




           A
          / \\
         B   D
        / *   *
       C   E   F
      //  //\
      P   Q  R
     *
    S

A.type = (A,D,F,Q)
B.type = (B,E,P,S)

A.only = A.select( A.type==A)
B.only = B.select( B.type==B)
C.only = C
D.only = A.join(D).select( A.type==D)
E.only = B.select( B.type==E)

F.only = A.join(D).select( A.type==F)

R.only = R

Q.only = A.join(D).select( A.type==Q).join(Q)

P.only = B.select( B.type==P).join(P)
S.only = B.select( B.type==S).join(P)

</pre>
the single_table case is not finished yet.

Here is example code for the first tree:
<pre>
   import samapper
   MBase = samapper.modelBaser( samapper._StaticStructPersistent)

   class A( MBase):
       a = Text()
   class B( A):
       DB_inheritance = 'concrete_table'
       b = Text()
   class C( B):
       DB_inheritance = 'concrete_table'
       c = Text()
   class E(C):
       DB_inheritance = 'table_inheritance'
       e = Text()
   class D( A):
       DB_inheritance = 'table_inheritance'
       d = Text()

   fieldtypemap = {
       Text: dict( type= samapper.sqlalchemy.String, ),
   }

   sam = SAmapper('memory')
   sam.open( recreate=True)
   sam.bind( locals(), fieldtypemap, base_klas= MBase)

   #populate
   a = A()
   a.a = 'a'
   b = B()
   b.a = 'ba'
   b.b = 'b'
   c = C()
   c.a = 'ca'
   c.b = 'cb'
   c.c = 'c'
   #...
   session = sam.session()
   sam.saveall( session, locals() )
   session.flush()

   q1 = sam.query_ALL_instances( session, klas=A)
   print [str(x) for x in q1]
   q2 = sam.query_BASE_instances(session, klas=A)
   print [str(x) for x in q2]
   q3 = sam.query_SUB_instances( session, klas=A)
   print [str(x) for x in q3]
</pre>

  ---- referencing (60%) and composition (0%): user specifies the
reference for the attribute; the wrapper does all the rest. This maybe
similar to what activemapper does with onetoone relations - i am not
sure.
       -- subcases: self-referencing (A-A); cycles (A-B-A) (A-B-C-A)
       -- a mincut-algo is used here to break the cycles (put
post_update, use_alter). graph calculations are done via kjbuckets.
       -- real composition is not handled yet. That is, embedding
structs in one another - so a.b.c.d maps actualy into a.b_c_d.

  ---- collections (5%): these are the set-like *any2many relations,
e.g. all Bs that point to me

=== field-type-conversion (50%): a simple translator of application
types into sql-alchemy types; similar to activemapper but using
application types in declarations.

=== queries (50%): using normal python expressions (put in functions)
over the application objects, which are then translated (e.g. into SA
column-expressions).
So far this covers the simple stuff only; uses same approach (operator
overloading) as SA but in somewhat different way. see expr.py.

=== make SA work with my static type-controlled objects (70%):
So far, it uses my own self-awareness layer to describe the object
types/fields in a declarative way - and keep them such. Thi is not
really required for the wrapper and the interface will be separated.

=== integrate all these together(30%): and make the whole thing usable
(incl. translating the comments into english)

=========================

general problems so far:
- SA's object-dirtiness-tracking and straight-overriding
instrumentedAttributes makes it hard/impossible to do type-control - it
overrides my own descriptors. The persistency-semantics also does not
match the need - SA undoing my own obj-changes is not something i
expect/want. i did explained this in our previous post (ORM-ramblings -
http://groups.google.com/group/sqlalchemy/browse_thread/thread/494eea6662dc38c6/).
Seems i'll need to have 2 separate object hierarchies, one (stiff) for
me and one (relaxed) for SA and somehow keep them synchronized. or keep
temporal copies. On SA-side, these can be done: a) separate
dirtiness-tracking as separate replaceable layer; and b) separate
access to SA-created "virtual" attributes (e.g. *_id) from usual
object-attributes.

- there are inconsistencies in SA interface/logic, which make the
generalizing (and testing) tricky. Some features work in certain
conditions/cases only, and do not work if combined with other features
(that is, different conditions). So far i've found a way
through/around, but i guess sooner or later i'll be stuck somewhere.

- it is useful for testing if all important dicts/sets are made via
some util.Dict/util.Set and not direct as {}/set(). This way by
replaceing that Dict with OrderedDict one can have repeatable
behaviour/output and compare it to a test-template. See my
sahack4repeatability.py about how i've workarounded this but i'm not
proud of it...

- i guess there are also plenty of ommisions / plain ignorance on my
side. e.g. what's common in between circular reference A-B-A,
one-to-one relation AB, backrefs, ... Bear with me, i dont do SQL for
lunch.

- i'm looking for a way to abstract/interface the stored procedures
somehow. Would there be support for these in SA?

- SA has no simple way of completely forgetting _all_ caches. i've
done something - see samapper.py: SAdb.destroy() - but i'm not sure if
it is enough/correct.

- i've found many cases of lists and dicts being created and
re-created /copied without need - be it with .copy() or with generator
expressions. Similarly same-attribute access is done multiple times.
Although noone cares if all this happens at DB-creation/opening, at
runtime (like session.flush()) all these will sooner or later start to
matter and slow down.


=================
Current problems/stucks:

* i cannot inherit a class and also reference it at same time (B
inherits A, and also references A; polymorphic). As i caught it not
working in many different ways, i've made a separate combinatorial
testcase for the full-combination of (polymorphism, inheritance-type,
Alink, Blink) - see my tests/sa_ref_A_B_A_all.py. i get several
different classes of errors:
- InvalidRequestError: Column 'B.id' is not available, due to
conflicting property 'id'
- NoSuchColumnError: Could not locate column in row for column 'A.i_d'
- SQLError: (OperationalError) no such column: A.data1
- and all else ok but a reference not being lazy-loaded.
This testcase is independent, it needs only sqlalchemy (same goes for
other sa_*.py)

* a (polymorphic) mapper of concrete table can not inherit the base
(polymorphic) mapper of another concrete table! i workaround this by
not inheriting but fix/add-to the polymorhic_map by hand - see "#XXX
?concrete-inheritance" comments in same test case. This workarounds
only in some cases; in the rest, One way i get the InvalidRequestError,
another way i get the NoSuchColumnError

* certain things involving inheritance/references (=mappers?) work
within session but stop working after session.clear(). This can be seen
on the above full-combination test-case - more cases fail with a
session.clear() (python tests/sa_ref_A_B_A_all.py clear) than without
(40%/28%). Even accessing some object after session.clear becomes a
problem (e.g. print a.b before or after session.clear() _does_ matter -
it stops in some cases).

* in the sqlalchemy/orm/attributes/InstrumentedAttribute.get(): the
"return obj.__dict__[self.key]" should be just 'return value'; a) it's
faster - it's not just a lookup; b) in my case the "set-attr" before
that with None will delete the attribute and then this one fails

* i've found a duplicated call to function -
sqlalchemy/topological.py: QueueDependencySorter.sort():
_find_cycles(); not really a problem except speed - it is used at
session.flush()

* it does matter if engine is bound or not before doing tables etc -
some cases work only if it is bound beforehand. i cannot remember, but
it came while playing around /extending the
sqlalchemy/examples/concrete.py

* the query_subclass_only_instances could not be done as a 3rd (2-nd
non-primary) mapper, unless force-given a pre-made _polymorphic_map.

Whether these are defects or it is just me not finding the right
alchemic word... dunno.

Here is the source, have a look... needs kjbuckets (from gadfly)
installed.

url: http://linuxteam.sistechnology.com/o2rm/o2rm0116.tar.bz2

ciao
samo

p.s. Regarding the 'square peg into round hole' - it is my choice.
i tell the machine what to do and not the machines telling me what to
do.

================

here the ORM links:

http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
(skip the first historical half)
http://programming.reddit.com/info/j009/comments

"""To grossly summarize:
OO objects != Relations, but current O/R mapping
schemas fail to understand that and assume the difference is only
trivial. This leads to the following problems:
The Object-Relational Impedence Mismatch
The Object-to-Table Mapping Problem
The Schema-Ownership Conflict
The Dual-Schema Problem
Entity Identity Issues
The Data Retrieval Mechansim Concern
The Partial-Object Problem and the Load-Time Paradox

Then suggests some possible solutions:
Abandonment (of objects)
Wholehearted acceptance (of objects)
Manual mapping
Acceptance of O/R-M limitations
Integration of relational concepts into the languages
Integration of relational concepts into frameworks
"""


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] ORM ramblings 2 - and announcing some o2r wrapper

Reply via email to