Jon Nelson wrote:
> Let's say I have a database with hundreds or even thousands of tables.

Sure you didn't mean "hundreds OF thousands"?  :)

On Tue, Mar 30, 2010 at 2:09 PM, Michael Bayer <[email protected]> wrote:
> def map_a_table(tablename):
>    table = Table(tablename, metadata,
>         ...put the standard set of columns here..
>
>    class Foo(object):
>        pass
>    Foo.__name__ = tablename
>    mapper(Foo, table)
>    return Foo
>
> someone else has the "thousands of the same tables" setup and also does
> this (though to be blunt their top priority is migrating off of that crazy
> architecture).

Yup, that's basically what we're doing, though we're using
declarative_base.  We've also got some relation()s between the
dynamically generated classes, and a cache of mapped classes.  Each
logical group of mapped classes inherits from one declarative_base and
shares the MetaData from it (this might have an impact on GCability;
I haven't personally tested any alternatives).
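For reference, here's a minimal sketch of that pattern (the column names
are placeholders I made up, and I'm using the modern
sqlalchemy.orm.declarative_base import path; in the 0.5 era the Base
came from sqlalchemy.ext.declarative):

```python
# Hypothetical sketch: dynamically generated declarative classes plus a
# cache, one Base (and hence one MetaData) per logical group of tables.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()   # each logical group gets its own Base/MetaData
_class_cache = {}           # expire entries from here to let mappers be GC'd

def map_a_table(tablename):
    if tablename in _class_cache:
        return _class_cache[tablename]
    # type() builds the class dynamically, like Foo.__name__ in the
    # classical-mapping version quoted above.
    cls = type(tablename, (Base,), {
        '__tablename__': tablename,
        # ...the standard set of columns; these two are placeholders...
        'id': Column(Integer, primary_key=True),
        'name': Column(String(50)),
    })
    _class_cache[tablename] = cls
    return cls
```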

SQLAlchemy keeps only weakrefs to mappers, so once we expire the
mapped classes from our cache, they're eligible for Python to garbage
collect.  Obviously, we don't have any relation()s between groups of
mappers, or they wouldn't be collectable. :)  Letting the size of the
Python VM grow has horrible consequences for performance, so don't
make a cache any bigger than you absolutely have to.  On a Core2 Duo I
see GC times on the order of 1s/GB of VM memory, during which the
interpreter can't do anything else.  Mapping ~20 tables takes closer
to 1/5s, so it can be way better for latency to have a small cache and
cheap GC than the other way around.  One can also disable GC entirely
and restart the interpreter when its memory footprint gets too big.  Or
run GC when you can, but be aware that the longer between GC runs, the
longer each run takes.
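The explicit-GC strategy above looks roughly like this (the structure
and the "idle" hook are illustrative, not our actual code):

```python
# Illustrative sketch of the trade-off described above: disable automatic
# cyclic GC and collect explicitly only when a pause is acceptable.
import gc

gc.disable()          # no surprise multi-second pauses mid-request

def collect_if_idle(idle):
    """Run a full collection only when the pause is affordable.

    The longer between runs, the more garbage piles up and the longer
    each run takes.
    """
    if not idle:
        return 0
    return gc.collect()   # returns the number of unreachable objects found
```

The other option mentioned above is simpler still: leave GC disabled
and just restart the worker process once its VM size crosses a limit.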

For stability we've also had to backport some extra locking from 0.6
into 0.5, and grab the sqlalchemy.orm._COMPILE_MUTEX when dynamically
adding the relation()s.  There's also a fix for iterating over a
WeakKeyDictionary while GC hits (a collection can change the size of
the dict mid-iteration, raising a RuntimeError), which I should
probably submit upstream. ;)
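That fix boils down to iterating over a strong snapshot; here's a
minimal stdlib illustration (not the actual SQLAlchemy patch, and
MappedClass/registry are made-up names):

```python
# If GC reaps a weakly referenced key while you iterate a
# WeakKeyDictionary, the dict changes size mid-iteration and Python
# raises RuntimeError.  Taking a list() snapshot first holds strong
# references to the surviving keys for the duration of the loop.
import weakref

class MappedClass(object):
    pass

registry = weakref.WeakKeyDictionary()

def safe_items(wkd):
    # The snapshot pins the keys, so a concurrent collection can't
    # resize the dict underneath the iteration.
    return list(wkd.items())
```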

And yes, one of our top priorities is migrating off of that schema as
quickly as we can, because maintaining this stuff is pure insanity!
Plus databases are designed for a few tables with millions of rows,
not millions of tables with a few rows.  In most cases you should just
let the database handle the load (with appropriate indexes).

-- 
taa
/*eof*/
