Felix Schwarz wrote:
>
> I have a question which I think is similar enough to be asked in the
> same thread: I have a set of quite simple migration scripts which use
> SQLAlchemy 0.4 and Elixir 0.4. I extract data from the old legacy
> (MySQL) database with SQLAlchemy and put this data into new Elixir
> objects.
>
> Currently, these scripts use up to 600 MB of RAM. This is no real
> problem, as we could probably devote a machine with 4 GB of RAM solely
> to the automated migration. But it would be nice to use lower-powered
> machines for our migration tasks.
>
> What puzzles me is that I do not (knowingly) keep references to either
> the old data items or the new Elixir objects. Nevertheless, memory
> usage increases during the migration. Is there an easy way to debug
> this and see why Python needs so much memory, i.e. which references
> prevent the objects from being garbage collected? Running the garbage
> collector manually did not help much (it saved only about 5 MB).
>
> fs
>
Here is a snippet that I've used before when trying to track down
objects that aren't getting cleaned up properly. I don't think it'll
find leaks of built-in types, but it should help with instances of
user-defined classes. Just call 'report_objects' every now and then.
--------------------------------------------
import gc

_previous = {}

def report_objects(threshold=500):
    objects = gc.get_objects()
    print "Number of objects in memory: %d" % len(objects)
    modules = {}
    for obj in objects:
        if getattr(obj, '__module__', None) is not None:
            # group counts by the first three components of the module path
            module_parts = obj.__module__.split('.')
            module = '.'.join(module_parts[:3])
            modules.setdefault(module, 0)
            modules[module] += 1
    print "Modules with > %d objects:" % threshold
    dump_modules(modules, threshold)
    if _previous:
        # compare against the counts from the previous call
        changes = {}
        for module, value in modules.items():
            changes[module] = value - _previous.get(module, 0)
        print "Changes since last time:"
        dump_modules(changes, 10)
    _previous.clear()
    _previous.update(modules)
    print ""

def dump_modules(modules, threshold):
    maxlen = max(len(m) for m in modules) if modules else 0
    l = [(value, module) for module, value in modules.items()
         if value > threshold]
    if l:
        l.sort(reverse=True)
        for value, module in l:
            print "%*s %5d" % (maxlen + 1, module, value)
    else:
        print "  <None>"
-------------------------------------------------
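If you want to convince yourself that the per-module counting actually
picks up leaked instances of your own classes, here is a small standalone
sketch (independent of the snippet above; LeakyRecord and the counts dict
are made up for illustration):

```python
import gc

class LeakyRecord(object):
    """Stand-in for a migrated object that never gets released."""
    pass

# simulate a leak: keep 1000 instances alive in a module-level list
_kept = [LeakyRecord() for _ in range(1000)]

# count live objects per defining module, the same way report_objects does
counts = {}
for obj in gc.get_objects():
    module = getattr(obj, '__module__', None)
    if module is not None:
        counts[module] = counts.get(module, 0) + 1

leaked_module = LeakyRecord.__module__
print("%s: %d" % (leaked_module, counts.get(leaked_module, 0)))
```

Running that should report at least the 1000 leaked instances under the
module where LeakyRecord was defined, which is exactly the kind of bump
you are looking for between two calls to report_objects.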
The first time you call report_objects, you should get something like
this:
Number of objects in memory: 100794
Modules with > 500 objects:
sqlalchemy.ext.assignmapper 1935
sqlalchemy.util 1362
sqlalchemy.types 1250
sqlalchemy.schema 1170
sqlalchemy.sql 1124
sqlalchemy.orm.unitofwork 1003
sqlalchemy.orm.strategies 956
sqlalchemy.orm.properties 750
sqlalchemy.orm.attributes 699
sqlalchemy.orm.mapper 681
testresults.define_schema 665
And then when you call it again some time later:
Number of objects in memory: 102349
Modules with > 500 objects:
sqlalchemy.ext.assignmapper 1935
sqlalchemy.util 1418
sqlalchemy.types 1250
sqlalchemy.schema 1204
sqlalchemy.sql 1177
sqlalchemy.orm.unitofwork 1004
sqlalchemy.orm.strategies 993
sqlalchemy.orm.properties 750
sqlalchemy.orm.attributes 708
sqlalchemy.orm.mapper 681
testresults.define_schema 665
Changes since last time:
sqlalchemy.util 56
sqlalchemy.sql 53
sqlalchemy.databases.mysql 49
MySQLdb.cursors 45
sqlalchemy.orm.strategies 37
sqlalchemy.schema 34
MySQLdb.connections 16
MySQLdb.converters 11
Note that the module names are where the classes are defined, not where
they are used, but it may be enough to give you a clue.
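Once the report points at a class that keeps growing, gc.get_referrers
can show you what is still holding instances alive. A minimal sketch
(the Migrated class and the registry dict are hypothetical stand-ins
for whatever cache is leaking in your scripts):

```python
import gc

class Migrated(object):
    """Hypothetical class that the report flags as growing."""
    pass

registry = {}                  # stand-in for a cache that leaks objects
registry['row-1'] = Migrated()

suspect = registry['row-1']
# everything that still references the suspect, including module globals
holders = gc.get_referrers(suspect)
print("registry is a holder: %s" % any(h is registry for h in holders))
# prints "registry is a holder: True"
```

In real code you would inspect each holder (its type and contents) to
work out which cache, session, or module-level structure to clear.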
Hope that helps,
Simon