I think if you truly want to measure unbounded memory growth, you need to watch
the total memory of the process, under load, over time.  It will grow to a
certain degree, then should stay constant.  In this case, there is additional
caching going on that will make it seem like memory is growing in the short
term, and it also explains why the Dialect and some expression-related
constructs are held around for a while.  The Dialect is a stateful object that
accumulates information about a particular database URL, and is also used to
cache information about various dialect-specific constructs like SQL
statements and types.  The "many engine" use case might someday be helped by
gaining the ability to share the same Dialect among many engines that are
connecting to identically-behaving databases, since it's really the Dialect
that has the larger memory footprint.
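
To illustrate what I mean by watching the process, here's a minimal sketch of
that kind of measurement.  It assumes Linux (it reads VmRSS out of
/proc/self/status; elsewhere, something like psutil would serve the same
purpose), and the "workload" callable is just a stand-in for whatever your
application does:

def current_rss_kb():
    # resident set size of this process in kB; Linux-specific
    with open('/proc/self/status') as status:
        for line in status:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

def watch_memory(workload, iterations=2000, report_every=100):
    # run the workload over and over, reporting RSS as we go.  with
    # caches in the picture the numbers climb for a while and then
    # level off; only growth that never flattens out suggests a leak
    for i in xrange(iterations):
        workload()
        if i % report_every == 0:
            print "iteration %d: RSS %d kB" % (i, current_rss_kb())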

The attached test case illustrates the size management of another cache,
mapper._compiled_cache, which keys compiled forms of SQL statements to
Dialects.  This cache is an LRU cache that manages its size to around 150
items.  If you run the script, it prints these sizes, which you'll see grow to
about 150 and then get chopped down by 50.  The test also prints the length of
gc.get_objects(), which grows over the first few dozen iterations but then
levels out into a stepwise pattern:

sample gc sizes: [16939, 16997, 17045, 17093, 17141, 17189, 17237, 17285, 
17333, 17381, 17429, 17477, 17525, 17573, 17621, 17669, 17717, 17765, 17813, 
17861, 17909, 17957, 18005, 18053, 18101, 18149, 18197, 18245, 18293, 18341, 
18389, 18437, 18485, 18533, 18581, 18629, 18677, 18725, 18773, 18821, 18869, 
18917, 18965, 19013, 19061, 19109, 19157, 19205, 19253, 19301, 19349, 19397, 
19445, 19493, 19541, 19589, 19637, 19685, 19733, 19781, 19829, 19877, 19925, 
19973, 20021, 20069, 20117, 20165, 20213, 20261, 20309, 20357, 20405, 20453, 
20501, 20549, 20597, 20645, 20693, 20741, 20789, 20837, 20885, 20933, 20981, 
21029, 21077, 21125, 21173, 21221, 21269, 21317, 21365, 21413, 21461, 21509, 
21557, 21605, 21653, 21701, 21749, 21797, 21845, 21893, 21941, 21989, 22037, 
22085, 22133, 22181, 22229, 22277, 22325, 22373, 22421, 22469, 22517, 22565, 
22613, 22661, 22709, 22757, 22805, 22853, 22901, 22949, 22997, 23045, 23093, 
23141, 23189, 23237, 23285, 23333, 23381, 23429, 23477, 23525, 23573, 23621, 
23669, 23717, 23765, 23813, 23861, 23909, 23957, 24005, 24053, 21683, 21731, 
21779, 21827, 21875, 21923, 21971, 22019, 22067, 22115, 22163, 22211, 22259, 
22307, 22355, 22403, 22451, 22499, 22547, 22595, 22643, 22691, 22739, 22787, 
22835, 22883, 22931, 22979, 23027, 23075, 23123, 23171, 23219, 23267, 23315, 
23363, 23411, 23459, 23507, 23555, 23603, 23651, 23699, 23747, 23795, 23843, 
23891, 23939, 23987, 24035, 24083, 21683, 21731, 21779, 21827, 21875, 21923, 
21971, 22019, 22067, 22115, 22163, 22211, 22259, 22307, 22355, 22403, 22451, 
22499, 22547, 22595, 22643, 22691, 22739, 22787, 22835, 22883, 22931, 22979, 
23027, 23075, 23123, 23171, 23219, 23267, 23315, 23363, 23411, 23459, 23507, 
23555, 23603, 23651, 23699, 23747, 23795, 23843, 23891, 23939, 23987, 24035]

The number of objects gets to around 24000, then bounces back down to 21000
twice within this sample, and continues to do so as the LRU cache grows and
then reduces its size (the numbers are slightly different in 0.7, but it's the
same idea).  So this is not a leak.
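
If you want to see that pruning pattern in isolation, here's a sketch of the
general technique -- this is not SQLAlchemy's actual implementation, just an
LRU-ish dict that grows to a capacity and then chops off a block of its oldest
entries, which is what produces the sawtooth above:

from collections import OrderedDict

class PruningLRU(OrderedDict):
    # illustrative only.  entries are kept in recency order; once the
    # dict passes `capacity`, the oldest `prune_by` entries are dropped
    # in one pass
    def __init__(self, capacity=150, prune_by=50):
        OrderedDict.__init__(self)
        self.capacity = capacity
        self.prune_by = prune_by

    def __getitem__(self, key):
        value = OrderedDict.__getitem__(self, key)
        # re-insert so the key moves to the most-recently-used end
        del self[key]
        OrderedDict.__setitem__(self, key, value)
        return value

    def __setitem__(self, key, value):
        OrderedDict.__setitem__(self, key, value)
        if len(self) > self.capacity:
            for old in list(self)[:self.prune_by]:
                OrderedDict.__delitem__(self, old)

Because a whole block of entries is released at once, the object counts step
down sharply rather than decaying smoothly.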

If you have other cases to test, you can send them back here by modifying the
attached script to illustrate your scenario.
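
For what it's worth, per-type counts like the listings below can be produced
with a couple of helpers along these lines (the helper names here are made up;
the important part is the gc.collect() before counting, for the reasons given
further down):

import gc
from collections import defaultdict

def count_by_type():
    # collect first, otherwise unreachable cycles that are merely
    # awaiting cyclic GC inflate the counts
    gc.collect()
    counts = defaultdict(int)
    for obj in gc.get_objects():
        counts[type(obj)] += 1
    return counts

def report_growth(last, curr):
    # print only the types whose counts grew, in the same
    # last/curr/diff format as the listings in this thread
    for typ in sorted(curr, key=str):
        diff = curr[typ] - last.get(typ, 0)
        if diff > 0:
            print "%r: last:%d curr:%d diff:%d" % (
                typ, last.get(typ, 0), curr[typ], diff)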

On Jan 13, 2013, at 11:24 AM, Anoop K wrote:

> I tried memory tests by reusing the engine, and memory is steady.  I feel
> some memory is held up even after the engine is released.
> 
> On Sunday, 13 January 2013 12:21:10 UTC+5:30, Anoop K wrote:
> gc.collect() was not called in the last sample. Here is one where 
> gc.collect() is also called
> 
> 
> <class 'sqlalchemy.util._collections.OrderedDict'>: last:131 curr:134 diff:3
> <type 'builtin_function_or_method'>: last:862 curr:868 diff:6
> <class 'sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2'>: last:18 
> curr:21 diff:3
> <class 'sqlalchemy.types.NullType'>: last:19 curr:22 diff:3
> <type 'instancemethod'>: last:561 curr:564 diff:3
> <class 'sqlalchemy.types.Text'>: last:234 curr:264 diff:30
> <class 'sqlalchemy.dialects.postgresql.psycopg2._PGArray'>: last:180 curr:210 
> diff:30
>  <class 'sqlalchemy.sql.expression._BindParamClause'>: last:585 curr:678 
> diff:93
>  <class 'sqlalchemy.util._collections.PopulateDict'>: last:18 curr:21 diff:3
>  <type 'list'>: last:2962 curr:2983 diff:21
>  <type 'instance'>: last:197 curr:200 diff:3
>  <class 'sqlalchemy.dialects.postgresql.base.PGTypeCompiler'>: last:18 
> curr:21 diff:3
>  <class 
> 'sqlalchemy.dialects.postgresql.psycopg2.PGIdentifierPreparer_psycopg2'>: 
> last:18 curr:21 diff:3
>  <type 'dict'>: last:10470 curr:10854 diff:384
>  <class 'sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2'>: 
> last:18 curr:21 diff:3
>  <class 'sqlalchemy.types.String'>: last:493 curr:550 diff:57
>  <class 'sqlalchemy.types.DateTime'>: last:53 curr:56 diff:3
>  <type 'weakref'>: last:2165 curr:2297 diff:132
>  <type 'tuple'>: last:29338 curr:29473 diff:135
>  <type 'cell'>: last:933 curr:1023 diff:90
>  <class 'sqlalchemy.types.Integer'>: last:57 curr:63 diff:6
>  <type 'function'>: last:8139 curr:8262 diff:123
> 
> 
> 
> 
> On Sunday, 13 January 2013 11:45:11 UTC+5:30, Michael Bayer wrote:
> 
> On Jan 13, 2013, at 12:40 AM, Anoop K wrote:
> 
>> Hi,
>> 
>> I had posted a question
>> (https://groups.google.com/forum/?fromgroups=#!topic/sqlalchemy/vCjrGE9zYwQ)
>> related to session/engine cleanup.
>> 
>> After fixing the engine cleanup, I could see that the engine and its
>> associated objects are GCed properly.  However, process memory is gradually
>> increasing.  I am creating and destroying a session/engine every 30 secs.
>> 
>> SQLAlchemy usage:
>> 
>> Every 30 secs, the following happens:
>> 
>> engine = create_engine(...)
>> sn = Session(bind=engine)...
>> try:
>>     sn.merge
>>     sn.commit
>> finally:
>>     sn.close()
>>     sn.bind.dispose()
>> >>> sn and engine don't have any reference after this point ....
>> 
>> 
>> GC objects after some runs are listed below; it looks like some items are
>> increasing in memory (in BOLD).
>> Is this some kind of leak?
> 
> First off, there's no way to know without a real, reproducing test case.   
> Providing this would give us something tangible to examine.
> 
> As the pattern of creating and destroying engines many times is not the 
> primary use case for the Engine, it's hard to say definitively, though we do 
> have memory tests which test the pattern of engines being created and 
> disposed many times, at the same time testing the length of gc.get_objects() 
> to confirm no growth in memory.     In your example below, the objects in 
> question are mostly type objects.  These objects do have a behavior whereby,
> as they are created, they will cache information about themselves within a dialect
> that is local to an Engine.   But as each Engine is disposed, so is the 
> Dialect, and so is the cache.   The cache itself is also a WeakKeyDictionary, 
> so that no strong reference to the type object is created.  So the cache does 
> not prevent the type objects from being garbage collected.   There are tests 
> which ensure this behavior as well.
> 
> In your counts below, there are 31 Dialect instances in one run and then 38 
> in another.  So this suggests that there are 31 and then 38 Engine instances, 
> a growth of 7, and that the Engine references are not in fact being cleaned 
> out, or gc.collect() is not being called within the test.   Depending on what 
> you're doing with SQL expressions within each engine it is plausible that a 
> few hundred type objects have accumulated in the cache.
> 
> Note that it is absolutely necessary to call gc.collect() in any Python 
> program that is attempting to get an accurate count of how many objects are 
> still strongly referenced, as unreachable cycles can remain for a significant 
> period of time if one waits for cyclic GC to happen automatically.
> 
> 
> 
>> 
>> <class 'sqlalchemy.util._collections.OrderedDict'>: last:176 curr:198 diff:22
>> <type 'builtin_function_or_method'>: last:900 curr:914 diff:14
>> <class 'sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2'>: 
>> last:31 curr:38 diff:7
>>  <class 'sqlalchemy.types.NullType'>: last:32 curr:39 diff:7
>>  <type 'instancemethod'>: last:605 curr:627 diff:22
>> <class 'sqlalchemy.types.Text'>: last:177 curr:254 diff:77
>>  <class 'sqlalchemy.dialects.postgresql.psycopg2._PGArray'>: last:120 
>> curr:197 diff:77
>> <class 'sqlalchemy.sql.expression._BindParamClause'>: last:658 curr:954 
>> diff:296
>>  <class 'sqlalchemy.util._collections.PopulateDict'>: last:60 curr:82 diff:22
>> <type 'list'>: last:3308 curr:3462 diff:154
>>  <type 'instance'>: last:139 curr:146 diff:7
>>  <class 'sqlalchemy.dialects.postgresql.base.PGTypeCompiler'>: last:31 
>> curr:38 diff:7
>> <class 
>> 'sqlalchemy.dialects.postgresql.psycopg2.PGIdentifierPreparer_psycopg2'>: 
>> last:31 curr:38 diff:7
>> <type 'dict'>: last:10930 curr:12172 diff:1242
>> <class 'sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2'>: 
>> last:60 curr:82 diff:22
>> <class 'sqlalchemy.types.String'>: last:571 curr:753 diff:182
>>  <class 'sqlalchemy.types.DateTime'>: last:92 curr:113 diff:21
>> <type 'weakref'>: last:2204 curr:2582 diff:378
>>  <type 'tuple'>: last:29850 curr:30231 diff:381
>> <type 'cell'>: last:747 curr:978 diff:231
>> <class 'sqlalchemy.types.Integer'>: last:52 curr:59 diff:7
>> <type 'function'>: last:8077 curr:8392 diff:315
>> 
>> Thanks
>> Anoop
>> 
>> 
>> 


import gc
import weakref

def profile_memory(times=50):
    def decorate(func):
        # run the test `times` times.  if the length of gc.get_objects()
        # keeps growing, assert False

        def get_objects_skipping_sqlite_issue():
            # pysqlite keeps adding weakref objects which only
            # get reset after 220 iterations, which is too long
            # to run lots of these tests, so just filter them
            # out.
            return [o for o in gc.get_objects()
                    if not isinstance(o, weakref.ref)]

        def profile(*args):
            gc.collect()
            samples = [0 for x in range(0, times)]
            for x in range(0, times):
                func(*args)
                gc.collect()
                samples[x] = len(get_objects_skipping_sqlite_issue())

            print "sample gc sizes:", samples

            # consider the run flat when the last four samples all
            # equal the fifth-from-last sample
            for x in samples[-4:]:
                if x != samples[-5]:
                    flatline = False
                    break
            else:
                flatline = True

            # object count is bigger than when it started
            if not flatline and samples[-1] > samples[0]:
                for x in samples[1:-2]:
                    # see if a spike bigger than the endpoint exists
                    if x > samples[-1]:
                        break
                else:
                    assert False, repr(samples) + " " + repr(flatline)

        return profile
    return decorate


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
    __tablename__ = "a"

    id = Column(Integer, primary_key=True)
    data = Column(String)

engine = create_engine("postgresql://scott:tiger@localhost/test")

Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
s = Session(engine)
s.add(A(data='a1', id=1))
s.commit()

i = [2]

@profile_memory(times=250)
def go():
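    # each pass creates a brand-new Engine, does one merge/commit
    # round trip, then disposes of the Engine entirely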
    engine = create_engine("postgresql://scott:tiger@localhost/test")

    s = Session(engine)
    s.merge(A(data='a%d' % i[0], id=1))
    i[0] += 1
    s.commit()
    s.close()

    engine.dispose()
    del engine


    print "Size of compiled cache: %d" % len(A.__mapper__._compiled_cache)
go()

