I have an application that uses multiple web2py shell instances for
long-running data acquisition and analysis.

Having multiple independent processes is working really, really well, but
I'd like to reduce the total memory footprint of each of them.  I'm not
finding much in the way of good memory profiling tools and wanted to
check with the group wisdom here to see if anyone has already looked into
reducing memory use in shell processes.

Below are some numbers from running a stripped-down script on OS X (where I
do most of my test and development work).  In the first case, I run it
outside web2py and see a memory footprint of a little over 4 MB.
Running the same script with web2py -S -N -M -R shows a footprint of 25 MB.
The target environment is Linux.  Memory use there is smaller by about
a third, but the ratio is the same.

I use the objgraph module (http://mg.pov.lt/objgraph/objgraph.html)
to show the most common object types in memory when I interrupt the
process just before the script exits.  The number that really stands out
is the count of functions: 987 without web2py, 5647 with it!  I kinda
suspect I don't need all of them for the scripts I'm running, :-) ,
so my first thought is: what can I get rid of at startup, and how do I
make Python release the memory?
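To see where those extra functions are coming from, a rough census by
defining module can help.  This is just a diagnostic sketch (plain gc and
inspect, no objgraph required); the module names it prints are simply
whatever happens to be loaded:

```python
import gc
import inspect
from collections import Counter

def function_census(top=10):
    """Count live function objects, grouped by the module that defined them."""
    counts = Counter()
    for obj in gc.get_objects():
        if inspect.isfunction(obj):
            counts[getattr(obj, '__module__', None) or '?'] += 1
    return counts.most_common(top)

for module, n in function_census():
    print('%6d  %s' % (n, module))
```

Modules near the top of that list that the shells never actually call
would be the candidates for trimming at startup.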

The only parts of web2py used by the shells are the application model, the
basic db instance methods (insert, update, select, etc.), and the DAL()
constructor for creating a new db instance in forked children.
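Since the shells only need the DAL, one experiment worth trying (unmeasured
here, and the module path is an assumption based on web2py's layout in the
1.9x series) is importing the DAL standalone instead of paying for the full
shell:

```python
try:
    ## Assumption: in web2py of this era the DAL ships as a standalone module.
    from gluon.dal import DAL, Field
except ImportError:
    DAL = Field = None  ## gluon not on sys.path outside a web2py tree

if DAL is not None:
    ## Illustrative connection string; a real shell would use the same
    ## one its application model uses.
    db = DAL('sqlite://storage.sqlite')
    print('DAL loaded without the rest of web2py')
else:
    print('gluon not importable here')
```

If the function count with only gluon.dal imported is much closer to the
987 baseline, that would suggest most of the 25 MB comes from the rest of
the framework rather than the DAL itself.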

Any help or suggestions appreciated.
Thanks,
Mike

--------------------------------------------------------------------------
WITHOUT WEB2PY
michael-elliss-macbook:web2py mellis$ python ~/bin/memcheck.py
Memory usage: current = 4.3 MB, initial = 4.3 MB
Memory usage: current = 4.3 MB, initial = 4.3 MB
Memory usage: current = 4.3 MB, initial = 4.3 MB
^C
wrapper_descriptor         1080
*function                   987*
builtin_function_or_method 661
tuple                      552
method_descriptor          422
dict                       415
weakref                    284
member_descriptor          204
list                       157
getset_descriptor          141

--------------------------------------------------------------------------
WITH WEB2PY
michael-elliss-macbook:web2py mellis$ python web2py.py -S welcome -N -M -R ~/bin/memcheck.py
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2011
Version 1.97.1 (2011-06-26 19:25:44)
Database drivers available: SQLite3, pymysql
Memory usage: current = 25.1 MB, initial = 25.1 MB
Memory usage: current = 25.1 MB, initial = 25.1 MB
Memory usage: current = 25.1 MB, initial = 25.1 MB
^C
*function                   5647*
tuple                      1663
dict                       1483
wrapper_descriptor         1245
weakref                    1025
method_descriptor          938
builtin_function_or_method 875
list                       834
type                       604
getset_descriptor          495

--------------------------------------------------------------------------
HERE'S THE SCRIPT

"""
Web2py shell app for investigating memory use
Author: Michael Ellis
"""
import signal
DEBUGGER_INTERRUPT = False
def debug_me(signum, frame):
    """
    This signal handler provides a mechanism for remote debugging on demand.
    To activate it, look up the PID for this process, and then use
    'kill -SIGUSR1 <PID>'
    to stop the process.  Then start winpdb (or rpdb2), set the password
    and attach.
    """
    global DEBUGGER_INTERRUPT
    try:
        import rpdb2
        rpdb2.start_embedded_debugger("passwd")
        DEBUGGER_INTERRUPT = True
    except ImportError:
        print "Ignoring SIGUSR1. rpdb2 is not installed."


signal.signal(signal.SIGUSR1, debug_me)


import sys
import time
import datetime
import resource
from traceback import format_exc
import gc


def get_real_mem_mb():
    """
    Return current real memory use of calling process in MB.
    Useful for tracking slow memory leaks.  See man getrusage
    for details of system call.
    """
    if sys.platform.startswith('darwin'):
        divisor = 1024. * 1024    ## OS X reports bytes
    elif sys.platform.startswith('linux'):
        divisor = 1024.           ## Linux reports kbytes
    else:
        msg = "Don't know what to do for platform %s. "\
              "Please add a clause to handle it." % sys.platform
        raise ValueError(msg)

    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return mem / divisor

class MemoryLeak(Exception):
    """Raised by MemoryUseTracker instances. Behaves like a generic
    exception."""
    pass

class MemoryUseTracker(object):
    """ Tracks memory use by current process """
    def __init__(self, updateseconds = 60, leakalarm = 5.0):
        """
        If the check() method is called less than updateseconds
        after the last call, no check is performed.  The leakalarm
        argument specifies the factor by which memory use may increase
        before a MemoryLeak exception is raised.
        """

        self.leakalarm = leakalarm
        self.updateseconds = updateseconds
        self.initialmb = get_real_mem_mb()
        self.firstcheck = self.lastcheck = time.time()

    def check(self):
        """
        Checks memory usage and raises MemoryLeak if
        it is greater than self.leakalarm * self.initialmb
        Garbage collection is invoked followed by a second
        check before raising MemoryLeak.
        """
        now = time.time()
        if now - self.lastcheck > self.updateseconds:
            self.lastcheck = now
            mb = get_real_mem_mb()
            print "Memory usage: "\
                  "current = %0.1f MB, initial = %0.1f MB"\
                  % (mb, self.initialmb)
            if mb / self.initialmb > self.leakalarm:
                ## Try garbage collection before raising an exception
                gc.collect()
                mb = get_real_mem_mb()
            else:
                return
            ## Check again if we get to here
            if mb / self.initialmb > self.leakalarm:
                elapsedhrs = (now - self.firstcheck) / 3600
                msg = "Memory used exceeded alarm limit of "\
                      "%0.2f * initial value. "\
                      "Process started %0.2f hours ago"\
                      % (self.leakalarm, elapsedhrs)
                raise MemoryLeak(msg)


def main_loop():
    """
    Run forever, sleeping and checking memory used.
    """

    global DEBUGGER_INTERRUPT


    mem_use = MemoryUseTracker(updateseconds = 1)
    while True:
        try:
            time.sleep(1)
            mem_use.check()
            # gc.collect()  ## commenting this in does not reduce memory use
            # (as expected: getrusage's ru_maxrss is a high-water mark, so
            # it never decreases even after objects are freed)

        except MemoryLeak:
            ## For now, just increase the limit, log the message
            ## and keep running.
            mem_use.leakalarm *= 1.1
            tb = format_exc()
            print >> sys.stderr, tb  ## was warning(tb), which is undefined here
            continue

        except:
            ## Bare except is deliberate: it catches KeyboardInterrupt (^C)
            ## as well as anything raised while the debugger was attached.
            if DEBUGGER_INTERRUPT:
                DEBUGGER_INTERRUPT = False
                continue
            else:
                from objgraph import show_most_common_types as show
                print
                show()
                break


## --------------------------------------------
if __name__ == "__main__":
    gc.set_debug(gc.DEBUG_UNCOLLECTABLE)  ## Use gc.DEBUG_LEAK for more verbosity
    main_loop()
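One caveat with the script above: getrusage reports ru_maxrss, which is a
peak value, so it can only grow and will never show memory being released.
On Linux a current (shrinkable) figure can be read from /proc/self/status
instead; a minimal sketch (the VmRSS field name is Linux-specific):

```python
import sys

def get_current_rss_mb():
    """Current (not peak) resident set size in MB.  Linux only:
    parses the VmRSS line of /proc/self/status (reported in kB)."""
    with open('/proc/self/status') as status:
        for line in status:
            if line.startswith('VmRSS:'):
                return int(line.split()[1]) / 1024.0
    return None

if sys.platform.startswith('linux'):
    print('Current RSS: %0.1f MB' % get_current_rss_mb())
```

With that in place of ru_maxrss, the effect (or lack of effect) of
gc.collect() on the footprint would actually be visible.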
