I have an application that uses multiple instances of web2py shells for long-running data acquisition and analysis.

Having multiple independent processes is working really, really well, but I'd like to reduce the memory footprint of each of them. I'm not finding much in the way of good memory profiling tools, so I wanted to check with the group wisdom here to see if anyone has already looked into reducing memory use in shell processes.

Below are some numbers from running a stripped-down script on OS X (where I do most of my testing and development). In the first case, I run it outside web2py and see that the memory footprint is a little over 4 MB. Running the same script with web2py -S -N -M -R shows a footprint of 25 MB. The target environment is Linux. Memory use there is about 1/3 smaller, but the ratio is the same.

When I interrupt the process just before the script exits, I use the objgraph module (http://mg.pov.lt/objgraph/objgraph.html) to show the most common object types in memory. The number that really stands out is the count of functions: 987 without web2py, 5647 with web2py!! I kinda suspect I don't need all of them for the scripts I'm running :-), so my first questions are: what can I get rid of at startup, and how do I make Python release the memory?

The only parts of web2py used by the shells are the application model, the basic db instance methods (insert, update, select, etc.), and the DAL() method to create a new db instance in forked children. Any help or suggestions appreciated.
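objgraph's table only says how many function objects exist. One way to see which modules the extra ~4,600 belong to (and so what might be trimmed at startup) is to group them by their `__module__` attribute. This is a minimal stdlib-only sketch, not objgraph's API: `functions_by_module()` is a hypothetical helper, it counts only pure-Python functions (the type objgraph reports as `function`), and functions created by code exec'd without a `__name__` global are reported as `<unknown>`.

```python
import gc
import types
from collections import Counter


def functions_by_module(limit=20):
    """Count live Python function objects, grouped by their __module__.

    Hypothetical helper. gc.get_objects() returns the objects tracked by
    the cyclic garbage collector; plain function objects are among them.
    """
    gc.collect()  # drop unreachable cycles so they don't inflate the counts
    counts = Counter(
        getattr(obj, "__module__", None) or "<unknown>"
        for obj in gc.get_objects()
        if isinstance(obj, types.FunctionType)
    )
    # most_common(None) returns every module, sorted by descending count
    return counts.most_common(limit)


if __name__ == "__main__":
    for module, count in functions_by_module():
        print("%-40s %d" % (module, count))
```

Running it once inside `python web2py.py -S welcome -N -M -R` and once outside web2py, then comparing the two lists, should point at the modules worth a closer look; whether any of them can actually be skipped depends on what the model files import.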
Thanks,
Mike

--------------------------------------------------------------------------
WITHOUT WEB2PY

michael-elliss-macbook:web2py mellis$ python ~/bin/memcheck.py
Memory usage: current = 4.3 MB, initial = 4.3 MB
Memory usage: current = 4.3 MB, initial = 4.3 MB
Memory usage: current = 4.3 MB, initial = 4.3 MB
^C
wrapper_descriptor         1080
*function                   987*
builtin_function_or_method 661
tuple                      552
method_descriptor          422
dict                       415
weakref                    284
member_descriptor          204
list                       157
getset_descriptor          141

--------------------------------------------------------------------------
WITH WEB2PY

michael-elliss-macbook:web2py mellis$ python web2py.py -S welcome -N -M -R ~/bin/memcheck.py
web2py Web Framework
Created by Massimo Di Pierro, Copyright 2007-2011
Version 1.97.1 (2011-06-26 19:25:44)
Database drivers available: SQLite3, pymysql
Memory usage: current = 25.1 MB, initial = 25.1 MB
Memory usage: current = 25.1 MB, initial = 25.1 MB
Memory usage: current = 25.1 MB, initial = 25.1 MB
^C
*function                   5647*
tuple                      1663
dict                       1483
wrapper_descriptor         1245
weakref                    1025
method_descriptor          938
builtin_function_or_method 875
list                       834
type                       604
getset_descriptor          495

--------------------------------------------------------------------------
HERE'S THE SCRIPT

"""
Web2py shell app for investigating memory use
Author: Michael Ellis
"""
import signal

DEBUGGER_INTERRUPT = False

def debug_me(signum, frame):
    """
    This signal handler provides a mechanism for remote debugging on demand.
    To activate it, look up the PID for this process, and then use
    'kill -SIGUSR1 <PID>' to stop the process. Then start winpdb (or rpdb2),
    set the password and attach.
    """
    global DEBUGGER_INTERRUPT
    try:
        import rpdb2
        rpdb2.start_embedded_debugger("passwd")
        DEBUGGER_INTERRUPT = True
    except ImportError:
        print "Ignoring SIGUSR1. rpdb2 is not installed."

signal.signal(signal.SIGUSR1, debug_me)

import sys
import time
import datetime
import resource
from traceback import format_exc
from logging import warning  # used by main_loop() to log MemoryLeak tracebacks
import gc

def get_real_mem_mb():
    """
    Return the peak real memory use (maximum resident set size) of the
    calling process in MB. Useful for tracking slow memory leaks.
    Note that ru_maxrss is a high-water mark, so this value never goes
    down even if the process releases memory.
    See man getrusage for details of the system call.
    """
    if sys.platform.startswith('darwin'):
        divisor = 1024. * 1024 ## OS X reports bytes
    elif sys.platform.startswith('linux'):
        divisor = 1024. ## Linux reports kbytes
    else:
        msg = "Don't know what to do for platform %s. Please add a clause to handle it."\
              % sys.platform
        raise ValueError(msg)

    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return mem / divisor

class MemoryLeak(Exception):
    """Raised by MemoryUseTracker instances. Behaves like a generic exception. """
    pass

class MemoryUseTracker(object):
    """ Tracks memory use by current process """
    def __init__(self, updateseconds = 60, leakalarm = 5.0):
        """
        If the check() method is called less than updateseconds after
        the last call, no check is performed. The leakalarm argument
        specifies the factor by which memory use may increase before
        a MemoryLeak exception is raised.
        """
        self.leakalarm = leakalarm
        self.updateseconds = updateseconds
        self.initialmb = get_real_mem_mb()
        self.firstcheck = self.lastcheck = time.time()

    def check(self):
        """
        Checks memory usage and raises MemoryLeak if it is greater than
        self.leakalarm * self.initialmb. Garbage collection is invoked
        followed by a second check before raising MemoryLeak.
        """
        now = time.time()
        if now - self.lastcheck > self.updateseconds:
            self.lastcheck = now
            mb = get_real_mem_mb()
            print "Memory usage: "\
                  "current = %0.1f MB, initial = %0.1f MB"\
                  % (mb, self.initialmb)
            if mb / self.initialmb > self.leakalarm:
                ## Try garbage collection before raising an exception
                gc.collect()
                mb = get_real_mem_mb()
        else:
            return

        ## Check again if we get to here
        if mb / self.initialmb > self.leakalarm:
            elapsedhrs = (now - self.firstcheck) / 3600
            msg = "Memory used exceeded alarm limit of "\
                  "%0.2f * initial value. "\
                  "Process started %0.2f hours ago" % (self.leakalarm, elapsedhrs)
            raise MemoryLeak(msg)

def main_loop():
    """ Run forever, sleeping and checking memory used. """
    global DEBUGGER_INTERRUPT
    mem_use = MemoryUseTracker(updateseconds = 1)
    while True:
        try:
            time.sleep(1)
            mem_use.check()
            # gc.collect() ## commenting this in does not reduce memory use.
        except MemoryLeak:
            ## For now, just increase the limit, log the message
            ## and keep running.
            mem_use.leakalarm *= 1.1
            tb = format_exc()
            warning(tb)
            continue
        except:
            ## The bare except also catches KeyboardInterrupt, so ^C ends
            ## the loop and prints the objgraph summary.
            if DEBUGGER_INTERRUPT == True:
                DEBUGGER_INTERRUPT = False
                continue
            else:
                from objgraph import show_most_common_types as show
                show()  # prints the table itself and returns None
                break

## --------------------------------------------
if __name__ == "__main__":
    gc.set_debug(gc.DEBUG_UNCOLLECTABLE) #Use gc.DEBUG_LEAK for more verbosity
    main_loop()
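`get_real_mem_mb()` reads `ru_maxrss`, which getrusage defines as the maximum resident set size. A value derived from it can only stay the same or grow, so it cannot show memory being released; that may be one reason uncommenting `gc.collect()` in `main_loop()` shows no change. Since the target environment is Linux, a minimal sketch of a current-RSS reader could use `/proc/self/statm`, whose second field is the number of resident pages. `get_current_rss_mb()` is a hypothetical replacement, not part of web2py; on platforms without `/proc` (such as OS X) it falls back to the same peak value the script already uses.

```python
import os
import resource
import sys


def get_current_rss_mb():
    """Return resident set size in MB (hypothetical helper).

    On Linux, read the *current* value from /proc/self/statm, whose second
    field is the number of resident pages. Elsewhere, fall back to ru_maxrss,
    which is only the peak (high-water mark) and never decreases.
    """
    statm = "/proc/self/statm"
    if os.path.exists(statm):
        with open(statm) as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * resource.getpagesize() / (1024.0 * 1024)

    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss units: bytes on OS X, kilobytes on Linux (assumed elsewhere)
    divisor = 1024.0 * 1024 if sys.platform.startswith("darwin") else 1024.0
    return peak / divisor


if __name__ == "__main__":
    print("current RSS: %0.1f MB" % get_current_rss_mb())
```

Swapping this in for `get_real_mem_mb()` inside `MemoryUseTracker` should let the "current" figure drop when memory is actually returned to the OS, which CPython's allocator does not always do even after objects are freed.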

