Re: script uses up all memory
Larry Martell larry.mart...@gmail.com writes:

> I figured out what is causing this. Each pass through the loop it does:
>
>     self.tools = Tool.objects.filter(ip__isnull=False)
>
> And that is what is causing the memory consumption. If I move that outside the loop and just do that once the memory issue goes away. Now I need to figure out why this is happening and how to prevent it as they do want to query the db each pass through the loop in case it has been updated.

Django saves a copy of every executed SQL query if it is in debug mode (if the DEBUG setting is true). See https://docs.djangoproject.com/en/dev/faq/models/#why-is-django-leaking-memory

Regards,
/ Kent Engström, Lysator
-- https://mail.python.org/mailman/listinfo/python-list
Re: script uses up all memory
On Wed, Mar 5, 2014 at 5:27 PM, Larry Martell larry.mart...@gmail.com wrote:

> I have a script that forks off other processes and attempts to manage them. Here is a stripped down version of the script:
>
>     self.sleepTime = 300
>     self.procs = {}
>     self.startTimes = {}
>     self.cmd = ['python', '/usr/local/motor/motor/app/some_other_script.py']
>
>     while True:
>         try:
>             self.tools = Tool.objects.filter(ip__isnull=False)
>         except Exception, e:
>             print 'error from django call: ' + str(e)
>             sys.exit(1)
>
>         for tool in self.tools:
>             name = tool.name
>             if name in self.procs:
>                 if self.procs[name].poll() is None:
>                     if (datetime.datetime.now()-self.startTimes[name]) > datetime.timedelta(hours=12):
>                         # it's been running too long - kill it
>                         print 'killing script for ' + name + "it's been running too long"
>                         self.procs[name].kill()
>                     else:
>                         continue
>
>                 if self.procs[name].returncode:
>                     print 'script failed for ' + name + ', error = ' + str(self.procs[name].returncode)
>
>                 print 'starting script.py for ' + name + ' at ' + str(datetime.datetime.now())
>                 try:
>                     self.procs[name] = subprocess.Popen(self.cmd)
>                     self.startTimes[name] = datetime.datetime.now()
>                 except Exception, e:
>                     print 'error from Popen: ' + str(e)
>                     sys.exit(1)
>             else:
>                 print 'starting script.py for ' + name + ' at ' + str(datetime.datetime.now())
>                 try:
>                     self.procs[name] = subprocess.Popen(self.cmd)
>                     self.startTimes[name] = datetime.datetime.now()
>                 except Exception, e:
>                     print 'error from Popen: ' + str(e)
>                     sys.exit(1)
>
>         time.sleep(self.sleepTime)
>
> The script does what it's intended to do, however after about 2 hours it has used up all the memory available and the machine hangs. Can anyone see something that I am doing here that would be using memory like this? Perhaps some immutable object needs to be repeatedly recreated?

I figured out what is causing this. Each pass through the loop it does:

    self.tools = Tool.objects.filter(ip__isnull=False)

And that is what is causing the memory consumption. If I move that outside the loop and just do that once the memory issue goes away.

Now I need to figure out why this is happening and how to prevent it as they do want to query the db each pass through the loop in case it has been updated.
Re: script uses up all memory
On Thu, Mar 6, 2014 at 4:56 PM, Larry Martell larry.mart...@gmail.com wrote:

> I figured out what is causing this. Each pass through the loop it does:
>
>     self.tools = Tool.objects.filter(ip__isnull=False)
>
> And that is what is causing the memory consumption. If I move that outside the loop and just do that once the memory issue goes away. Now I need to figure out why this is happening and how to prevent it as they do want to query the db each pass through the loop in case it has been updated.

Apparently the object returned by that call is immutable as if I look at id(self.tools) each pass through the loop, it is different. Is there some way I can recover that memory?
Re: script uses up all memory
On Fri, Mar 7, 2014 at 8:56 AM, Larry Martell larry.mart...@gmail.com wrote: I figured out what is causing this. Each pass through the loop it does: self.tools = Tool.objects.filter(ip__isnull=False) And that is what is causing the memory consumption. If I move that outside the loop and just do that once the memory issue goes away. Now I need to figure out why this is happening and how to prevent it as they do want to query the db each pass through the loop in case it has been updated. Interesting. So the next thing to do is to look into the implementation of that. Does it allocate database resources and not free them? Does it have internal reference loops? Something to try: Put an explicit gc.collect() call into the loop. If that solves your problem, you have a refloop somewhere (and you can properly fix it by explicitly breaking the loop). If that keeps returning large numbers, and especially if it populates gc.garbage with a whole lot of stuff, then you definitely have refloops. http://docs.python.org/2/library/gc.html ChrisA -- https://mail.python.org/mailman/listinfo/python-list
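The diagnostic suggested above can be sketched as follows. This is a minimal illustration in Python 3 syntax (the thread's own code is Python 2), and `Node` and `make_and_drop` are hypothetical stand-ins, not anything from Django. It shows objects in a reference loop surviving after being dropped, and `gc.collect()` both freeing them and reporting a non-zero count.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that ends up in a reference loop."""

    def __init__(self):
        self.me = self  # the object refers to itself: a one-object refloop


def make_and_drop(count):
    """Create `count` looped objects, drop them, and return weak references."""
    refs = []
    for _ in range(count):
        node = Node()
        refs.append(weakref.ref(node))  # a weak reference does not keep node alive
    return refs


gc.disable()   # keep the automatic collector out of the demonstration
gc.collect()   # start from a clean slate
refs = make_and_drop(100)
alive_before = sum(r() is not None for r in refs)  # loops keep the objects alive
found = gc.collect()  # number of unreachable objects the collector found
alive_after = sum(r() is not None for r in refs)
gc.enable()

print(alive_before, found, alive_after)
```

The exact number returned by `gc.collect()` depends on the interpreter version, so it is better read as "zero or not zero" than as a precise object count.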
Re: script uses up all memory
On Fri, Mar 7, 2014 at 9:07 AM, Larry Martell larry.mart...@gmail.com wrote:

> Apparently the object returned by that call is immutable as if I look at id(self.tools) each pass through the loop, it is different. Is there some way I can recover that memory?

Not sure what mutability has to do with that. The changing id() simply means you're getting back a new object every time. Normally, as soon as you rebind self.tools to the new object, the old object will be disposed of - unless it has refloops, which is what I mentioned in the previous post, or has some other external reference.

ChrisA
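The point about rebinding can be checked with a short sketch. It is written in Python 3 syntax and assumes CPython's reference counting; `Result` and `Holder` are hypothetical stand-ins for the query result and the object that owns `self.tools`. With the cyclic collector switched off, an old object that has no refloops and no other references is still freed the moment the name is rebound.

```python
import gc
import weakref


class Result:
    """Hypothetical stand-in for whatever a query returns (no refloops)."""


class Holder:
    """Hypothetical stand-in for the object that owns `tools`."""


gc.disable()  # rule out the cyclic collector: only reference counting is at work

holder = Holder()
holder.tools = Result()
old_ref = weakref.ref(holder.tools)  # lets us observe the old object's fate
old_id = id(holder.tools)

holder.tools = Result()  # rebind: the first Result loses its last reference

freed_on_rebind = old_ref() is None      # the old object is already gone
got_new_object = id(holder.tools) != old_id  # new object, hence a new id()

gc.enable()
print(freed_on_rebind, got_new_object)
```

A changing `id()` therefore says nothing about mutability or leaks; it only shows that each call produced a fresh object.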
Re: script uses up all memory
On Thu, Mar 6, 2014 at 5:11 PM, Chris Angelico ros...@gmail.com wrote: On Fri, Mar 7, 2014 at 8:56 AM, Larry Martell larry.mart...@gmail.com wrote: I figured out what is causing this. Each pass through the loop it does: self.tools = Tool.objects.filter(ip__isnull=False) And that is what is causing the memory consumption. If I move that outside the loop and just do that once the memory issue goes away. Now I need to figure out why this is happening and how to prevent it as they do want to query the db each pass through the loop in case it has been updated. Interesting. So the next thing to do is to look into the implementation of that. Does it allocate database resources and not free them? Does it have internal reference loops? Something to try: Put an explicit gc.collect() call into the loop. If that solves your problem, you have a refloop somewhere (and you can properly fix it by explicitly breaking the loop). If that keeps returning large numbers, and especially if it populates gc.garbage with a whole lot of stuff, then you definitely have refloops. http://docs.python.org/2/library/gc.html First I added del(self.tools) before the Django call. That did not stop the memory consumption. Then I added a call to gc.collect() after the del and that did solve it. gc.collect() returns 0 each time, so I'm going to declare victory and move on. No time to dig into the Django code. Thanks. -- https://mail.python.org/mailman/listinfo/python-list
Re: script uses up all memory
Chris Angelico ros...@gmail.com: Not all problems need to be solved perfectly :) But at very least, I would put a comment against your collect() call explaining what happens: that self.tools is involved in a refloop. Most Python code shouldn't have to call gc.collect(), so it's worth explaining why you are here. Refloops also are nothing to be avoided. Let GC do its job and forget about it. Marko -- https://mail.python.org/mailman/listinfo/python-list
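For context on "let GC do its job", here is a small sketch in Python 3 syntax, assuming CPython. There, automatic collection is triggered by counts of container-object allocations (see `gc.get_threshold()`), not by elapsed time or by how much memory is in use. `Looped` is a hypothetical class. The sketch creates many self-referencing objects without ever calling `gc.collect()` and then counts how many are still alive.

```python
import gc
import weakref


class Looped:
    """Hypothetical object that refers to itself, forming a one-object loop."""

    def __init__(self):
        self.me = self


gc.enable()
total = 100000

# Only weak references are kept, so every Looped instance is garbage at once;
# because of the loop, only the cyclic collector can reclaim it.
refs = [weakref.ref(Looped()) for _ in range(total)]

still_alive = sum(r() is not None for r in refs)
thresholds = gc.get_threshold()  # allocation-count thresholds per generation

print(thresholds, still_alive)
```

How many objects remain at the end depends on the interpreter version and its thresholds, so no particular figure should be expected beyond "fewer than were created".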
Re: script uses up all memory
On Fri, Mar 7, 2014 at 9:21 AM, Larry Martell larry.mart...@gmail.com wrote: First I added del(self.tools) before the Django call. That did not stop the memory consumption. Then I added a call to gc.collect() after the del and that did solve it. gc.collect() returns 0 each time, so I'm going to declare victory and move on. No time to dig into the Django code. Thanks. Not all problems need to be solved perfectly :) But at very least, I would put a comment against your collect() call explaining what happens: that self.tools is involved in a refloop. Most Python code shouldn't have to call gc.collect(), so it's worth explaining why you are here. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: script uses up all memory
On Fri, Mar 7, 2014 at 9:34 AM, Marko Rauhamaa ma...@pacujo.net wrote: Chris Angelico ros...@gmail.com: Not all problems need to be solved perfectly :) But at very least, I would put a comment against your collect() call explaining what happens: that self.tools is involved in a refloop. Most Python code shouldn't have to call gc.collect(), so it's worth explaining why you are here. Refloops also are nothing to be avoided. Let GC do its job and forget about it. I think this thread is proof that they are to be avoided. The GC wasn't doing its job unless explicitly called on. The true solution is to break the refloop; the quick fix is to call gc.collect(). I stand by the recommendation to put an explanatory comment against the collect call. [1] ChrisA [1] Here in Australia, that should be gc.reverse_charges(). -- https://mail.python.org/mailman/listinfo/python-list
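The recommendation to comment the explicit collection might look like the sketch below. It is in Python 3 syntax, and `poll_once`, `state` and `fetch_tools` are hypothetical names; `fetch_tools` stands in for the Django query and is not part of any real API.

```python
import gc


def poll_once(state, fetch_tools):
    """Run one pass of a polling loop.

    `state` is a dict holding the previous result under "tools";
    `fetch_tools` is a hypothetical callable that performs the query.
    """
    state.pop("tools", None)  # drop the previous result first

    # Explicit collection, kept deliberately: the objects behind the previous
    # result were observed to stay alive after being dropped, which suggests
    # a reference loop somewhere beneath the query. This is the quick fix;
    # remove the call once the loop has been found and broken.
    gc.collect()

    state["tools"] = fetch_tools()
    return state["tools"]
```

A caller would invoke `poll_once(state, some_query)` once per pass and sleep in between, so the comment travels with the one line that would otherwise look unnecessary.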
Re: script uses up all memory
Chris Angelico ros...@gmail.com:

> I think this thread is proof that they are to be avoided. The GC wasn't doing its job unless explicitly called on. The true solution is to break the refloop; the quick fix is to call gc.collect(). I stand by the recommendation to put an explanatory comment against the collect call.

What I'm saying is that under most circumstances you shouldn't care if the memory consumption goes up and down. The true solution is to not do anything about temporary memory consumption.

Also, you shouldn't worry about breaking circular references. That is also often almost impossible to accomplish as so much modern code builds on closures, which generate all kinds of circular references under the hood - for your benefit, of course.

Marko
Re: script uses up all memory
On Fri, Mar 7, 2014 at 10:12 AM, Marko Rauhamaa ma...@pacujo.net wrote:

> Chris Angelico ros...@gmail.com:
>
> > I think this thread is proof that they are to be avoided. The GC wasn't doing its job unless explicitly called on. The true solution is to break the refloop; the quick fix is to call gc.collect(). I stand by the recommendation to put an explanatory comment against the collect call.
>
> What I'm saying is that under most circumstances you shouldn't care if the memory consumption goes up and down. The true solution is to not do anything about temporary memory consumption.
>
> Also, you shouldn't worry about breaking circular references. That is also often almost impossible to accomplish as so much modern code builds on closures, which generate all kinds of circular references under the hood - for your benefit, of course.

This isn't a temporary issue, though - see the initial post. After two hours of five-minutely checks, the computer was wedged. That's a problem to be solved.

Most of what I do with closures can't create refloops, because the function isn't referenced from inside itself. You'd need something like this:

    >>> def foo():
    ...     x=1
    ...     y=lambda: (x,y)
    ...     return y
    ...
    >>> len([foo() for _ in range(1000)])
    1000
    >>> gc.collect()
    4000
    >>> len([foo() for _ in range(1000)])
    1000
    >>> gc.collect()
    4000
    >>> len([foo() for _ in range(1000)])
    1000
    >>> gc.collect()
    4000

That's repeatably creating garbage. But change the function to not return itself, and there's no loop:

    >>> def foo():
    ...     x=1
    ...     y=lambda: x
    ...     return y
    ...
    >>> gc.collect()
    0
    >>> len([foo() for _ in range(1000)])
    1000
    >>> gc.collect()
    0
    >>> len([foo() for _ in range(1000)])
    1000
    >>> gc.collect()
    0

The only even reasonably common case that I can think of is a recursive nested function:

    def foo(x):
        def y(f,x=x):
            f()
            for _ in range(x):
                y(f,x-1)
        return y

It's a function that returns a function that calls its argument some number of times, where the number is derived in a stupid way from the argument to the first function. The whole function is garbage, so it's not surprising that the GC has to collect it.

    >>> len([foo(5) for _ in range(1000)])
    1000
    >>> gc.collect()
    3135
    >>> len([foo(5) for _ in range(1000)])
    1000
    >>> gc.collect()
    3135
    >>> len([foo(5) for _ in range(1000)])
    1000
    >>> gc.collect()
    3135

Can you give a useful example of a closure that does create a refloop?

ChrisA
Re: script uses up all memory
Chris Angelico ros...@gmail.com:

> Can you give a useful example of a closure that does create a refloop?

Just the other day, I mentioned the state pattern:

    class MyStateMachine:
        def __init__(self):
            sm = self

            class IDLE:
                def ding(self):
                    sm.open_door()
                    sm.state = AT_DOOR()

            class AT_DOOR:
                ...

            self.state = IDLE()

        def ding(self):
            self.state.ding()

So we have:

    MyStateMachine instance
      -> MyStateMachine instance.ding
      -> IDLE instance
      -> IDLE instance.ding
      -> MyStateMachine instance

plus numerous others in this example alone.

In general, event-driven programming produces circular references left and right, and that might come into wider use with asyncio.

I suspect generators might create circular references as well. Any tree data structure with parent references creates cycles.

In fact, I would imagine most OO designs create a pretty tight mesh of back-and-forth references.

Marko
Re: script uses up all memory
On Fri, Mar 7, 2014 at 10:53 AM, Marko Rauhamaa ma...@pacujo.net wrote:

> Chris Angelico ros...@gmail.com:
>
> > Can you give a useful example of a closure that does create a refloop?
>
> Just the other day, I mentioned the state pattern:
>
>     class MyStateMachine:
>         def __init__(self):
>             sm = self
>
>             class IDLE:
>                 def ding(self):
>                     sm.open_door()
>                     sm.state = AT_DOOR()

Yeah, that's an extremely unusual way to do things. Why keep on instantiating objects when you could just reference functions?

> In general, event-driven programming produces circular references left and right, and that might come into wider use with asyncio.

Nope; certainly not with closures. I do a whole lot of event-driven programming (usually in Pike rather than Python, but they work the same way in this), and there's no reference loop. Properly-done event-driven programming should have two basic states: a reference from some invisible thing that can trigger the event (eg a GUI widget) to a callable, and a reference from that callable to its state. Once the trigger is gone, the callable is dropped, its state is dropped, and everything's cleaned up. You don't usually need a reference inside the function to that function.

Don't forget, a closure need only hang onto the things it actually uses. It doesn't need all its locals.

> I suspect generators might create circular references as well.

I doubt it.

    >>> def foo(x):
    ...     return (x*i for i in range(x))
    ...
    >>> len([foo(5) for _ in range(1000)])
    1000
    >>> gc.collect()
    0
    >>> len([foo(5) for _ in range(1000)])
    1000
    >>> gc.collect()
    0

Again, unless it keeps a reference to itself, there's no loop. It'll need to hang onto some of its locals, but that's all.

> Any tree data structure with parent references creates cycles.

Yes, but how many of those do you actually have and drop? If you create a GUI, you generally hold your entire widget tree stably. The only issue is if you create a parent-child subtree and then drop it. That shouldn't be being done in a tight loop. Most of the classic data structures like trees are implemented at the C level, so again, your code shouldn't be concerning itself with that.

> In fact, I would imagine most OO designs create a pretty tight mesh of back-and-forth references.

Examples, please? I can think of a handful of situations where I've created reference loops, and they're sufficiently rare that I can put comments against them and explicitly break them. For instance, I have a Subwindow that has a Connection. My window can have multiple subwindows, a subwindow may or may not have a connection, and the connection always references its subwindow. The subw-connection-subw loop is explicitly broken when the connection is terminated. If the window chooses to drop a subw, it first checks if there's a connection (and prompts the user to confirm), and then will explicitly disconnect, which breaks the refloop (as the connection's terminated). I did a similar thing at work, again with explicit refloop breakage to ensure clean removal.

Apart from those two cases, I can't think of anything in the last ten years where I've had a data structure with a loop in it, where the whole loop could be dropped. (My MUD has a loop, in that a character exists in a room, and the room keeps track of its contents; but it's not logical to drop a room with characters in it, and dropping a character is done by moving it to no-room, which breaks the refloop.)

ChrisA
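The explicit refloop breakage described for the subwindow and its connection can be sketched like this. It is written in Python 3 syntax and assumes CPython's reference counting. The class names come from the description above, but the bodies are hypothetical and not the actual application code. Once `disconnect()` clears both directions of the loop, both objects go away without any help from the cyclic collector.

```python
import gc
import weakref


class Connection:
    """Hypothetical connection that always references its subwindow."""

    def __init__(self, subwindow):
        self.subwindow = subwindow  # connection -> subwindow


class Subwindow:
    """Hypothetical subwindow that may or may not have a connection."""

    def __init__(self):
        self.connection = None

    def connect(self):
        self.connection = Connection(self)  # subwindow -> connection: loop closed

    def disconnect(self):
        if self.connection is not None:
            self.connection.subwindow = None  # break the loop explicitly
            self.connection = None


gc.disable()  # prove that no cyclic collection is needed

subw = Subwindow()
subw.connect()
probe = weakref.ref(subw)

subw.disconnect()  # terminate the connection before dropping the subwindow
del subw

freed_without_gc = probe() is None
gc.enable()
print(freed_without_gc)
```

The design choice is that teardown has one well-defined place, the moment the connection is terminated, so that is where the loop is broken.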
Re: script uses up all memory
Chris Angelico ros...@gmail.com:

> On Fri, Mar 7, 2014 at 10:53 AM, Marko Rauhamaa ma...@pacujo.net wrote:
>
> >     class MyStateMachine:
> >         def __init__(self):
> >             sm = self
> >
> >             class IDLE:
> >                 def ding(self):
> >                     sm.open_door()
> >                     sm.state = AT_DOOR()
>
> Yeah, that's an extremely unusual way to do things. Why keep on instantiating objects when you could just reference functions?

That's not crucial. Even if the state objects were instantiated and inner classes not used, you'd get the same circularity:

    class State:
        def __init__(self, sm):
            self.sm = sm

    class Idle(State):
        def ding(self):
            self.sm.open_door()
            self.sm.state = self.sm.AT_DOOR

    class AtDoor(State):
        ...

    class MyStateMachine:
        def __init__(self):
            self.IDLE = Idle(self)
            self.AT_DOOR = AtDoor(self)
            ...
            self.state = self.IDLE

The closure style is more concise and to the point and might perform no worse.

> Nope; certainly not with closures. I do a whole lot of event-driven programming (usually in Pike rather than Python, but they work the same way in this), and there's no reference loop. Properly-done event-driven programming should have two basic states: a reference from some invisible thing that can trigger the event (eg a GUI widget) to a callable, and a reference from that callable to its state. Once the trigger is gone, the callable is dropped, its state is dropped, and everything's cleaned up. You don't usually need a reference inside the function to that function.

I'm more familiar with networking. If you need a timer, you need to be able to start it so you need a reference to it. Ok, maybe you instantiate a new timer each time, but you may need to cancel the timer so starting the timer gives you a ticket you can use for canceling.

Similarly, you need a socket (wrapper) to signal an I/O state change, and you also need to be able to close the socket at a bare minimum.

The task scheduling service (asyncio has one) collects thunks that refer to your objects and your objects have a reference to the task scheduling service to be able to schedule new tasks.

> Don't forget, a closure need only hang onto the things it actually uses. It doesn't need all its locals.

More importantly, there's nothing bad in circularity. No need to avoid it. No need to cut cords.

Marko
script uses up all memory
I have a script that forks off other processes and attempts to manage them. Here is a stripped down version of the script:

    self.sleepTime = 300
    self.procs = {}
    self.startTimes = {}
    self.cmd = ['python', '/usr/local/motor/motor/app/some_other_script.py']

    while True:
        try:
            self.tools = Tool.objects.filter(ip__isnull=False)
        except Exception, e:
            print 'error from django call: ' + str(e)
            sys.exit(1)

        for tool in self.tools:
            name = tool.name
            if name in self.procs:
                if self.procs[name].poll() is None:
                    if (datetime.datetime.now()-self.startTimes[name]) > datetime.timedelta(hours=12):
                        # it's been running too long - kill it
                        print 'killing script for ' + name + "it's been running too long"
                        self.procs[name].kill()
                    else:
                        continue

                if self.procs[name].returncode:
                    print 'script failed for ' + name + ', error = ' + str(self.procs[name].returncode)

                print 'starting script.py for ' + name + ' at ' + str(datetime.datetime.now())
                try:
                    self.procs[name] = subprocess.Popen(self.cmd)
                    self.startTimes[name] = datetime.datetime.now()
                except Exception, e:
                    print 'error from Popen: ' + str(e)
                    sys.exit(1)
            else:
                print 'starting script.py for ' + name + ' at ' + str(datetime.datetime.now())
                try:
                    self.procs[name] = subprocess.Popen(self.cmd)
                    self.startTimes[name] = datetime.datetime.now()
                except Exception, e:
                    print 'error from Popen: ' + str(e)
                    sys.exit(1)

        time.sleep(self.sleepTime)

The script does what it's intended to do, however after about 2 hours it has used up all the memory available and the machine hangs. Can anyone see something that I am doing here that would be using memory like this? Perhaps some immutable object needs to be repeatedly recreated?
Re: script uses up all memory
On Thu, Mar 6, 2014 at 9:27 AM, Larry Martell larry.mart...@gmail.com wrote: I have a script that forks off other processes and attempts to manage them. Here is a stripped down version of the script: self.sleepTime = 300 That's not a stand-alone script. What environment is it running in? Can you reproduce the problem outside of that environment? Also: Can you simply use multiprocessing rather than going through all the effort of subprocess.Popen? ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: script uses up all memory
On Wed, Mar 5, 2014 at 5:39 PM, Chris Angelico ros...@gmail.com wrote:

> On Thu, Mar 6, 2014 at 9:27 AM, Larry Martell larry.mart...@gmail.com wrote:
>
> > I have a script that forks off other processes and attempts to manage them. Here is a stripped down version of the script:
> >
> >     self.sleepTime = 300
>
> That's not a stand-alone script.

No, that is just the part that does the work (inside the 'while True'). I'll try and post a standalone script tomorrow.

> What environment is it running in?

CentOS 6.4

> Can you reproduce the problem outside of that environment?

I will try that tomorrow.

> Also: Can you simply use multiprocessing rather than going through all the effort of subprocess.Popen?

Perhaps. I didn't write this. A client gave it to me and said 'figure out why it uses up all the memory and hangs.' I've messed around with it for days and cannot see anything that would consume so much memory.
Re: script uses up all memory
On Thu, Mar 6, 2014 at 11:20 AM, Larry Martell larry.mart...@gmail.com wrote:

> On Wed, Mar 5, 2014 at 5:39 PM, Chris Angelico ros...@gmail.com wrote:
>
> > On Thu, Mar 6, 2014 at 9:27 AM, Larry Martell larry.mart...@gmail.com wrote:
> >
> > > I have a script that forks off other processes and attempts to manage them. Here is a stripped down version of the script:
> > >
> > >     self.sleepTime = 300
> >
> > That's not a stand-alone script.
>
> No, that is just the part that does the work (inside the 'while True'). I'll try and post a standalone script tomorrow.
>
> > What environment is it running in?
>
> CentOS 6.4

That's not the whole environment, though. There's a mention of Django - does this run inside some framework?

> > Can you reproduce the problem outside of that environment?
>
> I will try that tomorrow.

Running as a stand-alone script, still under CentOS, would be what I mean by outside of that environment. I'm talking about making something that can be saved to my drive and executed, perhaps with a stubby subprocess script (eg import time; time.sleep(86400)).

> > Also: Can you simply use multiprocessing rather than going through all the effort of subprocess.Popen?
>
> Perhaps. I didn't write this. A client gave it to me and said 'figure out why it uses up all the memory and hangs.' I've messed around with it for days and cannot see anything that would consume so much memory.

Ah. Yeah, that would be a fun little job to play with. My random thought: Do the subprocesses produce spammy log output? If so, the monitor might be collecting it all and holding it in memory in case you want it (not knowing that you don't). The default should be to leave them connected to your process's stdio streams, though, so that shouldn't be the issue.

ChrisA
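A stand-alone reproduction along the lines suggested above might start from a sketch like this. It is in Python 3 syntax (the original script is Python 2), with the Django query replaced by a plain list of names and the real worker replaced by a stub that only sleeps. `STUB_CMD`, `start_stub` and `run_monitor` are hypothetical names, and the stub sleeps for 2 seconds rather than 86400 so that the example finishes quickly.

```python
import subprocess
import sys
import time

# Stub worker: does nothing but sleep, so any memory growth must come from
# the monitor itself. The 2-second duration is an arbitrary demo value.
STUB_CMD = [sys.executable, "-c", "import time; time.sleep(2)"]


def start_stub():
    """Start one stub worker with the default (inherited) stdio streams."""
    return subprocess.Popen(STUB_CMD)


def run_monitor(names, passes, sleep_time):
    """Keep one stub running per name for a fixed number of passes.

    Returns the number of processes started, so a caller can check that
    a worker which is still running is not started a second time.
    """
    procs = {}
    starts = 0
    for _ in range(passes):
        for name in names:
            proc = procs.get(name)
            if proc is not None and proc.poll() is None:
                continue  # still running: leave it alone
            procs[name] = start_stub()
            starts += 1
        time.sleep(sleep_time)

    # Clean up so the demonstration leaves no stray processes behind.
    for proc in procs.values():
        proc.kill()
        proc.wait()
    return starts
```

Running `run_monitor(["a", "b"], passes=2, sleep_time=0.1)` and watching the monitor's memory would show whether the loop alone, without Django, reproduces the growth.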
Re: script uses up all memory
On Wed, Mar 5, 2014 at 7:33 PM, Chris Angelico ros...@gmail.com wrote:

> On Thu, Mar 6, 2014 at 11:20 AM, Larry Martell larry.mart...@gmail.com wrote:
>
> > On Wed, Mar 5, 2014 at 5:39 PM, Chris Angelico ros...@gmail.com wrote:
> >
> > > On Thu, Mar 6, 2014 at 9:27 AM, Larry Martell larry.mart...@gmail.com wrote:
> > >
> > > > I have a script that forks off other processes and attempts to manage them. Here is a stripped down version of the script:
> > > >
> > > >     self.sleepTime = 300
> > >
> > > That's not a stand-alone script.
> >
> > No, that is just the part that does the work (inside the 'while True'). I'll try and post a standalone script tomorrow.
> >
> > > What environment is it running in?
> >
> > CentOS 6.4
>
> That's not the whole environment, though. There's a mention of Django - does this run inside some framework?

The system this is part of uses Django, and this script makes use of the Django ORM, but it doesn't do any web stuff itself. It just kicks off another script once for each tool found in the database, and ensures that there's just one script per tool running at a time, and that no single script runs too long. The Django part can easily be removed.

> > > Can you reproduce the problem outside of that environment?
> >
> > I will try that tomorrow.
>
> Running as a stand-alone script, still under CentOS, would be what I mean by outside of that environment. I'm talking about making something that can be saved to my drive and executed, perhaps with a stubby subprocess script (eg import time; time.sleep(86400)).

Yes, I understand what you mean.

> > > Also: Can you simply use multiprocessing rather than going through all the effort of subprocess.Popen?
> >
> > Perhaps. I didn't write this. A client gave it to me and said 'figure out why it uses up all the memory and hangs.' I've messed around with it for days and cannot see anything that would consume so much memory.
>
> Ah. Yeah, that would be a fun little job to play with. My random thought: Do the subprocesses produce spammy log output? If so, the monitor might be collecting it all and holding it in memory in case you want it (not knowing that you don't). The default should be to leave them connected to your process's stdio streams, though, so that shouldn't be the issue.

Ohh, that's a good random thought. I'll try that tomorrow and see if that's the issue.
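One way to test the "spammy output" theory is to start the child with its output discarded, so that nothing can pile up in a pipe or in the monitor. The sketch below is in Python 3 syntax, and `run_quietly` is a hypothetical helper, not part of the original script. It relies only on `subprocess.DEVNULL`, which exists in Python 3.3 and later.

```python
import subprocess
import sys


def run_quietly(cmd):
    """Run `cmd` with stdout and stderr discarded; return its exit status."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,  # child output goes nowhere
        stderr=subprocess.DEVNULL,
    )
    return proc.wait()


if __name__ == "__main__":
    # A deliberately noisy child: prints a large amount of text and exits.
    noisy = [sys.executable, "-c", "print('x' * 100000)"]
    print(run_quietly(noisy))
```

If memory growth continues with output discarded, the children's logging can be ruled out as the cause.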