Re: [chromium-dev] Re: Running Chromebot on official builds from trunk
Driveby [possibly irrelevant] comment: I committed a patch to trunk after the beta branch was cut that changes the code path for reaping processes. You might want to try to repro this on the beta branch and then try on trunk.

On Thu, Dec 31, 2009 at 7:33 AM, Mark Mentovai m...@chromium.org wrote:
> Debugging tips:
> [...]

--
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe: http://groups.google.com/group/chromium-dev
[chromium-dev] Re: Running Chromebot on official builds from trunk
Thomas Van Lenten wrote:
> Where are the profile dirs going on disk? Time Machine doesn't have to be on; we can add things to this file in case it is turned on.

(And again, the multiple-address interface for mailing lists kind of sucks.)
[chromium-dev] Re: Running Chromebot on official builds from trunk
Your log seems fine, so it's probably not the Avi/Trung/breakpad bug.

I'm not aware of any case where Chrome launches 'ps' other than about:memory (Mac only). Some quick tests with about:memory on 249.30 do not create ps zombies. I could not see any obvious path in base/process_util_posix.cc's GetAppOutputInternal() that would not hit the wait() call.

Nirnimesh, are you using about:memory? Do you know of a trigger for 'ps' process creation or zombification?

The default max procs per userid on 10.5.8 is 266. With ~20 Chromes and 10 tabs each, you're starting to get real close. Perhaps that's the real problem?

jrg

On Wed, Dec 30, 2009 at 1:13 PM, Nirnimesh nirnim...@google.com wrote:
> +chromium-dev. This is Mac only.
>
> Trung, Mark: I did see ps zombies, but they were from my long-standing Chrome, not from chromebot. I'm on 10.5.8.
>
> Kris: I'm pretty sure chromebot worked fine on the 249 branch, at least until 2 weeks ago.
>
> TVL: Time Machine is off and /Library/Preferences/com.apple.TimeMachine.plist is modest. I run chromebot with each Chrome instance running with its own profile dir.
>
> John: output of launchctl bslist over time attached. It goes from 196 to a high of 244. Is this significant considering there are 20 instances of Chrome running? This number does, however, go down when chromebot kills some Chrome processes, and then climbs back again when new instances are fired.
>
> I did notice one other interesting thing related to ps. As long as chromebot runs, there were a number of stuck (though not zombie) ps processes. Any additional ps commands in Terminal get stuck too.
>
> On Wed, Dec 30, 2009 at 10:37 AM, Viet-trung Luu v...@google.com wrote:
>> There may be several (possibly related) problems. I'm definitely seeing zombie ps-es on 249.43 (the released beta). What Nirnimesh said about not being able to fork processes (until Chrome gets killed) could be caused by zombies.
>>
>> Is anyone else seeing zombies?
>>
>> - Trung
>>
>> On Wed, Dec 30, 2009 at 1:28 PM, Thomas Van Lenten thoma...@google.com wrote:
>>> On Wed, Dec 30, 2009 at 1:22 PM, Kris Rambish kr...@google.com wrote:
>>>> A few questions:
>>>> - Does this happen with the 249 (Beta) branch?
>>>> - Do we know when this has gotten worse on TOT?
>>>>   + If we don't, does it make sense to fire up 15 minis with different builds and see where it breaks/gets worse?
>>>
>>> Look at /Library/Preferences/com.apple.TimeMachine.plist; how big is it? When you run chromebot, what do you use for profile dirs?
>>>
>>> TVL
>>>
>>>> Kris
>>>>
>>>> On Wed, Dec 30, 2009 at 12:33 AM, Viet-trung Luu v...@google.com wrote:
>>>>> We're (or at least I am) seeing ps zombies (which I thought had been resolved). See http://code.google.com/p/chromium/issues/detail?id=28547#c29 (I see it on 10.5.8 also).
>>>>>
>>>>> Anyone who has the cycles, please feel free to investigate.
>>>>>
>>>>> - Trung
>>>>>
>>>>> On Wed, Dec 30, 2009 at 3:08 AM, John Grabowski j...@google.com wrote:
>>>>>> I agree; this sounds a lot like the Avi bug (worked around by Trung in http://crbug.com/28547). Nirnimesh: you can test this by running launchctl bslist | wc -l and seeing whether that number keeps rising to infinity (at which point the system chokes).
>>>>>>
>>>>>> jrg
>>>>>>
>>>>>> On Tue, Dec 29, 2009 at 9:01 PM, Thomas Van Lenten thoma...@google.com wrote:
>>>>>>> I think there is a fix/hack on trunk for the issue Avi had brought up.
>>>>>>>
>>>>>>> Are they really stuck, or are they just really slow to respond? This could be the issue we are seeing with browser_tests and some of the perf tests, where things are getting slower with time. What really confuses me here is that the reference build doesn't show this problem, and it's been updated to be a build from within the range that started showing this slowdown with time...
>>>>>>>
>>>>>>> TVL
>>>>>>>
>>>>>>> On Tue, Dec 29, 2009 at 11:52 PM, Nirnimesh nirnim...@google.com wrote:
>>>>>>>> For some time now, this is what happens when running chromebot on builds from trunk: chromebot fires off 20 instances of Chrome. It starts off fine, but within a few minutes you cannot do anything on the machine, and Chrome too appears to be hung (until it gets killed by chromebot). Until then you cannot fork any processes, cannot launch any app, nothing.
>>>>>>>>
>>>>>>>> Is this similar to the other bug in beta about running out of resources (which Avi brought up), or does this sound different?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -- ../NiR
>
> -- ../NiR
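John's suggestion of watching launchctl bslist | wc -l over time can be scripted so the trend is recorded rather than eyeballed. This is a minimal sketch, not something from the thread: sample_line_count is a hypothetical helper name, and the interval and sample count are arbitrary choices. It runs a command repeatedly and prints a timestamp with the number of output lines, which makes a steady climb easy to tell apart from the rise-and-fall Nirnimesh describes.

```sh
#!/bin/sh
# Hypothetical helper (not part of Chromium or chromebot):
# run a command COUNT times, INTERVAL seconds apart, and print
# "HH:MM:SS <number of output lines>" for each run.
# Usage: sample_line_count INTERVAL COUNT command [args...]
sample_line_count() {
    interval=$1
    count=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        # wc -l pads its output with spaces on Mac OS X; strip them.
        n=$("$@" 2>/dev/null | wc -l | tr -d ' ')
        echo "$(date '+%H:%M:%S') $n"
        i=$((i + 1))
        # No need to sleep after the last sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, sample_line_count 60 30 launchctl bslist would take one sample a minute for half an hour while chromebot runs (launchctl bslist being the 10.5 subcommand named in the thread).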
[chromium-dev] Re: Running Chromebot on official builds from trunk
On Wed, Dec 30, 2009 at 10:42 PM, John Grabowski j...@chromium.org wrote:
> [...]
> The default max procs per userid on 10.5.8 is 266. With ~20 Chromes and 10 tabs each you're starting to get real close. Perhaps that's the real problem?

Heck, if you add in plugin processes (Flash), you've probably hit it. Maybe as plugin support got better, that's what caused you to run into it?

TVL
[chromium-dev] Re: Running Chromebot on official builds from trunk
Debugging tips:

1. What's the parent pid of your zombies? Is it a browser process or something else?

2. Temporarily move /bin/ps to /bin/ps.real, and at /bin/ps put a small script that writes the full argument list and maybe other debuggery to a log file somewhere, and then invokes /bin/ps.real. For example:

    #!/bin/sh
    (date ; echo ${$} ${PPID} "${@}") >> /tmp/pslog.${EUID}
    exec "${0}.real" "${@}"

Then, using the logged data, you'll see what arguments ps is being invoked with, and you'll be able to look for where we make those ps calls in our own code (if it's even happening in our own code).

3. Try to reproduce it in a developer build instead of a released official build. You'll need to be able to reproduce it to know whether you've fixed it once you think you've figured it out.

h-n-y,
Mark
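Mark's first question (what is the parent pid of the zombies?) can be answered with ps alone. A minimal sketch, assuming a ps that accepts -A and repeated -o keyword= options (the Mac OS X and Linux procps versions both do); filter_zombies and list_zombies are hypothetical names. The filter is split out so it can be checked without live processes.

```sh
#!/bin/sh
# Read "pid ppid stat comm" lines on stdin and print "pid ppid comm"
# for entries whose state starts with Z (zombie / defunct).
filter_zombies() {
    while read -r pid ppid stat comm; do
        case $stat in
            Z*) echo "$pid $ppid $comm" ;;
        esac
    done
}

# List current zombies together with the command name of their parent,
# i.e. the process that has not yet wait()ed for them.
list_zombies() {
    # The trailing "=" on each keyword suppresses the header line.
    ps -A -o pid= -o ppid= -o stat= -o comm= | filter_zombies |
    while read -r pid ppid comm; do
        parent=$(ps -p "$ppid" -o comm= 2>/dev/null)
        echo "zombie $pid ($comm), parent $ppid ($parent)"
    done
}
```

Running list_zombies while chromebot is going would show whether the parent of each defunct ps is a Chrome browser process or something else.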
[chromium-dev] Re: Running Chromebot on official builds from trunk
On Wed, Dec 30, 2009 at 7:42 PM, John Grabowski j...@chromium.org wrote:
> Your log seems fine so it's probably not the Avi/Trung/breakpad bug. I'm not aware of any case where Chrome launches 'ps' other than about:memory (Mac only). Some quick tests with about:memory on 249.30 do not create ps zombies. I could not see any obvious path in base/process_util_posix.cc's GetAppOutputInternal() which would not hit the wait() call.
>
> Nirnimesh, are you using about:memory?

No.

> Do you know of a trigger for 'ps' process creation or zombification?

I don't know what causes 'ps' processes to launch, but I've seen them all the time, since like forever, while launching Chrome.

> The default max procs per userid on 10.5.8 is 266. With ~20 Chromes and 10 tabs each you're starting to get real close. Perhaps that's the real problem?

That's fair reasoning, but then it should have been so all along. And I reach the same state even with 10 Chrome instances; it only takes a few mins longer.

However, setting ulimit -u 512 does make it better, and I don't run out of resources, so there it is. I'm still stumped as to how it was fine until 2 weeks ago. And does it mean that bad things will happen for a user with 20+ tabs?

> jrg
>
> [...]
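The limit under discussion is the per-user process count, which is what ulimit -u reports and what Nirnimesh raised to 512. A rough sketch for checking how close the current user is to it; report_headroom and check_proc_headroom are hypothetical names, and the ps count includes the short-lived processes of the pipeline itself, so treat it as approximate. ulimit -u works in bash (which is /bin/sh on Mac OS X) and ksh, but not every /bin/sh supports -u.

```sh
#!/bin/sh
# Print how many processes are in use against a per-user limit.
# $1: the limit (a number, or "unlimited"); $2: processes currently in use.
report_headroom() {
    limit=$1
    used=$2
    case $limit in
        unlimited) echo "used=$used limit=unlimited" ;;
        *) echo "used=$used limit=$limit headroom=$((limit - used))" ;;
    esac
}

# Gather live numbers for the current user.
check_proc_headroom() {
    # Count processes whose real user ID is ours (approximate: this
    # includes the ps/wc processes of the pipeline itself).
    used=$(ps -U "$(id -u)" -o pid= | wc -l | tr -d ' ')
    report_headroom "$(ulimit -u)" "$used"
}
```

For example, report_headroom 266 220 prints used=220 limit=266 headroom=46. Raising the limit with ulimit -u 512 affects only that shell and the processes it starts, so it has to be done in the shell that launches chromebot.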
[chromium-dev] Re: Running Chromebot on official builds from trunk
Viet-Trung Luu wrote:
>> 2. Temporarily move /bin/ps to /bin/ps.real, and at /bin/ps put a small script that writes the full argument list and maybe other debuggery to a log file somewhere, and then invokes /bin/ps.real. For example:
>> [...]
>
> Done. And I now see that it looks like my fault. Now I need to figure out why ps is being run, and why it's not being properly reaped.

For those of us following along from home while on vacation, what did the arguments look like?

It sounds like you're on track again. Excellent.

Mark
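Whether a child becomes a zombie comes down to whether its parent ever calls wait() (or waitpid()) on it; John found no path in GetAppOutputInternal() that skips the wait() call. As a generic illustration (not Chrome code), this sketch deliberately creates a short-lived zombie: the subshell starts sleep 1 in the background and then execs sleep 3, so sleep 3 inherits a child it never waits for. make_zombie_demo is a hypothetical name, and the ps options assume a Mac OS X or Linux procps ps.

```sh
#!/bin/sh
# Deliberately create a zombie for a couple of seconds.
make_zombie_demo() {
    # The subshell forks `sleep 1`, then replaces itself with `sleep 3`.
    # `sleep 3` is now the parent of `sleep 1` but never calls wait().
    (sleep 1 & exec sleep 3) &
    parent=$!
    sleep 2
    # `sleep 1` has exited by now; it lingers in state Z under $parent.
    ps -A -o pid= -o ppid= -o stat= -o comm= | awk -v p="$parent" '$2 == p'
    # When `sleep 3` exits, the orphaned zombie is re-parented to
    # init/launchd, which reaps it.
    wait "$parent"
}
```

Running make_zombie_demo should print one line whose state column starts with Z. The Chrome equivalent would be a defunct ps whose parent pid is a browser process, which is what Mark's first debugging tip asks for.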
[chromium-dev] Re: Running Chromebot on official builds from trunk
>> The default max procs per userid on 10.5.8 is 266. With ~20 Chromes and 10 tabs each you're starting to get real close. Perhaps that's the real problem?
>
> That's fair reasoning, but then it should have been so all along. And I reach the same state even with 10 Chrome instances; it only takes a few mins longer.

Perhaps other factors (e.g. pages load 1% faster) have pushed us over a threshold where this problem is noticed. Or perhaps TVL was right about plugins working better. Not sure it matters much given what you found below...

> However, setting ulimit -u 512 does make it better, and I don't run out of resources, so there it is. I'm still stumped as to how it was fine until 2 weeks ago. And does it mean that bad things will happen for a user with 20+ tabs?

No; bad things will happen for a user with 20+ tabs only if they run 10 copies of Chrome at once. Chrome has reasonable renderer process limits, so you won't hit this limit with a single Chrome and 500 tabs.

jrg