Re: [chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-31 Thread Jeremy Moskovich
Driveby [possibly irrelevant] comment: I committed a patch to trunk after
the beta branch that changes the code path for reaping processes. You might
want to try to repro this on the beta branch first and then on trunk.

On Thu, Dec 31, 2009 at 7:33 AM, Mark Mentovai m...@chromium.org wrote:

 Debugging tips:

 1. What's the parent pid of your zombies?  Is it a browser process or
 something else?

 2. Temporarily move /bin/ps to /bin/ps.real, and at /bin/ps put a
 small script that writes the full argument list and maybe other
 debuggery to a log file somewhere, and then invokes /bin/ps.real.  For
 example:

 #!/bin/sh
 (date ; echo ${$} ${PPID} "${@}") >> /tmp/pslog.${EUID}
 exec ${0}.real "${@}"

 Then, using the logged data, you'll see what arguments ps is being
 invoked with, and you'll be able to look to see where we make those ps
 calls in our own code (if it's even happening in our own code).

 3. Try to reproduce it in a developer build instead of a released
 official build.  You'll need to reproduce it to know if you've fixed
 it once you think you've figured it out.

 h-n-y,
 Mark

 --
 Chromium Developers mailing list: chromium-dev@googlegroups.com
 View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev



[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread Mark Mentovai
Thomas Van Lenten wrote:
 Where are the profile dirs going on disk?  Time Machine doesn't have to be
 on; we can add things to this file in case it is turned on.

(and again.  the multiple-address interface for mailing lists kind of sucks.)



[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread John Grabowski
Your log seems fine so it's probably not the Avi/Trung/breakpad bug.
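
The log here is the "launchctl bslist | wc -l" count over time that was asked for earlier in the thread. A minimal sketch of how such a log could be collected follows; the function name sample_line_count and its arguments are made up for illustration, and the launchctl call is left as a comment because bslist only exists on Mac OS X releases of that era.

```shell
#!/bin/sh
# Sample how many lines a command prints, a fixed number of times.
# Usage: sample_line_count SAMPLES INTERVAL_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of Chromium or chromebot)
sample_line_count() {
    samples=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$samples" ]; do
        # tr strips the padding some wc implementations add.
        count=$("$@" 2>/dev/null | wc -l | tr -d ' ')
        echo "$(date '+%H:%M:%S') $count"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# On the 10.5.8 machine from this thread the call would be:
#   sample_line_count 60 10 launchctl bslist
# Demo with a command that exists everywhere:
sample_line_count 2 1 printf 'one\ntwo\n'
```

A count that only ever climbs would point at the bootstrap-port leak; one that rises and falls as instances start and die, as reported here, would not.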

I'm not aware of any case where Chrome launches 'ps' other than about:memory
(Mac only).  Some quick tests with about:memory on 249.30 do not create ps
zombies.  I could not see any obvious path in base/process_util_posix.cc's
GetAppOutputInternal() which would not hit the wait() call.

Nirnimesh, are you using about:memory?  Do you know of a trigger for 'ps'
process creation or zombification?

The default max procs per userid on 10.5.8 is 266.  With ~20 Chromes and 10
tabs each you're starting to get real close.  Perhaps that's the real
problem?
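
To make the arithmetic concrete, here is a rough sketch. It assumes one browser process per Chrome instance plus one renderer process per tab, and ignores plugin and helper processes; that model and the function name estimate_procs are illustrative assumptions, not how Chrome actually allocates processes.

```shell
#!/bin/sh
# Rough process-count estimate for a multi-instance run.
# ASSUMPTION: each instance costs 1 browser process + 1 renderer per tab;
# plugin and helper processes are ignored.
estimate_procs() {
    instances=$1
    tabs_per_instance=$2
    echo $(( instances * (tabs_per_instance + 1) ))
}

limit=266   # default max procs per userid on 10.5.8, as stated above
needed=$(estimate_procs 20 10)
echo "estimated Chrome processes: $needed"
echo "headroom under a limit of $limit: $(( limit - needed ))"
```

Under those assumptions 20 instances with 10 tabs each need 220 processes, which leaves 46 for everything else the user is running.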

jrg

On Wed, Dec 30, 2009 at 1:13 PM, Nirnimesh nirnim...@google.com wrote:

 +chromium-dev. This is mac only.

 Trung, Mark: I did see ps zombies, but they were from my long-standing
 chrome, not from chromebot. I'm on 10.5.8.
 Kris: I'm pretty sure chromebot worked fine on the 249 branch, at least
 until 2 weeks ago.
 TVL: Time Machine is off
 and /Library/Preferences/com.apple.TimeMachine.plist is modest. I run
 chromebot with each chrome instance running with its own profile dir.
 John: output of launchctl bslist over time attached. It goes from 196 to a
 high of 244. Is this significant considering there are 20 instances of
 chrome running? This number does however go down when chromebot kills some
 chrome processes and then climbs back again when new instances are fired.

 I did notice one other interesting thing related to ps. As long as
 chromebot runs, there are a number of stuck (though not zombie) ps
 processes. Any additional ps commands run in a terminal get stuck too.


 On Wed, Dec 30, 2009 at 10:37 AM, Viet-trung Luu v...@google.com wrote:

 There may be several (possibly related) problems.

 I'm definitely seeing zombie ps-es on 249.43 (the released beta). What
 Nirnimesh said about not being able to fork processes (until Chrome
 gets killed) could be caused by zombies.

 Is anyone else seeing zombies?

 - Trung

 On Wed, Dec 30, 2009 at 1:28 PM, Thomas Van Lenten thoma...@google.com wrote:

  On Wed, Dec 30, 2009 at 1:22 PM, Kris Rambish kr...@google.com wrote:

  A few questions:
  - Does this happen with the 249 (Beta) branch?
  - Do we know when this has gotten worse on TOT?
    + If we don't, does it make sense to fire up 15 minis with different
      builds and see where it breaks/gets worse?

  Look at /Library/Preferences/com.apple.TimeMachine.plist, how big is it?
  When you run chromebot, what do you use for profile dirs?
  TVL

  Kris

  On Wed, Dec 30, 2009 at 12:33 AM, Viet-trung Luu v...@google.com wrote:
 
  We're (or at least I am) seeing ps zombies (which I thought had been
  resolved). See
  http://code.google.com/p/chromium/issues/detail?id=28547#c29
  (I see it on 10.5.8 also).
 
  Anyone who has the cycles-- please feel free to investigate
 
  - Trung
 
  On Wed, Dec 30, 2009 at 3:08 AM, John Grabowski j...@google.com wrote:
   I agree; this sounds a lot like the Avi bug (worked around by trung on
   http://crbug.com/28547).
   Nirnimesh: you can test this by doing a launchctl bslist | wc -l and see
   if that number keeps rising to infinity (at which point the system
   chokes).
   jrg
  
   On Tue, Dec 29, 2009 at 9:01 PM, Thomas Van Lenten thoma...@google.com wrote:

   I think there is a fix/hack on trunk for the issue Avi had brought up.
   Are they really stuck, or are they just really slow to respond? This could
   be the issue we are seeing with browser_tests and some of the perf tests
   where things are getting slower with time.
   What really confuses me here is that the reference build doesn't show this
   problem, and it's been updated to be a build from within the range that
   started showing this slowdown with time...
   TVL
  
  
   On Tue, Dec 29, 2009 at 11:52 PM, Nirnimesh nirnim...@google.com wrote:

   Here is what happens when running chromebot on builds from the trunk for
   some time now:
   Chromebot fires off 20 instances of Chrome. It starts off fine but within
   a few minutes you cannot do anything on the machine and chrome too appears
   to be hung (until it gets killed by chromebot). Until then you cannot fork
   any processes, cannot launch any app, nothing. Is this similar to the other
   bug in beta about running out of resources (which avi brought up), or does
   this sound different?
   Thanks
  
   --
   ../NiR
  
  
  
 
 
 




 --
 ../NiR



[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread Thomas Van Lenten
On Wed, Dec 30, 2009 at 10:42 PM, John Grabowski j...@chromium.org wrote:

 Your log seems fine so it's probably not the Avi/Trung/breakpad bug.

 I'm not aware of any case where Chrome launches 'ps' other than
 about:memory (Mac only).  Some quick tests with about:memory on 249.30 do
 not create ps zombies.  I could not see any obvious path in
 base/process_util_posix.cc's GetAppOutputInternal() which would not hit the
 wait() call.

 Nirnimesh, are you using about:memory?  Do you know of a trigger for 'ps'
 process creation or zombification?

 The default max procs per userid on 10.5.8 is 266.  With ~20 Chromes and 10
 tabs each you're starting to get real close.  Perhaps that's the real
 problem?


Heck, add in plugin processes (Flash), and you've probably hit it.
Maybe as plugin support got better, that's what caused you to run into it?

TVL




[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread Mark Mentovai
Debugging tips:

1. What's the parent pid of your zombies?  Is it a browser process or
something else?

2. Temporarily move /bin/ps to /bin/ps.real, and at /bin/ps put a
small script that writes the full argument list and maybe other
debuggery to a log file somewhere, and then invokes /bin/ps.real.  For
example:

#!/bin/sh
(date ; echo ${$} ${PPID} "${@}") >> /tmp/pslog.${EUID}
exec ${0}.real "${@}"

Then, using the logged data, you'll see what arguments ps is being
invoked with, and you'll be able to look to see where we make those ps
calls in our own code (if it's even happening in our own code).

3. Try to reproduce it in a developer build instead of a released
official build.  You'll need to reproduce it to know if you've fixed
it once you think you've figured it out.
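
For tip 1, a minimal sketch of one way to list zombies with their parent pids. It assumes a ps that accepts -A and -o with the pid, ppid, stat and comm keywords (the stock ps on Mac OS X and Linux both do); the function names filter_zombies and list_zombies are made up for illustration.

```shell
#!/bin/sh
# Keep only rows whose state column starts with "Z" (zombie/defunct).
# Input columns: pid ppid stat comm
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Print "pid ppid comm" for every zombie on the system.
list_zombies() {
    ps -A -o pid= -o ppid= -o stat= -o comm= 2>/dev/null | filter_zombies
}

# For each zombie, also look up the name of the parent that failed to reap it.
list_zombies | while read -r pid ppid comm; do
    parent=$(ps -o comm= -p "$ppid" 2>/dev/null)
    echo "zombie pid=$pid ($comm) parent pid=$ppid (${parent:-unknown})"
done
```

If the parent turns out to be a Chrome browser process, the missing wait() is in our code; if it is something else, the search moves elsewhere.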

h-n-y,
Mark



[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread Nirnimesh
On Wed, Dec 30, 2009 at 7:42 PM, John Grabowski j...@chromium.org wrote:

 Your log seems fine so it's probably not the Avi/Trung/breakpad bug.

 I'm not aware of any case where Chrome launches 'ps' other than
 about:memory (Mac only).  Some quick tests with about:memory on 249.30 do
 not create ps zombies.  I could not see any obvious path in
 base/process_util_posix.cc's GetAppOutputInternal() which would not hit the
 wait() call.

 Nirnimesh, are you using about:memory?


No


  Do you know of a trigger for 'ps' process creation or zombification?


I don't know what causes 'ps' processes to launch but I've seen them all the
time since like forever while launching Chrome.



 The default max procs per userid on 10.5.8 is 266.  With ~20 Chromes and 10
 tabs each you're starting to get real close.  Perhaps that's the real
 problem?


That's fair reasoning, but then it should have been so all along. And I
reach the same state even with 10 chrome instances, only it takes a few mins
longer.
However, setting ulimit -u 512 does make it better, and I don't run out of
resources, so there it is.
I'm still stumped as to how it was fine until 2 weeks ago. And does it mean
that bad things will happen for a user with 20+ tabs?
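
A minimal sketch of applying that limit only to the launched processes rather than to the whole login shell. It assumes the shell's ulimit builtin accepts -u (bash does, and /bin/sh on 10.5 is bash); the wrapper name run_with_nproc_limit is made up, and the echo stands in for the real launcher command.

```shell
#!/bin/sh
# Run a command with a raised soft per-user process limit.
# ASSUMPTION: the ulimit builtin accepts -u; if it does not, or the value
# exceeds the hard limit, a warning is printed and the command still runs.
run_with_nproc_limit() {
    nproc_limit=$1
    shift
    (
        # -S changes only the soft limit; the subshell keeps the change
        # from leaking into the calling shell.
        ulimit -S -u "$nproc_limit" 2>/dev/null ||
            echo "warning: could not set process limit to $nproc_limit" >&2
        exec "$@"
    )
}

# Replace the echo with the real launcher command.
run_with_nproc_limit 512 echo "launched with a soft limit request of 512"
```

Because the limit is inherited by child processes, everything the launcher starts sees the raised value while other terminals keep the default.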






[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread Mark Mentovai
Viet-Trung Luu wrote:
 2. Temporarily move /bin/ps to /bin/ps.real, and at /bin/ps put a
 small script that writes the full argument list and maybe other
 debuggery to a log file somewhere, and then invokes /bin/ps.real.  For
 example:
[...]

 Done. And I now see that it looks like my fault. Now I need to figure out
 why ps is being run, and why it's not being properly reaped

For those of us following along from home while on vacation, what did
the arguments look like?

It sounds like you're on track again.  Excellent.

Mark



[chromium-dev] Re: Running Chromebot on official builds from trunk

2009-12-30 Thread John Grabowski


 The default max procs per userid on 10.5.8 is 266.  With ~20 Chromes and
 10 tabs each you're starting to get real close.  Perhaps that's the real
 problem?


 That's a fair reasoning but then it should have been so all along. And I
 reach the same state even with 10 chrome instances, only it takes a few mins
 longer.


Perhaps other factors (e.g. pages load 1% faster) have pushed us over a
threshold where this problem is noticed.  Or perhaps tvl was right about
plugins working better.  Not sure it matters much  given what you found
below...

 However, setting ulimit -u 512 does make it better, and I don't run out of
 resources, so there it is.
 I'm still stumped as to how it was fine until 2 weeks ago. And does it mean
 that bad things will happen for a user with 20+ tabs?


No; bad things will happen for a user with 20+ tabs only if they run 10
copies of Chrome at once.

Chrome has reasonable renderer process limits so you won't hit this limit
with Chrome and 500 tabs.

jrg
