On 07/02/2012, at 06:34, Derek Gaston wrote:

> On Mon, Feb 6, 2012 at 10:27 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> 
> Are _all_ the processes making it here?
> 
> Sigh.  I knew someone was going to ask that ;-)
> 
> I'll have to write a short script to grab the stack trace from every one of 
> the 10,000 processes to see where they are and try to find any anomalies.   
> Anyone have a script (or pieces of one) to do this that they wouldn't mind 
> sharing?

Try with PADB: http://padb.pittman.org.uk/
Jose


> 
> I did spot check quite a few and they were all in the same spot.
> 
> Now here comes the weirdness: I left one of these processes attached in GDB 
> for quite a while (10+ minutes) after the whole job had been hung for over an 
> hour.  When I noticed that I had left it attached I detached GDB and.... the 
> job started right up!  That is: it moved on past this problem!  How is that 
> for some weirdness?  It might have just been coincidence... or maybe me 
> stalling that process for a bit by attaching GDB nudged some communication in 
> the right direction... I don't know.
> 
> I know that's not terribly scientific.  I'll have to wait until the next job 
> hangs before I can do more inspection, but when (not if) that happens I'll 
> post back with more info.
> 
> Derek
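PADB, suggested above, is the purpose-built tool for this. For anyone who wants a hand-rolled version of the script Derek describes, here is a minimal Python sketch under stated assumptions: passwordless ssh to every compute node, gdb installed there, and a hypothetical input file with one "rank host pid" line per MPI process (how to produce that list depends on the MPI launcher and is not covered here). The function names `backtrace` and `group_traces` are illustrative only. The script attaches gdb in batch mode to each process, reduces each backtrace to its sequence of function names (addresses differ between processes), and prints the groups smallest-first so that anomalous ranks stand out. Be aware that attaching gdb pauses each process briefly, which, as the message above suggests, may itself perturb a hang.

```python
"""Hypothetical helper: gather one gdb backtrace per MPI process and group
processes whose stacks match, listing the smallest (most unusual) groups first."""
from __future__ import annotations

import re
import subprocess
import sys
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Matches gdb frame lines such as
#   "#0  0x00007f... in poll () from /lib64/libc.so.6"  or  "#3  main () at foo.c:5"
# and captures the function name.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def backtrace(host: str, pid: int, timeout: float = 60.0) -> str:
    """Attach gdb in batch mode to `pid` on `host` over ssh (assumption) and
    return the backtrace of all threads as text."""
    remote = f"gdb -batch -p {pid} -ex 'thread apply all bt'"
    try:
        result = subprocess.run(["ssh", host, remote], capture_output=True,
                                text=True, timeout=timeout)
        return result.stdout
    except subprocess.TimeoutExpired:
        return "#0  <gdb-timeout>"  # shows up as its own group in the report


def signature(trace: str) -> tuple[str, ...]:
    """Reduce a backtrace to its function names, dropping per-process addresses."""
    names = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            names.append(m.group(1))
    return tuple(names)


def group_traces(traces: dict[str, str]) -> list[tuple[tuple[str, ...], list[str]]]:
    """Group process labels by identical stack signature, smallest group first."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for label, trace in traces.items():
        groups[signature(trace)].append(label)
    return sorted(groups.items(), key=lambda kv: len(kv[1]))


def main(path: str) -> None:
    # Hypothetical input format: one "rank host pid" line per MPI process.
    entries = []
    with open(path) as f:
        for line in f:
            if line.strip():
                rank, host, pid = line.split()
                entries.append((f"rank {rank} ({host})", host, int(pid)))
    # Attach to many processes in parallel; 10,000 sequential ssh calls would be slow.
    with ThreadPoolExecutor(max_workers=64) as pool:
        futures = {label: pool.submit(backtrace, host, pid)
                   for label, host, pid in entries}
        traces = {label: fut.result() for label, fut in futures.items()}
    for sig, labels in group_traces(traces):
        print(f"{len(labels)} process(es), e.g. {', '.join(labels[:5])}")
        print("    " + " <- ".join(sig[:8]))  # innermost frame first


if __name__ == "__main__":
    main(sys.argv[1])
```

With every process stuck in the same place, the output would be a single large group; any rank that shows up in a small group of its own is the first place to look.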
