On Wed, 2010-01-20 at 21:18 -0500, Peter Thompson wrote:
> Hi Jeff,
> 
> Sorry, speaking in shorthand again.
> 
> Jeff Squyres wrote:
> > On Jan 8, 2010, at 5:03 PM, Peter Thompson wrote:
> > 
> >> I've tried a few builds of 1.4 on Snow Leopard, and trying to start up
> >> TotalView gets some of the more 'standard' problems.
> > 
> > I don't quite know what you mean by "standard" problems...?
> 
> That's more or less the 'standard problems' that I hear described when
> someone tries to build an MPI (not just OpenMPI) and things don't work on
> the first try.  I don't know if you've worked on the interface directly, but
> you are probably aware that TotalView has an API where we set up a
> structure, MPIR_PROCTABLE, based on a typedef MPIR_PROCDESC, which gets
> filled in as to what processes are started up on which nodes.  This allows
> the debugger to attach to things automatically.  If the build is done so
> that the files that hold these structures are optimized, sometimes the
> typedef is optimized away.  Or in the case of other builds, the file may
> have the correct optimization (none) but the symbol info is stripped in the
> link phase.  So it's a typical, or 'standard' issue I face, but hopefully
> not for you.

I've seen several OpenMPI installs in the wild like this where the type
information for MPIR_PROCTABLE is missing.  The fact the type
information is missing however doesn't affect the code or contents of
memory at all, just that it's not described by debug information.  As
there is a standard (sort of) to describe MPIR_PROCTABLE what I choose
to do in padb is to use the standard to calculate the struct size and
offsets rather than the debug info.  This allows padb to work even when
the debug information is missing.
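
To illustrate the idea, here is a minimal sketch in Python.  It is not
padb's actual code.  It assumes the MPIR_PROCDESC layout from the MPIR
interface description (two char pointers, host_name and executable_name,
followed by an int pid), a 4-byte int, and natural alignment as used by
common ABIs.  The function names are hypothetical, and reading the raw
bytes out of the target process is left out.

```python
def procdesc_layout(pointer_size, int_size=4):
    """Return (offsets, entry_size) for one MPIR_PROCDESC entry.

    Assumed layout: char *host_name; char *executable_name; int pid;
    """
    offsets = {
        "host_name": 0,
        "executable_name": pointer_size,
        "pid": 2 * pointer_size,
    }
    unpadded = 2 * pointer_size + int_size
    # Assumption: the struct is padded to the alignment of its widest member.
    align = max(pointer_size, int_size)
    entry_size = (unpadded + align - 1) // align * align
    return offsets, entry_size


def read_field(raw, start, width, byteorder, signed=False):
    """Decode one integer field from the raw bytes."""
    return int.from_bytes(raw[start:start + width], byteorder, signed=signed)


def decode_proctable(raw, count, pointer_size, int_size=4, byteorder="little"):
    """Decode `count` entries from bytes read at the proctable address.

    The two pointer fields are returned as target addresses; the strings
    they point to would need a further read from the target.
    """
    offsets, entry_size = procdesc_layout(pointer_size, int_size)
    entries = []
    for i in range(count):
        base = i * entry_size
        entries.append({
            "host_name": read_field(
                raw, base + offsets["host_name"], pointer_size, byteorder),
            "executable_name": read_field(
                raw, base + offsets["executable_name"], pointer_size, byteorder),
            "pid": read_field(
                raw, base + offsets["pid"], int_size, byteorder, signed=True),
        })
    return entries
```

Under these assumptions an entry is 24 bytes on a 64-bit target and 12
bytes on a 32-bit one, so the table can be walked without any type
information from the binary.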

If the debug information is available then padb verifies that it matches
what I expect it to be.

Don't use the debug info but rather use fixed sizes and offsets:
http://code.google.com/p/padb/source/detail?r=355

Verify the type information if present:
http://code.google.com/p/padb/source/detail?r=386

> However, some users prefer the classic launch with -tv, and this seems to
> be failing with the latest builds I've done on Darwin.

I've seen this 'problem' on Linux as well.  I'm unsure of the OpenMPI
version although I could ask the organisation concerned if required.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
