Hi Brian

Thank you very much for the instant help!

I just tried "-mca btl openib,sm,self" and
"-mca mpi_leave_pinned 0" together (still with OpenMPI 1.3.1).

So far so good: it has passed through two NB cases/linear system solutions,
it is now running the third NB, and the memory use hasn't increased.
On the failed runs the second NB already used more memory than the first,
and the third would blow up memory use.

If the run were bound to fail, it would already be swapping memory at this point, and it is not.
This is a good sign. I hope I am not speaking too early,
but it looks like your suggestion fixed the problem.
Thanks!

It was interesting to observe in Ganglia
that on the failed runs the memory use "jumps"
happened whenever HPL switched from one NB to another.
At every NB transition (i.e., every time HPL started to solve a
new linear system, and probably generated a new random matrix)
the memory use would jump to a (significantly) higher value.
Anyway, this is just in case the info tells you something about what
might be going on.

I will certainly follow your advice and upgrade to OpenMPI 1.3.2,
which I just downloaded.
You guys are prolific: a new release per month! :)

Many thanks!
Gus Correa

Brian W. Barrett wrote:
Gus -

Open MPI 1.3.0 & 1.3.1 attempted to use some controls in the glibc malloc implementation to handle memory registration caching for InfiniBand. Unfortunately, it was not only buggy in that it didn't work, but it also had the side effect that certain memory usage patterns can cause the memory allocator to use much more memory than it normally would. The configuration options were set any time the openib module was loaded, even if it wasn't used in communication. Can you try running with the extra option:

  -mca mpi_leave_pinned 0

I'm guessing that will fix the problem. If you're using InfiniBand, you probably want to upgrade to 1.3.2, as there are known data corruption issues in 1.3.0 and 1.3.1 with openib.
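
As a rough illustration only (a sketch, not the actual Open MPI code; the specific calls and values below are assumptions), glibc mallopt() controls along these lines keep freed memory from being trimmed off the heap or handed back via munmap, so a program that repeatedly allocates and frees large buffers, as HPL does for each test case, can appear to grow without bound:

  /* Hedged sketch of glibc malloc tuning; not taken from Open MPI. */
  #include <malloc.h>

  int main(void)
  {
      mallopt(M_TRIM_THRESHOLD, -1); /* never trim heap space back to the OS */
      mallopt(M_MMAP_MAX, 0);        /* never satisfy large mallocs via mmap */
      /* ... repeatedly allocate and free large matrices here: freed pages
       * stay attached to the process, so resident memory only grows ... */
      return 0;
  }

which is presumably why turning mpi_leave_pinned off helps.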

Brian

On Fri, 1 May 2009, Gus Correa wrote:

Hi Ralph

Thank you very much for the prompt answer.
Sorry for being so confusing in my original message.

Yes, I am saying that the inclusion of openib is causing the difference
in behavior.
It runs with "sm,self", it fails with "openib,sm,self".
I am as puzzled as you are, because I thought the "openib" parameter
was simply ignored when running on a single node, exactly like you said.
After your message arrived, I ran HPL once more with "openib",
just in case.
Sure enough it failed just as I described.

And yes, all the procs run on a single node in both cases.
It doesn't seem to be a problem caused by the hardware
of a particular node either, as I have already
tried three different nodes with similar results.

BTW, I successfully ran HPL across the whole cluster two days ago,
with IB ("openib,sm,self"),
but using a modest (for the cluster) problem size: N=50,000.
The total cluster memory is 24*16=384GB,
which gives a max HPL problem size N=195,000.
I have yet to try the large problem on the whole cluster,
but I am afraid I will stumble on the same memory problem.

Finally, in your email you use the syntax "btl=openib,sm,self",
with an "=" sign between the btl key and its values.
However, the mpiexec man page uses the syntax "btl openib,sm,self",
with a blank space between the btl key and its values.
I've been following the man page syntax.
The "=" sign doesn't seem to work, and aborts with the error:
"No executable was specified on the mpiexec command line.".
Could this possibly be the issue (say, wrong parsing of mca options)?
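
If I read the FAQ correctly, the key=value form belongs to the other two
ways of setting MCA parameters, rather than the mpiexec command line,
but please correct me if I am wrong. Roughly:

  # on the mpiexec command line: space-separated key and value
  mpiexec -mca btl openib,sm,self ...

  # as an environment variable: OMPI_MCA_ prefix plus key=value
  export OMPI_MCA_btl=openib,sm,self

  # in an MCA parameter file such as $HOME/.openmpi/mca-params.conf
  btl = openib,sm,self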

Many thanks!
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Ralph Castain wrote:
If you are running on a single node, then btl=openib,sm,self would be equivalent to btl=sm,self. OMPI is smart enough to know not to use IB if you are on a single node, and instead uses the shared memory subsystem.

Are you saying that the inclusion of openib is causing a difference in behavior, even though all procs are on the same node??

Just want to ensure I understand the problem.

Thanks
Ralph


On Fri, May 1, 2009 at 11:16 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

    Hi OpenMPI and HPC experts

    This may or may not be the right forum to post this,
    and I am sorry to bother those who think it is not.

    I am trying to run the HPL benchmark on our cluster,
    compiling it with Gnu and linking to
    GotoBLAS (1.26) and OpenMPI (1.3.1),
    both also Gnu-compiled.

    I have got failures that suggest a memory leak when the
    problem size is large, but still within the memory limits
    recommended by HPL.
    The problem only happens when "openib" is among the OpenMPI
    MCA parameters (and the problem size is large).
    Any help is appreciated.

    Here is a description of what happens.

    For starters I am trying HPL on a single node, to get a feeling for
    the right parameters (N & NB, P & Q, etc.) on a dual-socket quad-core
    AMD Opteron 2376 "Shanghai" node.

    The HPL recommendation is to use close to 80% of your physical memory
    to reach top Gigaflop performance.
    Our physical memory on a node is 16GB, and this gives a problem size
    N=40,000 to stay close to the 80% memory use (see the arithmetic
    sketch after this paragraph).
    I tried several block sizes, somewhat correlated to the size of the
    processor cache:  NB=64 80 96 128 ...
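
    For reference, here is the arithmetic sketch mentioned above, using
    the usual HPL sizing rule N ~ sqrt(fraction * memory_in_bytes / 8)
    for a double-precision N x N matrix; the little program below is just
    my own illustration, not part of HPL:

      /* Back-of-the-envelope HPL problem size: the N x N matrix A holds
       * N*N doubles at 8 bytes each, so filling a fraction f of M bytes
       * of memory gives N = sqrt(f*M/8).  Link with -lm.                */
      #include <math.h>
      #include <stdio.h>

      int main(void)
      {
          const double frac  = 0.80;                 /* HPL's ~80% rule   */
          const double mem[] = { 16.0e9, 384.0e9 };  /* one node, cluster */
          for (int i = 0; i < 2; i++)
              printf("%.0f GB -> N ~ %.0f\n", mem[i] / 1.0e9,
                     sqrt(frac * mem[i] / 8.0));
          return 0;   /* prints N ~ 40000 and N ~ 195959, respectively */
      }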

    When I run HPL with N=20,000 or smaller all works fine,
    and the HPL run completes, regardless of whether "openib"
    is present or not on my MCA parameters.

    However, when I move to N=40,000, or even N=35,000,
    the run starts OK with NB=64,
    but as NB is switched to larger values
    the total memory use increases in jumps (as shown by Ganglia),
    and becomes uneven across the processors (as shown by "top").
    The problem happens if "openib" is among the MCA parameters,
    but doesn't happen if I remove "openib" from the MCA list and use
    only "sm,self".

    For N=35,000, when NB reaches 96 the memory use is already above
    the physical limit (16GB), having increased from 12.5GB to over 17GB.
    For N=40,000 the problem happens even earlier, with NB=80.
    At this point memory swapping kicks in,
    and eventually the run dies with memory allocation errors:

================================================================================
    T/V                N    NB     P     Q               Time  Gflops
--------------------------------------------------------------------------------
    WR01L2L4       35000   128     8     1             539.66 5.297e+01
--------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0043992
    ...... PASSED
    HPL ERROR from process # 0, on line 172 of function HPL_pdtest:
     >>> [7,0] Memory allocation failed for A, x and b. Skip. <<<
    ...

    ***

    The code snippet that corresponds to HPL_pdtest.c is this,
    although the leak is probably somewhere else:

    /*
     * Allocate dynamic memory
     */
      vptr = (void*)malloc( ( (size_t)(ALGO->align) +
                              (size_t)(mat.ld+1) * (size_t)(mat.nq) ) *
                            sizeof(double) );
      info[0] = (vptr == NULL); info[1] = myrow; info[2] = mycol;
      (void) HPL_all_reduce( (void *)(info), 3, HPL_INT, HPL_max,
                             GRID->all_comm );
      if( info[0] != 0 )
      {
         if( ( myrow == 0 ) && ( mycol == 0 ) )
            HPL_pwarn( TEST->outfp, __LINE__, "HPL_pdtest",
                       "[%d,%d] %s", info[1], info[2],
"Memory allocation failed for A, x and b. Skip." );
         (TEST->kskip)++;
         return;
      }

    ***

    I found this continued increase in memory use rather strange,
    and suggestive of a memory leak in one of the codes being used.

    Everything (OpenMPI, GotoBLAS, and HPL)
    was compiled using Gnu only (gcc, gfortran, g++).

    I haven't changed anything in the compiler's memory model,
    i.e., I haven't used or changed the "-mcmodel" flag of gcc.
    (I don't know if the Makefiles of HPL, GotoBLAS, and OpenMPI use it.)

    No additional load is present on the node
    other than the OS (Linux CentOS 5.2); HPL is running alone.

    The cluster has InfiniBand.
    However, I am running on a single node.

    The surprising thing is that if I run on shared memory only
    (-mca btl sm,self) there is no memory problem,
    the memory use is stable at about 13.9GB,
    and the run completes.
    So, there is a workaround to run on a single node.
    (Actually, shared memory is presumably the way to go on a single node.)

    However, if I introduce IB (-mca btl openib,sm,self)
    among the MCA btl parameters, then memory use blows up.

    This is bad news for me, because I want to extend the experiment
    to run HPL also across the whole cluster using IB,
    which is actually the ultimate goal of HPL, of course!
    It also suggests that the problem is somehow related to InfiniBand,
    maybe hidden under OpenMPI.

    Here is the mpiexec command I use (with and without openib):

    /path/to/openmpi/bin/mpiexec \
           -prefix /the/run/directory \
           -np 8 \
           -mca btl [openib,]sm,self \
           xhpl
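
    (I suppose I could also add something like "-mca btl_base_verbose 100"
    to that command to log which BTL components actually get selected;
    the exact verbosity level is just a guess on my part.)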


    Any help, insights, suggestions, reports of previous experiences,
    are much appreciated.

    Thank you,
    Gus Correa
    _______________________________________________
    users mailing list
    us...@open-mpi.org
    http://www.open-mpi.org/mailman/listinfo.cgi/users


