Hey,

> It would be nice if we could automatically detect this issue more often.

Indeed.

We have

 #define PetscMalloc(a,b)  ((a != 0) ? do_something : (*(b) = 0,0) )

For our typical internal use cases, 'a' is of type PetscInt, so we could check for (a < 0) and emit a better error message. On the other hand, assuming 'a' to be signed might trigger tautology warnings for unsigned integers in user code, so it may be cleaner to hide this additional check in something like "PetscSignedMalloc()" and use it only in the typical internal routines where the argument to PetscMalloc() is a product of potentially large values. A quick grep over the source tree shows only a handful of cases where this is indeed the case (typically just matrices), so this would capture most of these problems. Is this a viable approach?

Best regards,
Karli



Begin forwarded message:

*From:* Mathis Friesdorf <[email protected]>
*Subject:* Re: [petsc-users] Unexpected "Out of memory error" with SLEPC
*Date:* June 26, 2014 at 5:08:23 AM CDT
*To:* Karl Rupp <[email protected]>
*Cc:* <[email protected]>, <[email protected]>

Dear Karl,

thanks a lot! This indeed worked. I recompiled PETSc with

./configure --with-64-bit-indices=1
PETSC_ARCH = arch-linux2-c-debug-int64

and the error is gone. Again thanks for your help!

All the best, Mathis


On Wed, Jun 25, 2014 at 12:42 PM, Karl Rupp <[email protected]> wrote:

    Hi Mathis,

    this looks very much like an integer overflow:
    http://www.mcs.anl.gov/petsc/documentation/faq.html#with-64-bit-indices

    Best regards,
    Karli


    On 06/25/2014 12:31 PM, Mathis Friesdorf wrote:

        Dear all,

        after a very useful email exchange with Jed Brown quite a while
        ago, I was able to find the lowest eigenvalue of a large matrix
        which is constructed as a tensor product. Admittedly the
        solution is a bit hacked, but it is based on a matrix shell and
        Armadillo and is therefore reasonably fast. The program works
        well for smaller systems, but once the vectors reach a certain
        size, I get "out of memory" errors. I have tested the
        initialization of a vector of that size and multiplication by
        the matrix. This works fine and takes roughly 20 GB of memory.
        There are 256 GB available, so I see no reason why the EPS
        solvers should complain. Does anyone have an idea what goes
        wrong here? The error message is not very helpful and claims
        that memory is requested way beyond any reasonable number:
        "Memory requested 18446744056529684480."


        Thanks and all the best, Mathis Friesdorf


        *Output of the Program:*
        mathis@n180:~/localisation$ ./local_plus 27
        System Size: 27

        
        --------------------------------------------------------------------------
        [[30558,1],0]: A high-performance Open MPI point-to-point
        messaging module
        was unable to find any relevant network interfaces:

        Module: OpenFabrics (openib)
           Host: n180

        Another transport will be used instead, although this may
        result in
        lower performance.
        
        --------------------------------------------------------------------------
        [0]PETSC ERROR: --------------------- Error Message
        ------------------------------------
        [0]PETSC ERROR: Out of memory. This could be due to allocating
        [0]PETSC ERROR: too large an object or bleeding by not properly
        [0]PETSC ERROR: destroying unneeded objects.
        [0]PETSC ERROR: Memory allocated 3221286704 Memory used by process
        3229827072
        [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log
        for info.
        [0]PETSC ERROR: Memory requested 18446744056529684480!
        [0]PETSC ERROR:
        
        ------------------------------------------------------------------------
        [0]PETSC ERROR: Petsc Release Version 3.4.4, Mar, 13, 2014
        [0]PETSC ERROR: See docs/changes/index.html for recent updates.
        [0]PETSC ERROR: See docs/faq.html for hints about trouble
        shooting.
        [0]PETSC ERROR: See docs/index.html for manual pages.
        [0]PETSC ERROR:
        
        ------------------------------------------------------------------------
        [0]PETSC ERROR: ./local_plus on a arch-linux2-cxx-debug named
        n180 by
        mathis Wed Jun 25 12:23:01 2014
        [0]PETSC ERROR: Libraries linked from /home/mathis/bin_nodebug/lib
        [0]PETSC ERROR: Configure run at Wed Jun 25 00:03:34 2014
        [0]PETSC ERROR: Configure options
        PETSC_DIR=/home/mathis/petsc-3.4.4
        --with-debugging=1 COPTFLAGS="-O3 -march=p4 -mtune=p4"
        --with-fortran=0
        -with-mpi=1 --with-mpi-dir=/usr/lib/openmpi --with-clanguage=cxx
        --prefix=/home/mathis/bin_nodebug
        [0]PETSC ERROR:
        
        ------------------------------------------------------------------------
        [0]PETSC ERROR: PetscMallocAlign() line 46 in
        /home/mathis/petsc-3.4.4/src/sys/memory/mal.c
        [0]PETSC ERROR: PetscTrMallocDefault() line 189 in
        /home/mathis/petsc-3.4.4/src/sys/memory/mtr.c
        [0]PETSC ERROR: VecDuplicateVecs_Contiguous() line 62 in
        src/vec/contiguous.c
        [0]PETSC ERROR: VecDuplicateVecs() line 589 in
        /home/mathis/petsc-3.4.4/src/vec/vec/interface/vector.c
        [0]PETSC ERROR: EPSAllocateSolution() line 51 in
        src/eps/interface/mem.c
        [0]PETSC ERROR: EPSSetUp_KrylovSchur() line 141 in
        src/eps/impls/krylov/krylovschur/krylovschur.c
        [0]PETSC ERROR: EPSSetUp() line 147 in src/eps/interface/setup.c
        [0]PETSC ERROR: EPSSolve() line 90 in src/eps/interface/solve.c
        [0]PETSC ERROR: main() line 48 in local_plus.cpp
        
        --------------------------------------------------------------------------
        MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
        with errorcode 55.

        NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI
        processes.
        You may or may not see output from other processes, depending on
        exactly when Open MPI kills them.
        
        --------------------------------------------------------------------------
        *Output of make:*
        mathis@n180:~/localisation$ make local_plus

        mpicxx -o local_plus.o -c -Wall -Wwrite-strings
        -Wno-strict-aliasing
        -Wno-unknown-pragmas -g   -fPIC
        -I/home/mathis/armadillo-4.300.8/include -lblas -llapack
        -L/home/mathis/armadillo-4.300.8 -O3 -larmadillo
        -fomit-frame-pointer
        -I/home/mathis/bin_nodebug/include
        -I/home/mathis/bin_nodebug/include
        -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
          -D__INSDIR__= -I/home/mathis/bin_nodebug
        -I/home/mathis/bin_nodebug//include
        -I/home/mathis/bin_nodebug/include
        local_plus.cpp
        local_plus.cpp:22:0: warning: "__FUNCT__" redefined [enabled by default]
        In file included from
        /home/mathis/bin_nodebug/include/petscvec.h:10:0,
                          from local_plus.cpp:10:
        /home/mathis/bin_nodebug/include/petscviewer.h:386:0: note:
        this is the location of the previous definition
        g++ -o local_plus local_plus.o
        -Wl,-rpath,/home/mathis/bin_nodebug//lib
        -L/home/mathis/bin_nodebug//lib -lslepc
        -Wl,-rpath,/home/mathis/bin_nodebug/lib
        -L/home/mathis/bin_nodebug/lib
          -lpetsc -llapack -lblas -lpthread
        -Wl,-rpath,/usr/lib/openmpi/lib
        -L/usr/lib/openmpi/lib
        -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.7
        -L/usr/lib/gcc/x86_64-linux-gnu/4.7
        -Wl,-rpath,/usr/lib/x86_64-linux-gnu
        -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
        -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte
        -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
        /bin/rm -f local_plus.o
        *Code:*
        //Author: Mathis Friesdorf
        //[email protected]


        static char help[] = "1D chain\n";

        #include <iostream>
        #include <typeinfo>
        #include <armadillo>
        #include <petscsys.h>
        #include <petscvec.h>
        #include <petscmat.h>
        #include <slepcsys.h>
        #include <slepceps.h>
        #include <math.h>
        #include <assert.h>

        PetscErrorCode BathMult(Mat H, Vec x,Vec y);
        PetscInt       L=30,d=2,dsys;
        PetscErrorCode ierr;
        arma::mat hint = "1.0 0 0 0.0; 0 -1.0 2.0 0; 0 2.0 -1.0 0; 0 0
        0 1.0;";

        #define __FUNCT__ "main"
        int main(int argc, char **argv)
        {
           Mat H;
           EPS eps;
           Vec xr,xi;
           PetscScalar kr,ki;
           PetscInt j, nconv;

           L = strtol(argv[1],NULL,10);
           dsys = pow(d,L);
           printf("%s","System Size: ");
           printf("%i",L);
           printf("%s","\n");
           SlepcInitialize(&argc,&argv,(char*)0,help);


           MatCreateShell(PETSC_COMM_WORLD,dsys,dsys,dsys,dsys,NULL,&H);
           MatShellSetOperation(H,MATOP_MULT,(void(*)())BathMult);
           ierr = MatGetVecs(H,NULL,&xr); CHKERRQ(ierr);
           ierr = MatGetVecs(H,NULL,&xi); CHKERRQ(ierr);

           ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
           ierr = EPSSetOperators(eps, H, NULL); CHKERRQ(ierr);
           ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
           ierr = EPSSetWhichEigenpairs(eps,EPS_SMALLEST_REAL); CHKERRQ(ierr);
           ierr = EPSSetFromOptions( eps ); CHKERRQ(ierr);
           ierr = EPSSolve(eps); CHKERRQ(ierr);
           ierr = EPSGetConverged(eps, &nconv); CHKERRQ(ierr);
           for (j=0; j<1; j++) {
             EPSGetEigenpair(eps, j, &kr, &ki, xr, xi);
             printf("%s","Lowest Eigenvalue: ");
             PetscPrintf(PETSC_COMM_WORLD,"%9F",kr);
             PetscPrintf(PETSC_COMM_WORLD,"\n");
           }
           EPSDestroy(&eps);

           ierr = SlepcFinalize();
           return 0;
        }
        #undef __FUNCT__

        #define __FUNCT__ "BathMult"
        PetscErrorCode BathMult(Mat H, Vec x, Vec y)
        {
           PetscInt l;
           uint slice;
           PetscScalar *arrayin,*arrayout;

           VecGetArray(x,&arrayin);
           VecGetArray(y,&arrayout);
           arma::cube A = arma::cube(arrayin,1,1,pow(d,L),
               /*copy_aux_mem*/false,/*strict*/true);
           arma::mat result = arma::mat(arrayout,pow(d,L),1,
               /*copy_aux_mem*/false,/*strict*/true);
           for (l=0;l<L-1;l++){
             A.reshape(pow(d,L-2-l),pow(d,2),pow(d,l));
             result.reshape(pow(d,L-l),pow(d,l));
             for (slice=0;slice<A.n_slices;slice++){
               result.col(slice) += vectorise(A.slice(slice)*hint);
             }
             }
           }
           arrayin = A.memptr();
           ierr = VecRestoreArray(x,&arrayin); CHKERRQ(ierr);
           arrayout = result.memptr();
           ierr = VecRestoreArray(y,&arrayout); CHKERRQ(ierr);
           PetscFunctionReturn(0);
        }
        #undef __FUNCT__




