On Jan 20, 2012, at 4:58 PM, Stefano Zampini wrote:
> Thank you, I'll let you know if it crashes again. Anyway, the problem is
> that xin->map->n (vpscat.h, currently line 58) is zero for some of my vectors,
> and thus it enters the if block even though I don't need to do anything with
> CUSP. Is the first condition of the OR really necessary?
The block is supposed to handle the 0 case just fine; if it does not handle the
0 case then that is a bug in either PETSc or CUSP and needs to be fixed. Having
the 0 case handled inside the if is crucial for getting any kind of performance;
otherwise it would always copy the entire vector from the GPU to the CPU for
absolutely no reason.
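   Just to make the intent concrete, the branch in question is shaped roughly
like the following (a simplified sketch, not the exact vpscat.h source; the
real test is an OR of two conditions, and cuspIndices here is only a
placeholder for the scatter's precomputed index sets):

#if defined(PETSC_HAVE_CUSP)
  if (xin->valid_GPU_array == PETSC_CUSP_GPU) { /* the real code ORs in a second test */
    /* copy back from the GPU only the entries this scatter actually touches */
    ierr = VecCUSPCopyFromGPUSome_Public(xin,cuspIndices);CHKERRQ(ierr);
  }
#endif

Without that branch every VecScatterBegin() would have to copy the whole vector
from the GPU just to read a few entries; with it, a zero-length vector should
simply make the copy a no-op instead of tripping an error inside CUSP.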
>
> Recompiling will take a while since my PETSC_ARCH on the GPU cluster is not
> able to use cmake to build (I saw missing files in CMakeLists.txt for the CUSP
> and GPU related stuff). Is this a known issue? Is there a way to recompile
> only the changed code?
I think this is because the cmake developers do not yet support the CUDA
compiler nvcc. Bitch to them. cmake is the way in PETSc to get partial
recompiles.
Barry
>
> Stefano
>
>
> 2012/1/20 Barry Smith <bsmith at mcs.anl.gov>
>
> On Jan 20, 2012, at 2:32 PM, Jed Brown wrote:
>
> > On Fri, Jan 20, 2012 at 14:27, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > I do not understand the error traceback. It should NOT look like this. Is
> > that really the exact output from a single failed run? There should not be
> > multiple ----Error Message ---- blocks. Immediately after the first listing
> > of Configure options it should show the complete stack where the problem
> > happened; instead it printed an initial error message again and then again
> > and then finally a stack. This is not supposed to be possible.
> >
> > That's the kind of thing that happens if the error is raised on COMM_SELF.
>
> ???? I don't think so. Note that the entire error set comes from process 17;
> even with COMM_SELF it is not supposed to print the error message stuff
> multiple times on the same MPI node.
>
> > Also, is this really supposed to use CHKERRCUSP()?
>
> No, that is wrong. I fixed it, but then had a nasty merge with Paul's
> updates to the PETSc GPU stuff. I don't think that caused the grief.
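> (The fix is presumably just to check the returned PetscErrorCode the usual
> way, i.e. something like
>
>   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRQ(ierr);
>
> since VecCUSPCopyFromGPUSome() already returns a PetscErrorCode.)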
>
> Stefano,
>
> Anyway, since Paul updated all the CUSP stuff, please hg pull; hg update,
> rebuild the PETSc library, and then try again. If there are still problems,
> again send the entire output of the error.
>
> If a similar thing happens I'm tempted to ask you to run node 17 in the
> debugger and see why the error message comes up multiple times:
> -start_in_debugger -debugger_nodes 17
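> For example, something like (adjust the launcher, process count, and your
> usual options as needed):
>
>   mpiexec -n <nprocs> ./bidomonotest <your usual options> -start_in_debugger -debugger_nodes 17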
>
>
> Barry
>
>
> > The function uses normal CHKERRQ() inside.
> >
> > PetscErrorCode VecCUSPCopyFromGPUSome_Public(Vec v, PetscCUSPIndices ci)
> > {
> >   PetscErrorCode ierr;
> >
> >   PetscFunctionBegin;
> >   ierr = VecCUSPCopyFromGPUSome(v,&ci->indicesCPU,&ci->indicesGPU);CHKERRCUSP(ierr);
> >   PetscFunctionReturn(0);
> > }
> >
> >
> > Are you running with multiple threads AND GPUs? That won't work.
> >
> > Anyway, I cannot find anywhere a list of CUSP error messages that includes
> > the numbers 46 and 76; why are the exception messages not strings???
> >
> >
> > Barry
> >
> >
> > [17]PETSC ERROR: VecCUSPAllocateCheck() line 77 in
> > src/vec/vec/impls/seq/seqcusp//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/impls/seq/seqcusp/cuspvecimpl.h
> > [17]PETSC ERROR: --------------------- Error Message
> > ------------------------------------
> > [17]PETSC ERROR: Error in external library!
> > [17]PETSC ERROR: CUSP error 46!
> > [17]PETSC ERROR:
> > ------------------------------------------------------------------------
> > [17]PETSC ERROR: Petsc Development HG revision: HG Date:
> > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [17]PETSC ERROR: See docs/index.html for manual pages.
> > [17]PETSC ERROR:
> > ------------------------------------------------------------------------
> > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by zampini Fri
> > Jan 20 19:01:30 2012
> > [17]PETSC ERROR: Libraries linked from
> > /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
> > --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
> > --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
> > --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
> > --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
> > --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
> > --with-log=1 --with-info=1
> > --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
> > --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
> > --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
> > PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
> > PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
> > --with-c++-support=1 --with-large-file-io=1
> > --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
> >
> > --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
> > --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
> > --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
> > --download-metis=1 --download-parmetis=1 --download-chaco=1
> > --download-scotch=1 --download-party=1
> > --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
> >
> > --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > [17]PETSC ERROR:
> > ------------------------------------------------------------------------
> > [17]PETSC ERROR: VecCUSPCopyFromGPUSome() line 228 in
> > src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > [17]PETSC ERROR: --------------------- Error Message
> > ------------------------------------
> > [17]PETSC ERROR: Error in external library!
> > [17]PETSC ERROR: CUSP error 76!
> > [17]PETSC ERROR:
> > ------------------------------------------------------------------------
> > [17]PETSC ERROR: Petsc Development HG revision: HG Date:
> > [17]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [17]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [17]PETSC ERROR: See docs/index.html for manual pages.
> > [17]PETSC ERROR:
> > ------------------------------------------------------------------------
> > [17]PETSC ERROR: ./bidomonotest on a gnu-4.4.3 named ella011 by zampini Fri
> > Jan 20 19:01:30 2012
> > [17]PETSC ERROR: Libraries linked from
> > /work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/gnu-4.4.3-debug-double-louis/lib
> > [17]PETSC ERROR: Configure run at Fri Jan 20 15:29:21 2012
> > [17]PETSC ERROR: Configure options --CUDAFLAGS=-m64
> > --with-cuda-dir=/caspur/local/apps/cuda/4.0 --with-cuda-arch=sm_20
> > --with-cusp-dir=/caspur/shared/gpu-cluster/devel/cusp/0.2/..
> > --with-thrust-dir=/caspur/local/apps/cuda/4.0/include
> > --with-boost-dir=/caspur/shared/sw/devel/boost/1.44.0/intel/11.1.064
> > --with-pcbddc=1 --with-make-np=12 --with-debugging=1 --with-errorchecking=1
> > --with-log=1 --with-info=1
> > --with-cmake=/work/adz/zampini/cmake/2.8.7/bin/cmake --with-gnu-compilers=1
> > --with-pthread=1 --with-pthreadclasses=1 --with-precision=double
> > --with-mpi-dir=/caspur/shared/sw/devel/openmpi/1.4.1/gnu/4.4.3
> > PETSC_DIR=/work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev
> > PETSC_ARCH=gnu-4.4.3-debug-double-louis --with-shared-libraries=1
> > --with-c++-support=1 --with-large-file-io=1
> > --download-hypre=/work/adz/zampini/PetscPlusExternalPackages/hypre-2.7.0b.tar.gz
> >
> > --download-umfpack=/work/adz/zampini/PetscPlusExternalPackages/UMFPACK-5.5.1.tar.gz
> > --download-ml=/work/adz/zampini/PetscPlusExternalPackages/ml-6.2.tar.gz
> > --download-spai=/work/adz/zampini/PetscPlusExternalPackages/spai_3.0.tar.gz
> > --download-metis=1 --download-parmetis=1 --download-chaco=1
> > --download-scotch=1 --download-party=1
> > --with-blas-lapack-include=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/include/acml.h
> >
> > --with-blas-lapack-lib=/caspur/shared/sw/devel/acml/4.4.0/gfortran64/lib/libacml.a
> > [17]PETSC ERROR:
> > ------------------------------------------------------------------------
> > [17]PETSC ERROR: VecCUSPCopyFromGPUSome_Public() line 263 in
> > src/vec/vec/impls/seq/seqcusp/veccusp.cu
> > [17]PETSC ERROR: VecScatterBegin_1() line 57 in
> > src/vec/vec/utils//work/adz/zampini/MyWorkingCopyOfPetsc/petsc-dev/include/../src/vec/vec/utils/vpscat.h
> > [17]PETSC ERROR: VecScatterBegin() line 1574 in src/vec/vec/utils/vscat.c
> > [17]PETSC ERROR: PCISSetUp() line 46 in src/ksp/pc/impls/is/pcis.c
> > [17]PETSC ERROR: PCSetUp_BDDC() line 230 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> > [17]PETSC ERROR: PCBDDCSetupCoarseEnvironment() line 2081 in
> > src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCBDDCCoarseSetUp() line 1341 in
> > src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp_BDDC() line 255 in src/ksp/pc/impls/bddc/bddc.c
> > [17]PETSC ERROR: PCSetUp() line 832 in src/ksp/pc/interface/precon.c
> > [17]PETSC ERROR: KSPSetUp() line 261 in src/ksp/ksp/interface/itfunc.c
> >
> >
> > On Jan 20, 2012, at 12:20 PM, Stefano Zampini wrote:
> >
> > > Hi, I recently installed petsc-dev on a GPU cluster. I got an error in the
> > > external library CUSP when calling PCISSetUp: more precisely, when doing a
> > > VecScatterBegin on SEQ (not SEQCUSP!) vectors (please see the traceback
> > > attached). I'm developing the BDDC preconditioner code inside PETSc, and
> > > this error occurred when doing multilevel: in that case some procs (like
> > > proc 17 in the attached case) have a local dimension (relevant to PCIS)
> > > equal to zero.
> > >
> > > Thus, I think the real problem lies on line 41 of
> > > src/vec/vec/utils/vpscat.h. If you tell me why you used the first
> > > condition in the if clause, I can patch the problem.
> > >
> > > Regards,
> > > --
> > > Stefano
> > > <traceback>
> >
> >
>
>
>
>
> --
> Stefano