Hi Victor,

Thanks a lot! What should we do to get the new version?
Regards,
Alexander

On 10.05.2011 02:02, Victor Minden wrote:
> I believe I've resolved this issue.
>
> Cheers,
>
> Victor
> ---
> Victor L. Minden
>
> Tufts University
> School of Engineering
> Class of 2012
>
>
> On Sun, May 8, 2011 at 5:26 PM, Victor Minden <victorminden at gmail.com> wrote:
>
> Barry,
>
> I can verify this on breadboard now, with two processes.
>
> With cuda:
>
> minden at bb45:~/petsc-dev/src/snes/examples/tutorials$
> /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile
> /home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary
> -snes_monitor -ksp_monitor -da_vec_type cusp
> 0 SNES Function norm 3.906279802209e-03
>   0 KSP Residual norm 5.994156809227e+00
>   1 KSP Residual norm 5.927247846249e-05
> 1 SNES Function norm 3.906225077938e-03
>   0 KSP Residual norm 5.993813868985e+00
>   1 KSP Residual norm 5.927575078206e-05
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what(): invalid device pointer
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what(): invalid device pointer
> Aborted (signal 6)
>
> Without cuda:
>
> minden at bb45:~/petsc-dev/src/snes/examples/tutorials$
> /home/balay/soft/mvapich2-1.5-lucid/bin/mpiexec.hydra -machinefile
> /home/balay/machinefile -n 2 ./ex47cu -da_grid_x 65535 -log_summary
> -snes_monitor -ksp_monitor
> 0 SNES Function norm 3.906279802209e-03
>   0 KSP Residual norm 5.994156809227e+00
>   1 KSP Residual norm 3.538158441448e-04
>   2 KSP Residual norm 3.124431921666e-04
>   3 KSP Residual norm 4.109213410989e-06
> 1 SNES Function norm 7.201017610490e-04
>   0 KSP Residual norm 3.317803708316e-02
>   1 KSP Residual norm 2.447380361169e-06
>   2 KSP Residual norm 2.164193969957e-06
>   3 KSP Residual norm 2.124317398679e-08
> 2 SNES Function norm 1.719678934825e-05
>   0 KSP Residual norm 1.651586453143e-06
>   1 KSP Residual norm 2.037037536868e-08
>   2 KSP Residual norm 1.109736798274e-08
>   3 KSP Residual norm 1.857218772156e-12
> 3 SNES Function norm 1.159391068583e-09
>   0 KSP Residual norm 3.116936044619e-11
>   1 KSP Residual norm 1.366503312678e-12
>   2 KSP Residual norm 6.598477672192e-13
>   3 KSP Residual norm 5.306147277879e-17
> 4 SNES Function norm 2.202297235559e-10
>
> Note the repeated norms when using cuda. Looks like I'll have to take
> a closer look at this.
>
> -Victor
>
> ---
> Victor L. Minden
>
> Tufts University
> School of Engineering
> Class of 2012
>
>
> On Thu, May 5, 2011 at 2:57 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Alexander
> >
> >   Thank you for the sample code; it will be very useful.
> >
> >   We have run parallel jobs with CUDA where each node has only
> > a single MPI process and uses a single GPU, without the crash that
> > you get below. I cannot explain why it would not work in your
> > situation. Do you have access to two nodes, each with a GPU, so you
> > could try that?
> >
> >   It is crashing in a delete of a
> >
> >   struct _p_PetscCUSPIndices {
> >     CUSPINTARRAYCPU indicesCPU;
> >     CUSPINTARRAYGPU indicesGPU;
> >   };
> >
> >   where CUSPINTARRAYGPU is a cusp::array1d<PetscInt,cusp::device_memory>;
> >
> >   thus it is crashing after it has completed actually doing the
> > computation. If you run with -snes_monitor -ksp_monitor, with and
> > without -da_vec_type cusp, on 2 processes, what do you get for
> > output in the two cases? I want to see if it is running correctly
> > on two processes.
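[For readers following along: a minimal sketch of what those two struct members amount to, assuming CUSPINTARRAYCPU and CUSPINTARRAYGPU are aliases for CUSP host/device index arrays (an assumption; the exact definitions live in the petsc-dev CUSP headers and are not quoted in this thread). The device-side member's destructor releases GPU memory through thrust, which is consistent with the cusp::array1d destructor and thrust device-free frames in the call stack later in the thread.]

// Hypothetical sketch only, not the petsc-dev source.
#include <cusp/array1d.h>
#include <petscsys.h>

typedef cusp::array1d<PetscInt, cusp::host_memory>   CUSPINTARRAYCPU;
typedef cusp::array1d<PetscInt, cusp::device_memory> CUSPINTARRAYGPU;

struct _p_PetscCUSPIndices {
  CUSPINTARRAYCPU indicesCPU; // host-side copy of the scatter indices
  CUSPINTARRAYGPU indicesGPU; // device-side copy; its destructor ends up in
                              // thrust::device_free, where the "invalid
                              // device pointer" abort is reported
};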
> >
> >   Could the crash be due to memory corruption sometime during the
> > computation?
> >
> >
> >   Barry
> >
> >
> > On May 5, 2011, at 3:38 AM, Alexander Grayver wrote:
> >
> >> Hello!
> >>
> >> We work with the petsc-dev branch and the ex47cu.cu example. Our platform is
> >> an Intel Quad processor and 8 identical Tesla GPUs. The CUDA 3.2 toolkit is
> >> installed.
> >> Ideally, we would like to make PETSc work in a multi-GPU way within
> >> just one node, so that different GPUs can be attached to different
> >> processes.
> >> Since this is not possible within the current PETSc implementation, we created a
> >> preload library (see LD_PRELOAD for details) for the CUBLAS function
> >> cublasInit().
> >> When PETSc calls this function, our library gets control and we assign
> >> GPUs according to the rank within the MPI communicator, then we call the original
> >> cublasInit().
> >> This preload library is very simple; see petsc_mgpu.c attached.
> >> This trick makes each process have its own context, and ideally all
> >> computations should be distributed over several GPUs.
> >>
> >> We managed to build petsc and the example (see makefile attached) and we
> >> tested it as follows:
> >>
> >> [agraiver at tesla-cmc new]$ ./lapexp -da_grid_x 65535 -info > cpu_1process.out
> >> [agraiver at tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -info > cpu_2processes.out
> >> [agraiver at tesla-cmc new]$ ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_1process.out
> >> [agraiver at tesla-cmc new]$ mpirun -np 2 ./lapexp -da_grid_x 65535 -da_vec_type cusp -info > gpu_2processes.out
> >>
> >> Everything except the last configuration works well. The last one crashes
> >> with the following exception and call stack:
> >> terminate called after throwing an instance of 'thrust::system::system_error'
> >>   what(): invalid device pointer
> >> [tesla-cmc:15549] *** Process received signal ***
> >> [tesla-cmc:15549] Signal: Aborted (6)
> >> [tesla-cmc:15549] Signal code: (-6)
> >> [tesla-cmc:15549] [ 0] /lib64/libpthread.so.0() [0x3de540eeb0]
> >> [tesla-cmc:15549] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3de50330c5]
> >> [tesla-cmc:15549] [ 2] /lib64/libc.so.6(abort+0x186) [0x3de5034a76]
> >> [tesla-cmc:15549] [ 3] /opt/llvm/dragonegg/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d) [0x7f0d3530b95d]
> >> [tesla-cmc:15549] [ 4] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7b76) [0x7f0d35309b76]
> >> [tesla-cmc:15549] [ 5] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7ba3) [0x7f0d35309ba3]
> >> [tesla-cmc:15549] [ 6] /opt/llvm/dragonegg/lib64/libstdc++.so.6(+0xb7cae) [0x7f0d35309cae]
> >> [tesla-cmc:15549] [ 7] ./lapexp(_ZN6thrust6detail6device4cuda4freeILj0EEEvNS_10device_ptrIvEE+0x69) [0x426320]
> >> [tesla-cmc:15549] [ 8] ./lapexp(_ZN6thrust6detail6device8dispatch4freeILj0EEEvNS_10device_ptrIvEENS0_21cuda_device_space_tagE+0x2b) [0x4258b2]
> >> [tesla-cmc:15549] [ 9] ./lapexp(_ZN6thrust11device_freeENS_10device_ptrIvEE+0x2f) [0x424f78]
> >> [tesla-cmc:15549] [10] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust23device_malloc_allocatorIiE10deallocateENS_10device_ptrIiEEm+0x33) [0x7f0d36aeacff]
> >> [tesla-cmc:15549] [11] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEE10deallocateEv+0x6e) [0x7f0d36ae8e78]
> >> [tesla-cmc:15549] [12]
> >> /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail18contiguous_storageIiNS_23device_malloc_allocatorIiEEED1Ev+0x19) [0x7f0d36ae75f7]
> >> [tesla-cmc:15549] [13] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN6thrust6detail11vector_baseIiNS_23device_malloc_allocatorIiEEED1Ev+0x52) [0x7f0d36ae65f4]
> >> [tesla-cmc:15549] [14] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN4cusp7array1dIiN6thrust6detail21cuda_device_space_tagEED1Ev+0x18) [0x7f0d36ae5c2e]
> >> [tesla-cmc:15549] [15] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(_ZN19_p_PetscCUSPIndicesD1Ev+0x1d) [0x7f0d3751e45f]
> >> [tesla-cmc:15549] [16] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(PetscCUSPIndicesDestroy+0x20f) [0x7f0d3750c840]
> >> [tesla-cmc:15549] [17] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy_PtoP+0x1bc8) [0x7f0d375af8af]
> >> [tesla-cmc:15549] [18] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(VecScatterDestroy+0x586) [0x7f0d375e9ddf]
> >> [tesla-cmc:15549] [19] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy_MPIAIJ+0x49f) [0x7f0d37191d24]
> >> [tesla-cmc:15549] [20] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(MatDestroy+0x546) [0x7f0d370d54fe]
> >> [tesla-cmc:15549] [21] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESReset+0x5d1) [0x7f0d3746fac3]
> >> [tesla-cmc:15549] [22] /opt/openmpi_gcc-1.4.3/lib/libpetsc.so(SNESDestroy+0x4b8) [0x7f0d37470210]
> >> [tesla-cmc:15549] [23] ./lapexp(main+0x5ed) [0x420745]
> >>
> >> I've sent all the detailed output files for the execution
> >> configurations listed above, as well as configure.log and make.log, to
> >> petsc-maint at mcs.anl.gov, hoping that someone could recognize the problem.
> >> Now we have one node with multiple GPUs, but I'm also wondering whether someone
> >> has really tested the GPU functionality over several nodes with one GPU each?
> >>
> >> Regards,
> >> Alexander
> >>
> >> <petsc_mgpu.c><makefile.txt><configure.log>
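[The LD_PRELOAD trick Alexander describes above, intercepting cublasInit() so each MPI rank selects its own GPU before the CUBLAS context is created, might look roughly like the sketch below. This is a hypothetical reconstruction, not the petsc_mgpu.c attachment: it assumes the legacy CUBLAS v1 API from cublas.h, that MPI is already initialized when PETSc first calls cublasInit(), and that dlsym(RTLD_NEXT, ...) can reach the real symbol in libcublas.]

// Hypothetical petsc_mgpu-style shim (sketch only, not the actual attachment).
// Build as a shared library and load it with LD_PRELOAD so this definition
// of cublasInit() shadows the one in libcublas.
#include <dlfcn.h>          // dlsym, RTLD_NEXT (needs _GNU_SOURCE; g++ defines it)
#include <mpi.h>
#include <cuda_runtime.h>   // cudaGetDeviceCount, cudaSetDevice
#include <cublas.h>         // legacy CUBLAS v1 API: cublasStatus, cublasInit

extern "C" cublasStatus cublasInit(void)
{
  // Pin this process to one of the visible GPUs based on its MPI rank,
  // before CUBLAS creates its context on the default device.
  int initialized = 0;
  MPI_Initialized(&initialized);
  if (initialized) {
    int rank = 0, ndev = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);
  }

  // Chain to the real cublasInit() that PETSc intended to call.
  typedef cublasStatus (*cublasInitFn)(void);
  cublasInitFn real_cublasInit = (cublasInitFn) dlsym(RTLD_NEXT, "cublasInit");
  return real_cublasInit();
}

[Built, for example, with an MPI C++ wrapper as a shared object (-shared -fPIC, linking -ldl and -lcudart) and exported via LD_PRELOAD before mpirun, each rank would then hold its own CUDA context on a different device, which is the configuration in which the VecScatterDestroy/PetscCUSPIndicesDestroy abort above was observed.]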
