It finally finished running through cuda-gdb; here's the backtrace. The value new_size=46912574500784 in the call to thrust::detail::vector_base<double, thrust::device_malloc_allocator<double> >::resize looks suspicious. (A minimal sketch of this failing allocation path is appended after the quoted thread below.)
#0  0x0000003e1c832885 in raise () from /lib64/libc.so.6
#1  0x0000003e1c834065 in abort () from /lib64/libc.so.6
#2  0x0000003e284bea7d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3  0x0000003e284bcc06 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003e284bcc33 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003e284bcd2e in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x00002aaaab45ad71 in thrust::detail::backend::cuda::malloc<0u> (n=375300596006272) at malloc.inl:50
#7  0x00002aaaab454322 in thrust::detail::backend::dispatch::malloc<0u> (n=375300596006272) at malloc.h:56
#8  0x00002aaaab453555 in thrust::device_malloc (n=375300596006272) at device_malloc.inl:32
#9  0x00002aaaab46477d in thrust::device_malloc<double> (n=46912574500784) at device_malloc.inl:38
#10 0x00002aaaab461fce in thrust::device_malloc_allocator<double>::allocate (this=0x7fffffff9880, cnt=46912574500784) at device_malloc_allocator.h:101
#11 0x00002aaaab45ee91 in thrust::detail::contiguous_storage<double, thrust::device_malloc_allocator<double> >::allocate (this=0x7fffffff9880, n=46912574500784) at contiguous_storage.inl:134
#12 0x00002aaaab46ebba in thrust::detail::contiguous_storage<double, thrust::device_malloc_allocator<double> >::contiguous_storage (this=0x7fffffff9880, n=46912574500784) at contiguous_storage.inl:46
#13 0x00002aaaab46cd1e in thrust::detail::vector_base<double, thrust::device_malloc_allocator<double> >::fill_insert (this=0x13623990, position=..., n=46912574500784, x=@0x7fffffff9f18) at vector_base.inl:792
#14 0x00002aaaab46b058 in thrust::detail::vector_base<double, thrust::device_malloc_allocator<double> >::insert (this=0x13623990, position=..., n=46912574500784, x=@0x7fffffff9f18) at vector_base.inl:561
#15 0x00002aaaab4692a3 in thrust::detail::vector_base<double, thrust::device_malloc_allocator<double> >::resize (this=0x13623990, new_size=46912574500784, x=@0x7fffffff9f18) at vector_base.inl:222
#16 0x00002aaaac2c3d9b in cusp::precond::smoothed_aggregation<int, double, thrust::detail::cuda_device_space_tag>::smoothed_aggregation<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag> > (this=0x136182b0, A=..., theta=0) at smoothed_aggregation.inl:210
#17 0x00002aaaac27cf84 in PCSetUp_SACUSP (pc=0x1360f330) at sacusp.cu:76
#18 0x00002aaaac1f0024 in PCSetUp (pc=0x1360f330) at precon.c:832
#19 0x00002aaaabd02144 in KSPSetUp (ksp=0x135d2a00) at itfunc.c:261
#20 0x00002aaaabd0396e in KSPSolve (ksp=0x135d2a00, b=0x135a0fa0, x=0x135a2b50) at itfunc.c:385
#21 0x0000000000403619 in main (argc=17, args=0x7fffffffc538) at ex2.c:217

On Mon, Feb 27, 2012 at 4:48 PM, John Fettig <john.fettig at gmail.com> wrote:

> Hi Paul,
>
> This is very interesting. I tried building the code with
> --download-txpetscgpu and it doesn't work for me. It runs out of memory,
> no matter how small the problem (this is ex2 from
> src/ksp/ksp/examples/tutorials):
>
> mpirun -np 1 ./ex2 -n 10 -m 10 -ksp_type cg -pc_type sacusp -mat_type
> aijcusp -vec_type cusp -cusp_storage_format csr -use_cusparse 0
>
> terminate called after throwing an instance of
> 'thrust::system::detail::bad_alloc'
>   what():  std::bad_alloc: out of memory
> MPI Application rank 0 killed before MPI_Finalize() with signal 6
>
> This example works fine when I build without your gpu additions (and for
> much larger problems too). Am I doing something wrong?
>
> For reference, I'm using CUDA 4.1, CUSP 0.3, and Thrust 1.5.1
>
> John
>
>
> On Fri, Feb 10, 2012 at 5:04 PM, Paul Mullowney <paulm at txcorp.com> wrote:
>
>> Hi All,
>>
>> I've been developing GPU capabilities for PETSc. The development has
>> focused mostly on
>> (1) An efficient multi-GPU SpMV, i.e. MatMult. This is working well.
>> (2) Triangular Solve used in ILU preconditioners; i.e. MatSolve. The
>> performance of this ... is what it is :|
>> This code is in beta mode. Keep that in mind, if you decide to use it. It
>> supports single and double precision, real numbers only! Complex will be
>> supported at some point in the future, but not any time soon.
>>
>> To build with these capabilities, add the following to your configure
>> line.
>> --download-txpetscgpu=yes
>>
>> The capabilities of the SpMV code are accessed with the following 2
>> command line flags
>> -cusp_storage_format csr (other options are coo (coordinate), ell
>> (ellpack), dia (diagonal). hyb (hybrid) is not yet supported)
>> -use_cusparse (this is a boolean and at the moment is only supported with
>> csr format matrices. In the future, cusparse will work with ell, coo, and
>> hyb formats).
>>
>> Regarding the number of GPUs to run on:
>> Imagine a system with P nodes, N cores per node, and M GPUs per node.
>> Then, to use only the GPUs, I would run with M ranks per node over P nodes.
>> As an example, I have a system with 2 nodes. Each node has 8 cores, and 4
>> GPUs attached to each node (P=2, N=8, M=4). In a PBS queue script, one
>> would use 2 nodes at 4 processors per node. Each mpi rank (CPU processor)
>> will be attached to a GPU.
>>
>> You do not need to explicitly manage the GPUs, apart from understanding
>> what type of system you are running on. To learn how many devices are
>> available per node, use the command line flag:
>> -cuda_show_devices
>>
>> -Paul
>>
>
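
For anyone tracing the Thrust side of this: the failing path is resize -> fill_insert -> contiguous_storage::allocate -> device_malloc, and a request for 46912574500784 doubles (about 375 TB, matching n=375300596006272 bytes in frame #6) can only end in the bad_alloc shown above. Below is a minimal standalone sketch of that path; it is not PETSc or CUSP code, it assumes only a CUDA toolkit with Thrust installed, and bogus_size is just a stand-in for whatever garbage value reaches resize in frame #15:

    // Hypothetical sketch of the allocation path seen in the backtrace:
    // device_vector::resize() -> fill_insert() -> contiguous_storage::allocate()
    // -> device_malloc() -> cudaMalloc(). A size this large cannot be satisfied,
    // so Thrust throws bad_alloc, which is what terminated ex2.
    #include <thrust/device_vector.h>
    #include <new>      // std::bad_alloc
    #include <cstdio>

    int main()
    {
        thrust::device_vector<double> v;

        // Stand-in for the suspicious new_size from frame #15. A value like this
        // usually points to a size that was never set correctly upstream rather
        // than a genuinely huge problem.
        const size_t bogus_size = 46912574500784ULL;

        try {
            v.resize(bogus_size, 0.0);   // same call as frame #15
        } catch (const std::bad_alloc &e) {
            // Thrust's out-of-memory exception derives from std::bad_alloc, so
            // this catches the "std::bad_alloc: out of memory" reported earlier.
            std::printf("allocation failed: %s\n", e.what());
        }
        return 0;
    }

Compiled with nvcc, this prints the failure instead of aborting; the point is only that the number reaching resize, not the allocator itself, is the thing to chase.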