Re: [X10-users] CUDA:: Segmentation fault on samples

Dave Cunningham Mon, 07 Mar 2011 03:36:22 -0800

Curious, as I was able to reproduce it :)

I can run GLX apps just fine though, so it may be an unrelated bug.



On Sat, Mar 5, 2011 at 11:10 PM, Richard Gomes <rgomes1...@yahoo.co.uk> wrote:

> Hi David,
>
> The 'segmentation fault' I reported previously was actually my fault.
> I reinstalled the Nvidia drivers and it is now resolved.
>
> I discovered that something was wrong when I tried to run the Nvidia
> samples and they reported problems with GLX.
>
> Thanks a lot and I'm sorry for wasting your time.
>
> Richard Gomes
> http://www.jquantlib.org/index.php/User:RichardGomes
> twitter: frgomes
>
> JQuantLib is a library for Quantitative Finance written in Java.
> http://www.jquantlib.com/
> twitter: jquantlib
>
>
> On 01/03/11 20:13, Richard Gomes wrote:
> > Hi Dave,
> >
> > Thanks a lot for your help.
> > As I'm only doing some very simple tests, I will try an older revision for
> > the time being whilst you enjoy some days off :)
> >
> > Cheers :)
> >
> > Richard Gomes
> > http://www.jquantlib.org/index.php/User:RichardGomes
> > twitter: frgomes
> >
> > JQuantLib is a library for Quantitative Finance written in Java.
> > http://www.jquantlib.com/
> > twitter: jquantlib
> >
> >
> > On 28/02/11 07:52, Dave Cunningham wrote:
> >> I was able to reproduce this and got this trace from valgrind, but I'm on
> >> vacation for a week so I cannot debug further:
> >>
> >> ==23823== Invalid read of size 1
> >> ==23823==    at 0x4027411: memcpy (mc_replace_strmem.c:497)
> >> ==23823==    by 0x80CD8F3: x10aux::deserialization_buffer::Read<unsigned int>::_(x10aux::deserialization_buffer&) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x80CDE74: unsigned int x10aux::deserialization_buffer::read<unsigned int>() (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x80CDE91: unsigned int x10aux::deserialization_buffer::peek<unsigned int>() (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x8155E16: x10aux::deserialization_buffer::Read<x10aux::ref<x10::io::SerialData> >::_(x10aux::deserialization_buffer&) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x815636A: x10aux::ref<x10::io::SerialData> x10aux::deserialization_buffer::read<x10aux::ref<x10::io::SerialData> >() (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x440981F: x10::io::SerialData::_deserialize_body(x10aux::deserialization_buffer&) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x440B031: x10aux::ref<x10::lang::Reference> x10::io::SerialData::_deserializer<x10::lang::Reference>(x10aux::deserialization_buffer&) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x4A81A27: x10aux::DeserializationDispatcher::create_(x10aux::deserialization_buffer&, short) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x4A7EDAF: x10aux::DeserializationDispatcher::create(x10aux::deserialization_buffer&, short) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x4A850B6: x10aux::deserialization_buffer::deserialize_reference(x10aux::deserialization_buffer&) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x815632E: x10aux::deserialization_buffer::Read<x10aux::ref<x10::io::SerialData> >::_(x10aux::deserialization_buffer&) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==  Address 0x615777d is 1 bytes after a block of size 28 alloc'd
> >> ==23823==    at 0x4025BD3: malloc (vg_replace_malloc.c:236)
> >> ==23823==    by 0x4C8FFAC: unsigned char* safe_malloc<unsigned char>(unsigned int, unsigned int) (in /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so)
> >> ==23823==    by 0x4C92134: x10rt_cuda_send_put (in /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so)
> >> ==23823==    by 0x4C8D0B2: x10rt_lgl_send_put (in /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so)
> >> ==23823==    by 0x4C8BA15: x10rt_send_put (in /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so)
> >> ==23823==    by 0x4A7B65E: x10aux::send_put(int, short, x10aux::serialization_buffer&, void*, unsigned int) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x4AB7212: x10::util::IMC_copyToBody(void*, void*, int, x10::lang::Place, bool, x10aux::ref<x10::lang::Reference>) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so)
> >> ==23823==    by 0x815D8E3: void x10::util::IndexedMemoryChunk<void>::asyncCopy<float>(x10::util::IndexedMemoryChunk<float>, int, x10::util::RemoteIndexedMemoryChunk<float>, int, int) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x815D97A: void x10::util::CUDAUtilities::initCUDAArray<float>(x10::util::IndexedMemoryChunk<float>, x10::util::RemoteIndexedMemoryChunk<float>, int) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x815EEB0: x10aux::ref<x10::array::RemoteArray<float> > x10::util::CUDAUtilities::makeCUDAArray<float>(x10::lang::Place, int, x10::util::IndexedMemoryChunk<float>) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x815F442: x10aux::ref<x10::array::RemoteArray<float> > x10::util::CUDAUtilities::makeRemoteArray<float>(x10::lang::Place, int, x10aux::ref<x10::array::Array<float> >) (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >> ==23823==    by 0x8161877: KMeansCUDA__closure__2::__apply() (in /home/spark/work/17/KMeansCUDA.sockets.dbg)
> >>
> >>
> >>
> >> On Sun, Feb 27, 2011 at 2:45 AM, Richard Gomes <rgomes1...@yahoo.co.uk> wrote:
> >>
> >>> Hi guys,
> >>>
> >>> I'm getting a segmentation fault on all CUDA samples except CUDATopology.
> >>> Are you observing the same kind of problem?
> >>> If not, which compilation options are you using? Any help is much
> >>> appreciated.
> >>>
> >>>
> >>> I added this method to CUDATopology:
> >>>
> >>>
> >>>       public static def cells(p:Place) : Int = {
> >>>           if (p.isCUDA()) {
> >>>               val remote = CUDAUtilities.makeRemoteArray[Int](p, 1, 0);
> >>>               finish async at (p) @CUDA @CUDADirectParams {
> >>>                   val blocks  = CUDAUtilities.autoBlocks();
> >>>                   val threads = CUDAUtilities.autoThreads();
> >>>                   finish for (block in 0..0) async {
> >>>                       clocked finish for (thread in 0..0) clocked async {
> >>>                           remote(0) = blocks * threads;
> >>>                       }
> >>>                   }
> >>>               }
> >>>               val local = new Array[Int](1);
> >>>               finish Array.asyncCopy(remote, 0, local, 0, 1);
> >>>               return local(0);
> >>>           } else if (p.isSPE()) {
> >>>               return 1; // TODO: should return something else?
> >>>           } else {
> >>>               return 1; // TODO: should return the number of cores?
> >>>           }
> >>>       }
> >>>
> >>>
> >>> This is the compilation, using v2.1.2:
> >>>
> >>>
> >>> $ echo x10c++ ${X10C_OPTS} -report postcompile=5 CUDATopology.x10 -o CUDATopology
> >>> x10c++ -NO_CHECKS -STATIC_CALLS -report postcompile=5 CUDATopology.x10 -o CUDATopology
> >>> $
> >>> $
> >>> $ x10c++ ${X10C_OPTS} -report postcompile=5 CUDATopology.x10 -o CUDATopology
> >>>    Output files: [CUDATopology.h, CUDATopology.cu, CUDATopology.cc]
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_10 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_10.cubin CUDATopology.cu
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_11 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_11.cubin CUDATopology.cu
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_12 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_12.cubin CUDATopology.cu
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_13 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_13.cubin CUDATopology.cu
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_20 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_20.cubin CUDATopology.cu
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_21 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_21.cubin CUDATopology.cu
> >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_30 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_30.cubin CUDATopology.cu
> >>> Executing post-compiler g++ -I/opt/JavaIDE/x10-2.1.2-linux_x86/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -I/home/rgomes/tmp -I. -Wno-long-long -Wno-unused-parameter -DNO_CHECKS -DX10_USE_BDWGC -pthread -o /home/rgomes/tmp/CUDATopology CUDATopology.cc xxx_main_xxx.cc -L/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/lib -lx10 -lgc -lm -lpthread -lrt -ldl -L/opt/JavaIDE/x10-2.1.2-linux_x86/lib -lx10rt_sockets -Wl,--rpath -Wl,/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/lib -Wl,--rpath -Wl,/opt/JavaIDE/x10-2.1.2-linux_x86/lib -Wl,-export-dynamic
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_10'
> >>>        ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_11'
> >>>        ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_12'
> >>>        ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_13'
> >>>        ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_20'
> >>>        ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2]
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_21'
> >>>        ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2]
> >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_30'
> >>>        ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2]
> >>>
> >>>
> >>> Thanks a lot :)
> >>>
> >>> --
> >>> Richard Gomes
> >>> http://www.jquantlib.org/index.php/User:RichardGomes
> >>> twitter: frgomes
> >>>
> >>> JQuantLib is a library for Quantitative Finance written in Java.
> >>> http://www.jquantlib.com/
> >>> twitter: jquantlib
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> X10-users mailing list
> >>> X10-users@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/x10-users
> >>>
> >>
> >>
> >
> >
>
>
_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users
