Hi David, The 'segmentation fault' I reported previously was actually my fault. I reinstalled the Nvidia drivers and it is now resolved.
I discovered that something was wrong when I tried to run the Nvidia samples and they reported problems with GLX. Thanks a lot, and I'm sorry for wasting your time. Richard Gomes http://www.jquantlib.org/index.php/User:RichardGomes twitter: frgomes JQuantLib is a library for Quantitative Finance written in Java. http://www.jquantlib.com/ twitter: jquantlib On 01/03/11 20:13, Richard Gomes wrote: > Hi Dave, > > Thanks a lot for your help. > As I'm doing some very simple tests, I will try some old revision... for > the time being whilst you enjoy some days off :) > > Cheers :) > > Richard Gomes > http://www.jquantlib.org/index.php/User:RichardGomes > twitter: frgomes > > JQuantLib is a library for Quantitative Finance written in Java. > http://www.jquantlib.com/ > twitter: jquantlib > > > On 28/02/11 07:52, Dave Cunningham wrote: >> I was able to reproduce this and got this trace from valgrind but I'm on >> vacation for a week so cannot debug further >> >> ==23823== Invalid read of size 1 >> ==23823== at 0x4027411: memcpy (mc_replace_strmem.c:497) >> ==23823== by 0x80CD8F3: x10aux::deserialization_buffer::Read<unsigned >> int>::_(x10aux::deserialization_buffer&) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x80CDE74: unsigned int >> x10aux::deserialization_buffer::read<unsigned int>() (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x80CDE91: unsigned int >> x10aux::deserialization_buffer::peek<unsigned int>() (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x8155E16: >> x10aux::deserialization_buffer::Read<x10aux::ref<x10::io::SerialData> >>> ::_(x10aux::deserialization_buffer&) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x815636A: x10aux::ref<x10::io::SerialData> >> x10aux::deserialization_buffer::read<x10aux::ref<x10::io::SerialData> >() >> (in /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x440981F: >>
x10::io::SerialData::_deserialize_body(x10aux::deserialization_buffer&) (in >> /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x440B031: x10aux::ref<x10::lang::Reference> >> x10::io::SerialData::_deserializer<x10::lang::Reference>(x10aux::deserialization_buffer&) >> (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x4A81A27: >> x10aux::DeserializationDispatcher::create_(x10aux::deserialization_buffer&, >> short) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x4A7EDAF: >> x10aux::DeserializationDispatcher::create(x10aux::deserialization_buffer&, >> short) (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x4A850B6: >> x10aux::deserialization_buffer::deserialize_reference(x10aux::deserialization_buffer&) >> (in /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x815632E: >> x10aux::deserialization_buffer::Read<x10aux::ref<x10::io::SerialData> >>> ::_(x10aux::deserialization_buffer&) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== Address 0x615777d is 1 bytes after a block of size 28 alloc'd >> ==23823== at 0x4025BD3: malloc (vg_replace_malloc.c:236) >> ==23823== by 0x4C8FFAC: unsigned char* safe_malloc<unsigned >> char>(unsigned int, unsigned int) (in >> /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so) >> ==23823== by 0x4C92134: x10rt_cuda_send_put (in >> /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so) >> ==23823== by 0x4C8D0B2: x10rt_lgl_send_put (in >> /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so) >> ==23823== by 0x4C8BA15: x10rt_send_put (in >> /home/spark/x10-dbg/x10.dist/lib/libx10rt_sockets.so) >> ==23823== by 0x4A7B65E: x10aux::send_put(int, short, >> x10aux::serialization_buffer&, void*, unsigned int) (in >> /home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x4AB7212: x10::util::IMC_copyToBody(void*, void*, int, >> x10::lang::Place, bool, x10aux::ref<x10::lang::Reference>) (in >> 
/home/spark/x10-dbg/x10.dist/stdlib/lib/libx10.so) >> ==23823== by 0x815D8E3: void >> x10::util::IndexedMemoryChunk<void>::asyncCopy<float>(x10::util::IndexedMemoryChunk<float>, >> int, x10::util::RemoteIndexedMemoryChunk<float>, int, int) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x815D97A: void >> x10::util::CUDAUtilities::initCUDAArray<float>(x10::util::IndexedMemoryChunk<float>, >> x10::util::RemoteIndexedMemoryChunk<float>, int) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x815EEB0: x10aux::ref<x10::array::RemoteArray<float> > >> x10::util::CUDAUtilities::makeCUDAArray<float>(x10::lang::Place, int, >> x10::util::IndexedMemoryChunk<float>) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x815F442: x10aux::ref<x10::array::RemoteArray<float> > >> x10::util::CUDAUtilities::makeRemoteArray<float>(x10::lang::Place, int, >> x10aux::ref<x10::array::Array<float> >) (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> ==23823== by 0x8161877: KMeansCUDA__closure__2::__apply() (in >> /home/spark/work/17/KMeansCUDA.sockets.dbg) >> >> >> >> On Sun, Feb 27, 2011 at 2:45 AM, Richard Gomes<rgomes1...@yahoo.co.uk>wrote: >> >>> Hi guys, >>> >>> I'm getting segmentation fault on all CUDA samples, except CUDATopology. >>> Are you observing the same kind of problem? >>> If not, which compilation options are you using? Any help is much >>> appreciated. 
>>> >>> >>> I added this method to CUDATopology: >>> >>> >>> public static def cells(p:Place) : Int = { >>> if (p.isCUDA()) { >>> val remote = CUDAUtilities.makeRemoteArray[Int](p, 1, 0); >>> finish async at (p) @CUDA @CUDADirectParams { >>> val blocks = CUDAUtilities.autoBlocks(); >>> val threads = CUDAUtilities.autoThreads(); >>> finish for (block in 0..0) async { >>> clocked finish for (thread in 0..0) clocked async { >>> remote(0) = blocks * threads; >>> } >>> } >>> } >>> val local = new Array[Int](1); >>> finish Array.asyncCopy(remote, 0, local, 0, 1); >>> return local(0); >>> } else if (p.isSPE()) { >>> return 1; // TODO: should return something else? >>> } else { >>> return 1; // TODO: should return the number of cores? >>> } >>> } >>> >>> >>> This is the compilation, using v2.1.2: >>> >>> >>> $ echo x10c++ ${X10C_OPTS} -report postcompile=5 CUDATopology.x10 -o >>> CUDATopology >>> x10c++ -NO_CHECKS -STATIC_CALLS -report postcompile=5 CUDATopology.x10 >>> -o CUDATopology >>> $ >>> $ >>> $ x10c++ ${X10C_OPTS} -report postcompile=5 CUDATopology.x10 -o >>> CUDATopology >>> Output files: [CUDATopology.h, CUDATopology.cu, CUDATopology.cc] >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_10 >>> -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_10.cubin CUDATopology.cu >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_11 >>> -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_11.cubin CUDATopology.cu >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_12 >>> -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_12.cubin CUDATopology.cu >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_13 >>> -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_13.cubin CUDATopology.cu >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_20 >>> -Inull/include 
-I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_20.cubin CUDATopology.cu >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_21 >>> -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_21.cubin CUDATopology.cu >>> Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_30 >>> -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o >>> CUDATopology_sm_30.cubin CUDATopology.cu >>> Executing post-compiler g++ -I/opt/JavaIDE/x10-2.1.2-linux_x86/include >>> -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -I/home/rgomes/tmp -I. >>> -Wno-long-long -Wno-unused-parameter -DNO_CHECKS -DX10_USE_BDWGC >>> -pthread -o /home/rgomes/tmp/CUDATopology CUDATopology.cc >>> xxx_main_xxx.cc -L/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/lib -lx10 -lgc >>> -lm -lpthread -lrt -ldl -L/opt/JavaIDE/x10-2.1.2-linux_x86/lib >>> -lx10rt_sockets -Wl,--rpath >>> -Wl,/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/lib -Wl,--rpath >>> -Wl,/opt/JavaIDE/x10-2.1.2-linux_x86/lib -Wl,-export-dynamic >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_10' >>> ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0] >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_11' >>> ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0] >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_12' >>> ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0] >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_13' >>> ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0] >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_20' >>> ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2] >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_21' >>> ptxas info : Used 4 
registers, 56 bytes cmem[0], 65536 bytes cmem[2] >>> x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' >>> for 'sm_30' >>> ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2] >>> >>> >>> Thanks a lot :) >>> >>> -- >>> Richard Gomes >>> http://www.jquantlib.org/index.php/User:RichardGomes >>> twitter: frgomes >>> >>> JQuantLib is a library for Quantitative Finance written in Java. >>> http://www.jquantlib.com/ >>> twitter: jquantlib >>> >>> >>> ------------------------------------------------------------------------------ >>> Free Software Download: Index, Search& Analyze Logs and other IT data in >>> Real-Time with Splunk. Collect, index and harness all the fast moving IT >>> data >>> generated by your applications, servers and devices whether physical, >>> virtual >>> or in the cloud. Deliver compliance at lower cost and gain new business >>> insights. http://p.sf.net/sfu/splunk-dev2dev >>> _______________________________________________ >>> X10-users mailing list >>> X10-users@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/x10-users