Hi guys,

I'm getting a segmentation fault on all the CUDA samples except CUDATopology.
Are you observing the same problem?
If not, which compilation options are you using? Any help is much
appreciated.


I added this method to CUDATopology:


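     // Returns CUDAUtilities.autoBlocks() * CUDAUtilities.autoThreads() for a
     // CUDA place, or 1 for any other kind of place.
     public static def cells(p:Place) : Int = {
         if (p.isCUDA()) {
             // One-element Int array allocated at the CUDA place, initialised to 0.
             val remote = CUDAUtilities.makeRemoteArray[Int](p, 1, 0);
             finish async at (p) @CUDA @CUDADirectParams {
                 val blocks  = CUDAUtilities.autoBlocks();
                 val threads = CUDAUtilities.autoThreads();
                 // A single block with a single thread is enough to store the result.
                 finish for (block in 0..0) async {
                     clocked finish for (thread in 0..0) clocked async {
                         remote(0) = blocks * threads;
                     }
                 }
             }
             // Copy the result from the CUDA place back to the host.
             val local = new Array[Int](1);
             finish Array.asyncCopy(remote, 0, local, 0, 1);
             return local(0);
         } else if (p.isSPE()) {
             return 1; // TODO: should return something else?
         } else {
             return 1; // TODO: should return the number of cores?
         }
     }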


This is how I compile it, using X10 v2.1.2:


$ echo x10c++ ${X10C_OPTS} -report postcompile=5 CUDATopology.x10 -o CUDATopology
x10c++ -NO_CHECKS -STATIC_CALLS -report postcompile=5 CUDATopology.x10 -o CUDATopology
$ x10c++ ${X10C_OPTS} -report postcompile=5 CUDATopology.x10 -o CUDATopology
  Output files: [CUDATopology.h, CUDATopology.cu, CUDATopology.cc]
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_10 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_10.cubin CUDATopology.cu
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_11 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_11.cubin CUDATopology.cu
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_12 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_12.cubin CUDATopology.cu
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_13 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_13.cubin CUDATopology.cu
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_20 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_20.cubin CUDATopology.cu
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_21 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_21.cubin CUDATopology.cu
Executing post-compiler nvcc --cubin -Xptxas -v -arch=sm_30 -Inull/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -o CUDATopology_sm_30.cubin CUDATopology.cu
Executing post-compiler g++ -I/opt/JavaIDE/x10-2.1.2-linux_x86/include -I/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/include -I/home/rgomes/tmp -I. -Wno-long-long -Wno-unused-parameter -DNO_CHECKS -DX10_USE_BDWGC -pthread -o /home/rgomes/tmp/CUDATopology CUDATopology.cc xxx_main_xxx.cc -L/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/lib -lx10 -lgc -lm -lpthread -lrt -ldl -L/opt/JavaIDE/x10-2.1.2-linux_x86/lib -lx10rt_sockets -Wl,--rpath -Wl,/opt/JavaIDE/x10-2.1.2-linux_x86/stdlib/lib -Wl,--rpath -Wl,/opt/JavaIDE/x10-2.1.2-linux_x86/lib -Wl,-export-dynamic
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_10'
      ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_11'
      ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_12'
      ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_13'
      ptxas info : Used 3 registers, 24+16 bytes smem, 65536 bytes cmem[0]
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_20'
      ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2]
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_21'
      ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2]
x10c++: ptxas info : Compiling entry function 'CUDATopology__closure__1' for 'sm_30'
      ptxas info : Used 4 registers, 56 bytes cmem[0], 65536 bytes cmem[2]


Thanks a lot :)

-- 
Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib

_______________________________________________
X10-users mailing list
X10-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/x10-users
