On Mon, Apr 9, 2012 at 5:46 PM, Pazzula, Dominic J
<dominic.j.pazz...@citi.com> wrote:
> This is a more general question.  I was attempting to run the code below on 
> my aforementioned OLD POS CUDA card.  I get the following error:
>
> Traceback (most recent call last):
>   File "histo.py", line 54, in <module>
>     mod_grid = compiler.SourceModule(grid_gpu)
>   File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 283, in __init__
>     arch, code, cache_dir, include_dirs)
>   File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 273, in compile
>     return compile_plain(source, options, keep, nvcc, cache_dir)
>   File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 137, in compile_plain
>     cmdline, stdout=stdout, stderr=stderr)
> pycuda.driver.CompileError: nvcc said it demoted types in source code it compiled--this is likely not what you want.
> [command: nvcc --cubin -arch sm_11 -I/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu]
> [stderr:
> ptxas /tmp/tmpxft_0000153b_00000000-2_kernel.ptx, line 95; warning : Double is not supported. Demoting to float
> ]
>
> Nowhere in the code is Double referenced.  I'm guessing something behind the 
> scenes is trying to declare a Double on my behalf.  Is there a way to turn 
> this off?
>
> Thanks
> Dominic

I think this is fixed in the most recent git-development version.
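
Until then, a possible workaround: in C/CUDA the literal 1.0 is a double
constant, so the kernel line

    temp_grid[id*interv+bin]+=1.0;

is almost certainly the "Double" that ptxas demotes on sm_11. Writing the
constant as a float literal should make the warning (and hence PyCUDA's
CompileError) go away -- I haven't tried this on sm_11 hardware, so treat
it as a guess rather than a confirmed fix:

    temp_grid[id*interv+bin] += 1.0f;  /* float literal: no double is generated */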

Thomas

> -----Original Message-----
> From: pycuda-boun...@tiker.net [mailto:pycuda-boun...@tiker.net] On Behalf Of 
> Francisco Villaescusa Navarro
> Sent: Friday, April 06, 2012 11:26 AM
> To: Thomas Wiecki
> Cc: pycuda@tiker.net
> Subject: Re: [PyCUDA] Histograms with PyCUDA
>
> Thanks for all the suggestions!
>
> Regarding removing sqrt: it seems that the code only gains about 1%,
> and you lose the ability to easily define linear intervals...
>
> I have tried with sqrt and sqrtf, but there is no difference in the
> total time (or it is very small).
>
> The code to find the histogram of an array with values between 0 and 1
> should look something like this:
>
> import numpy as np
> import time
> import pycuda.driver as cuda
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
> import pycuda.cumath as cumath
> from pycuda.compiler import SourceModule
> from pycuda import compiler
>
> grid_gpu_template = """
> __global__ void grid(float *values, int size, float *temp_grid)
> {
>     unsigned int id = threadIdx.x;
>     int i,bin;
>     const uint interv = %(interv)s;
>
>     for(i=id;i<size;i+=blockDim.x){
>         bin=(int)(values[i]*interv);
>         if (bin==interv){
>            bin=interv-1;
>         }
>         temp_grid[id*interv+bin]+=1.0;
>     }
> }
> """
>
> reduction_gpu_template = """
> __global__ void reduction(float *temp_grid, float *his)
> {
>     unsigned int id = blockIdx.x*blockDim.x+threadIdx.x;
>     const uint interv = %(interv)s;
>     const uint threads = %(max_number_of_threads)s;
>
>     if(id<interv){
>         for(int i=0;i<threads;i++){
>             his[id]+=temp_grid[id+interv*i];
>         }
>     }
> }
> """
>
> number_of_points=100000000
> max_number_of_threads=512
> interv=1024
>
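> # number of blocks for the reduction launch, rounded up so every bin is covered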
> blocks=interv/max_number_of_threads
> if interv%max_number_of_threads!=0:
>     blocks+=1
>
> values=np.random.random(number_of_points).astype(np.float32)
>
> grid_gpu = grid_gpu_template % {
>     'interv': interv,
> }
> mod_grid = compiler.SourceModule(grid_gpu)
> grid = mod_grid.get_function("grid")
>
> reduction_gpu = reduction_gpu_template % {
>     'interv': interv,
>     'max_number_of_threads': max_number_of_threads,
> }
> mod_redt = compiler.SourceModule(reduction_gpu)
> redt = mod_redt.get_function("reduction")
>
> values_gpu=gpuarray.to_gpu(values)
> temp_grid_gpu=gpuarray.zeros((max_number_of_threads,interv),dtype=np.float32)
> hist=np.zeros(interv,dtype=np.float32)
> hist_gpu=gpuarray.to_gpu(hist)
>
> start=time.clock()*1e3
> grid(values_gpu,np.int32(number_of_points),temp_grid_gpu,
>      grid=(1,1),block=(max_number_of_threads,1,1))
> redt(temp_grid_gpu,hist_gpu,
>      grid=(blocks,1),block=(max_number_of_threads,1,1))
> hist=hist_gpu.get()
> print 'Time used to grid with GPU:',time.clock()*1e3-start,' ms'
>
>
> start=time.clock()*1e3
> bins_histo=np.linspace(0.0,1.0,interv+1)
> hist_CPU=np.histogram(values,bins=bins_histo)[0]
> print 'Time used to grid with CPU:',time.clock()*1e3-start,' ms'
>
> print 'max difference between methods=',np.max(hist_CPU-hist)
>
>
> ################
>
> Results:
>
> Time used to grid with GPU: 680.0  ms
> Time used to grid with CPU: 9320.0  ms
> max difference between methods= 0.0
>
> So it seems that with this algorithm we can't achieve speed-up factors
> larger than ~15.
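> 
> One way past that ceiling on hardware with compute capability >= 1.2
> might be shared-memory atomics (this won't help on Dominic's sm_11
> card, where shared atomicAdd is unavailable). The kernel below is an
> untested sketch, not something I have benchmarked: each block builds a
> private histogram of unsigned counts in shared memory and merges it
> into a global one at the end, so hist would become a gpuarray of
> np.uint32 rather than float32.
> 
> histo_atomic_template = """
> __global__ void histo_atomic(float *values, int size, unsigned int *hist)
> {
>     const int interv = %(interv)s;
>     __shared__ unsigned int s_hist[%(interv)s];  // one private histogram per block
> 
>     // zero the shared histogram
>     for(int b=threadIdx.x; b<interv; b+=blockDim.x)
>         s_hist[b] = 0;
>     __syncthreads();
> 
>     // grid-stride loop over the input values
>     int stride = blockDim.x*gridDim.x;
>     for(int i=blockIdx.x*blockDim.x+threadIdx.x; i<size; i+=stride){
>         int bin = (int)(values[i]*interv);
>         if(bin==interv) bin = interv-1;
>         atomicAdd(&s_hist[bin], 1u);             // shared atomics: cc >= 1.2
>     }
>     __syncthreads();
> 
>     // merge this block's histogram into the global one
>     for(int b=threadIdx.x; b<interv; b+=blockDim.x)
>         atomicAdd(&hist[b], s_hist[b]);          // global uint atomics: cc >= 1.1
> }
> """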
>
> Fran.
>
>
>

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
