This is a more general question. I was attempting to run the code below on my
aforementioned OLD POS CUDA card, and I get the following error:
Traceback (most recent call last):
  File "histo.py", line 54, in <module>
    mod_grid = compiler.SourceModule(grid_gpu)
  File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 283, in __init__
    arch, code, cache_dir, include_dirs)
  File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 273, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 137, in compile_plain
    cmdline, stdout=stdout, stderr=stderr)
pycuda.driver.CompileError: nvcc said it demoted types in source code it compiled--this is likely not what you want.
[command: nvcc --cubin -arch sm_11 -I/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu]
[stderr:
ptxas /tmp/tmpxft_0000153b_00000000-2_kernel.ptx, line 95; warning : Double is not supported. Demoting to float
]
Nowhere in the code is Double referenced. I'm guessing something behind the
scenes is trying to declare a Double on my behalf. Is there a way to turn this
off?
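(My best guess so far, in case it helps: CUDA C treats an unsuffixed floating-point constant as a double, so the `+= 1.0` in the grid kernel below may be what ptxas is demoting. A hypothetical single-precision variant of that kernel, with the constant suffixed, would look like this; the hard-coded interv just stands in for the %(interv)s template value:)

```cuda
__global__ void grid(float *values, int size, float *temp_grid)
{
    unsigned int id = threadIdx.x;
    const unsigned int interv = 1024;   /* stand-in for %(interv)s */
    for (int i = id; i < size; i += blockDim.x) {
        int bin = (int)(values[i]*interv);
        if (bin == (int)interv)
            bin = (int)interv - 1;
        /* 1.0f keeps the accumulation in single precision; an unsuffixed
           1.0 is a double constant, which sm_1x hardware cannot represent */
        temp_grid[id*interv + bin] += 1.0f;
    }
}
```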
Thanks
Dominic
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Francisco Villaescusa Navarro
Sent: Friday, April 06, 2012 11:26 AM
To: Thomas Wiecki
Cc: [email protected]
Subject: Re: [PyCUDA] Histograms with PyCUDA
Thanks for all the suggestions!
Regarding removing sqrt: it seems that the code only gains ~1%, and you
lose the capacity to easily define linear intervals...
I have tried with sqrt and sqrtf, but there is no difference in the
total time (or it is very small).
The code to find the histogram of an array with values between 0 and 1
should be something like:
import numpy as np
import time
import pycuda.driver as cuda
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
from pycuda.compiler import SourceModule
from pycuda import compiler

grid_gpu_template = """
__global__ void grid(float *values, int size, float *temp_grid)
{
    unsigned int id = threadIdx.x;
    int i, bin;
    const uint interv = %(interv)s;
    for (i=id; i<size; i+=blockDim.x) {
        bin = (int)(values[i]*interv);
        if (bin==interv) {
            bin = interv-1;
        }
        temp_grid[id*interv+bin] += 1.0;
    }
}
"""

reduction_gpu_template = """
__global__ void reduction(float *temp_grid, float *his)
{
    unsigned int id = blockIdx.x*blockDim.x + threadIdx.x;
    const uint interv = %(interv)s;
    const uint threads = %(max_number_of_threads)s;
    if (id<interv) {
        for (int i=0; i<threads; i++) {
            his[id] += temp_grid[id+interv*i];
        }
    }
}
"""

number_of_points = 100000000
max_number_of_threads = 512
interv = 1024
blocks = interv/max_number_of_threads
if interv%max_number_of_threads != 0:
    blocks += 1

values = np.random.random(number_of_points).astype(np.float32)

grid_gpu = grid_gpu_template % {
    'interv': interv,
}
mod_grid = compiler.SourceModule(grid_gpu)
grid = mod_grid.get_function("grid")

reduction_gpu = reduction_gpu_template % {
    'interv': interv,
    'max_number_of_threads': max_number_of_threads,
}
mod_redt = compiler.SourceModule(reduction_gpu)
redt = mod_redt.get_function("reduction")

values_gpu = gpuarray.to_gpu(values)
temp_grid_gpu = gpuarray.zeros((max_number_of_threads,interv), dtype=np.float32)
hist = np.zeros(interv, dtype=np.float32)
hist_gpu = gpuarray.to_gpu(hist)

start = time.clock()*1e3
grid(values_gpu, np.int32(number_of_points), temp_grid_gpu,
     grid=(1,1), block=(max_number_of_threads,1,1))
redt(temp_grid_gpu, hist_gpu,
     grid=(blocks,1), block=(max_number_of_threads,1,1))
hist = hist_gpu.get()
print 'Time used to grid with GPU:', time.clock()*1e3-start, ' ms'

start = time.clock()*1e3
bins_histo = np.linspace(0.0, 1.0, interv+1)
hist_CPU = np.histogram(values, bins=bins_histo)[0]
print 'Time used to grid with CPU:', time.clock()*1e3-start, ' ms'

print 'max difference between methods=', np.max(hist_CPU-hist)
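For anyone without a CUDA device, the same two-stage scheme (per-thread
partial histograms, then a bin-wise reduction) can be mimicked in pure
NumPy to check the logic. This is an illustrative sketch only: the names
mirror the GPU version, but the sizes are scaled down and nothing here is
part of the timed comparison.

```python
import numpy as np

number_of_points = 1000
threads = 8    # stands in for max_number_of_threads
interv = 16    # number of bins, scaled down from 1024

values = np.random.random(number_of_points).astype(np.float32)

# Stage 1: each "thread" grids a strided slice of the data into its own
# private row of temp_grid, as the grid kernel does.
temp_grid = np.zeros((threads, interv), dtype=np.float32)
for tid in range(threads):
    chunk = values[tid::threads]
    bins = (chunk * interv).astype(np.int32)
    bins[bins == interv] = interv - 1   # clamp values equal to 1.0
    np.add.at(temp_grid[tid], bins, 1.0)

# Stage 2: the reduction kernel's job, summing the partial histograms
# bin by bin.
hist = temp_grid.sum(axis=0)

bins_histo = np.linspace(0.0, 1.0, interv + 1)
assert np.array_equal(hist, np.histogram(values, bins=bins_histo)[0])
```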
################
Results:
Time used to grid with GPU: 680.0 ms
Time used to grid with CPU: 9320.0 ms
max difference between methods= 0.0
So it seems that with this algorithm we can't achieve speed-up factors
larger than ~15.
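One untested thought on pushing past that limit: since histogram counts
are integers, the per-thread private copies could be dropped in favour of
global atomic increments, which 32-bit integer atomics support from
compute capability 1.1 upward. A sketch (the kernel name and the
grid-stride loop are mine, not from the code above):

```cuda
__global__ void histo_atomic(const float *values, int size, unsigned int *hist)
{
    const int interv = 1024;            /* hypothetical fixed bin count */
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    for (; i < size; i += gridDim.x*blockDim.x) {
        int bin = (int)(values[i]*interv);
        if (bin == interv)
            bin = interv - 1;
        /* 32-bit integer atomics on global memory need compute
           capability >= 1.1, so this should run on sm_11 parts */
        atomicAdd(&hist[bin], 1u);
    }
}
```

Whether this actually wins depends on how badly the atomics contend on
1024 bins, so it would need benchmarking before drawing conclusions.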
Fran.
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda