Agh, I misunderstood your code, so what I said was probably wrong. Sorry.

On Dec 19, 2012 5:45 PM, "David Mertens" <[email protected]> wrote:
> I suspect that you have a case of numerical overflow. Can you transfer the
> results back from the device and see how many of the elements in the array
> are inf?
>
> David
> On Dec 19, 2012 11:01 AM, "Simone Riva" <[email protected]> wrote:
>
>> I've written this test code, in which the call to the OpenCL program
>> is placed inside a loop.
>> After about 150 iterations I'm experiencing a dramatic loss of
>> performance, and execution becomes too slow.
>>
>> What's the best way to call an OpenCL program in a Python for loop, like
>> the example below, without any loss of performance?
>>
>> That's the output:
>> the two loops do exactly the same operation.
>>
>> start ....
>> Prg : 0.256917
>>
>> start b ....
>> Prg b: 1.663486
>>
>>
>> Tnx.
>>
>> The code
>> ----------------------------------------------------------------------
>>
>> import pyopencl as cl
>> import pyopencl.array as cla
>> import numpy
>> import numpy.linalg as la
>> import time
>>
>> lnn = 100000
>> szz = lnn*32
>>
>> a = numpy.random.rand(szz,3).astype(numpy.float32)
>> b = numpy.random.rand(szz,3).astype(numpy.float32)
>> c = numpy.random.rand(szz,3).astype(numpy.float32)
>>
>> ctx = cl.create_some_context()
>> queue = cl.CommandQueue(ctx)
>> queue2 = cl.CommandQueue(ctx)
>>
>> mf = cl.mem_flags
>>
>> a_array = cla.to_device( queue , a )
>> b_array = cla.to_device( queue , b )
>>
>> dest_array = cla.Array( queue , (szz,3) , numpy.float32 )
>> dest_array_b = cla.Array( queue , (szz,3) , numpy.float32 )
>>
>> prg_b = cl.Program(ctx, """
>> __kernel void sum_b(__global const float *a,
>> __global const float *b, __global float *c)
>> {
>>     int i = get_global_id(0);
>>
>>     float m = sqrt( pown( a[3*i] , 2 ) + pown( a[3*i+1] , 2 )
>>                     + pown( a[3*i+2] , 2 ) ) ;
>>
>>     c[3*i] = i*10.0f + m ;
>>     c[3*i+1] = i*10.0f + 1 ;
>>     c[3*i+2] = i*10.0f + 2 ;
>>
>> }
>> """).build()
>>
>>
>>
>> rep = 400
>>
>> print("\nstart ....")
>>
>> ta = time.time()
>> for fooo in range(rep):
>>     prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>>                 dest_array.data )
>> tb = time.time()
>>
>> print( "Prg : %f" % (tb - ta) )
>>
>> #dest_array.get( queue , c )
>> #print dest_array
>>
>> print("\nstart b ....")
>>
>> taa = time.time()
>> for foo in range(rep):
>>     prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>>                 dest_array_b.data )
>> tbb = time.time()
>>
>> print( "Prg b: %f" % (tbb - taa) )
>>
>> #dest_array_b.get( queue , c )
>> #print ( dest_array_b - dest_array )
>>
>> _______________________________________________
>> PyOpenCL mailing list
>> [email protected]
>> http://lists.tiker.net/listinfo/pyopencl
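
For reference, the check suggested in the quoted message (copy the result back to the host and count the inf entries) could be sketched roughly as follows. This is only an illustration under assumptions: `count_inf` is a made-up helper name, and the small hand-written array stands in for the host copy that a call such as `dest_array.get()` would return on a real device.

```python
import numpy


def count_inf(host_array):
    """Return how many elements of a host-side array are +inf or -inf."""
    return int(numpy.isinf(host_array).sum())


# Stand-in for the host copy of the result (assumption: with PyOpenCL this
# would come from something like `dest_array.get()`, which needs a device
# and is therefore not run here).
result = numpy.array([[1.0, 2.0, 3.0],
                      [numpy.inf, 5.0, -numpy.inf]], dtype=numpy.float32)

print("inf elements: %d of %d" % (count_inf(result), result.size))
```

A count of zero on the real output would rule out overflow as the cause of the slowdown.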
