Agh, I misunderstood your code, so what I said was probably wrong. Sorry.
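
In case the check is still of interest, here is roughly what I meant by counting inf entries in my earlier message (quoted below). This is only a sketch: count_non_finite and the sample array are names I made up for illustration, and with PyOpenCL the host copy would come from something like dest_array.get() rather than from the stand-in data used here.

```python
import numpy


def count_non_finite(host_array):
    """Return (inf_count, nan_count) for an array that lives in host memory."""
    host = numpy.asarray(host_array)
    inf_count = int(numpy.isinf(host).sum())
    nan_count = int(numpy.isnan(host).sum())
    return inf_count, nan_count


# Stand-in for the data copied back from the device (hypothetical values).
# With PyOpenCL the host copy would come from something like dest_array.get().
sample = numpy.array([[1.0, numpy.inf, 2.0],
                      [numpy.nan, -numpy.inf, 3.0]], dtype=numpy.float32)

inf_count, nan_count = count_non_finite(sample)
print("inf: %d, nan: %d" % (inf_count, nan_count))
```

If both counts are zero, overflow is not what is going on.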
On Dec 19, 2012 5:45 PM, "David Mertens" <[email protected]> wrote:

> I suspect that you have a case of numerical overflow. Can you transfer the
> results back from the device and see how many of the elements in the array
> are inf?
>
> David
> On Dec 19, 2012 11:01 AM, "Simone Riva" <[email protected]> wrote:
>
>> I've written the test code below, in which I call the OpenCL program
>> inside a loop. After about 150 iterations I experience a dramatic loss
>> of performance, and execution becomes very slow.
>>
>> What is the best way to call an OpenCL program in a Python for loop, as in
>> the example below, without any loss of performance?
>>
>> This is the output; the two loops perform exactly the same operation:
>>
>> start ....
>> Prg  : 0.256917
>>
>> start b ....
>> Prg b: 1.663486
>>
>>
>> Thanks.
>>
>> The code
>> ----------------------------------------------------------------------
>>
>> import pyopencl as cl
>> import pyopencl.array as cla
>> import numpy
>> import numpy.linalg as la
>> import time
>>
>> lnn = 100000
>> szz = lnn*32
>>
>> a = numpy.random.rand(szz,3).astype(numpy.float32)
>> b = numpy.random.rand(szz,3).astype(numpy.float32)
>> c = numpy.random.rand(szz,3).astype(numpy.float32)
>>
>> ctx = cl.create_some_context()
>> queue = cl.CommandQueue(ctx)
>> queue2 = cl.CommandQueue(ctx)
>>
>> mf = cl.mem_flags
>>
>> a_array = cla.to_device( queue , a )
>> b_array = cla.to_device( queue , b )
>>
>> dest_array = cla.Array( queue , (szz,3) , numpy.float32 )
>> dest_array_b = cla.Array( queue , (szz,3) , numpy.float32 )
>>
>> prg_b = cl.Program(ctx, """
>>     __kernel void sum_b(__global const float *a,
>>         __global const float *b, __global float *c)
>>     {
>>       int i = get_global_id(0);
>>
>>       float m = sqrt( pown( a[3*i] , 2 ) + pown( a[3*i+1] , 2 )
>>                       + pown( a[3*i+2] , 2 ) ) ;
>>
>>       c[3*i] = i*10.0f  + m ;
>>       c[3*i+1] = i*10.0f + 1 ;
>>       c[3*i+2] = i*10.0f + 2 ;
>>
>>     }
>>     """).build()
>>
>>
>>
>> rep = 400
>>
>> print("\nstart ....")
>>
>> ta = time.time()
>> for fooo in range(rep):
>>   prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>>               dest_array.data )
>> tb = time.time()
>>
>> print( "Prg  : %f" % (tb - ta) )
>>
>> #dest_array.get( queue , c )
>> #print dest_array
>>
>> print("\nstart b ....")
>>
>> taa = time.time()
>> for foo in range(rep):
>>   prg_b.sum_b(queue, (szz,), None, a_array.data , b_array.data ,
>>               dest_array_b.data )
>> tbb = time.time()
>>
>> print( "Prg b: %f" % (tbb - taa) )
>>
>> #dest_array_b.get( queue , c )
>> #print ( dest_array_b - dest_array )
>>
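
One thing that might be worth ruling out, and this is a guess on my part rather than something I have verified on your setup: kernel calls in PyOpenCL only enqueue the work and return an event, so a timing loop without a queue.finish() (or a wait on the returned event) may measure little more than the enqueueing. The second loop could then be paying for work still pending from the first. Below is a minimal sketch of timing both phases separately. It uses a single worker thread as a stand-in for the command queue so that it runs without a device; time_enqueue_and_completion, fake_enqueue and fake_finish are hypothetical names introduced only for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_enqueue_and_completion(enqueue, finish, repetitions):
    """Return (enqueue_seconds, total_seconds) for `repetitions` submissions."""
    start = time.perf_counter()
    for _ in range(repetitions):
        enqueue()
    enqueued = time.perf_counter()
    finish()  # block until all submitted work has actually completed
    done = time.perf_counter()
    return enqueued - start, done - start


# Stand-in for an asynchronous command queue (an assumption for this sketch):
# one worker thread that needs about 2 ms per job.
executor = ThreadPoolExecutor(max_workers=1)
pending = []


def fake_enqueue():
    # Returns immediately; the job runs later on the worker thread.
    pending.append(executor.submit(time.sleep, 0.002))


def fake_finish():
    # Wait for every outstanding job, then forget them.
    for job in pending:
        job.result()
    pending.clear()


enqueue_s, total_s = time_enqueue_and_completion(fake_enqueue, fake_finish, 50)
executor.shutdown()
print("enqueue only: %f s, until finished: %f s" % (enqueue_s, total_s))
```

With the test code above, enqueue would be something like a lambda wrapping the prg_b.sum_b(queue, ...) call and finish would be queue.finish. If the "until finished" times of the two loops come out about equal, the slowdown is only an artifact of where the clock was stopped.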
>> _______________________________________________
>> PyOpenCL mailing list
>> [email protected]
>> http://lists.tiker.net/listinfo/pyopencl
>>
>>
