[PyOpenCL] New version and prefix scan problems

Tomasz Rybak Fri, 25 Jan 2013 16:40:40 -0800

Hello.
I can see that a new version (2013.1) is already in the docs,
marked "in development". I would like it not to be released
before the problems with parallel prefix scan are fixed.


Problems with scan are only visible on my Loveland APU. They do not
occur on ION or on the GTX 460. I do not have access to a machine
with NVIDIA CC 3.x, so I cannot test prefix scan there.
I first encountered the problems in August, and mentioned them in an
email to the list on 2012-08-08 ("Python3 test failures").
Only recently did I have the time and eagerness to look closer into them.
Tests still fail on the recent git version c31944d1e81a.

Failing tests are now in test_algorithm.py, in the third group (marked
scan-related, starting at line 418). I'll describe my observations
of the test_scan function.
My APU has 2 compute units. GenericScanKernel chooses
k_group_size to be 4096, max_scan_wg_size to be 256,
and max_intervals to be 6.

The first error occurs when there is enough work to fill two compute
units, in my case 2**12+5 elements. It looks like there is a problem
with passing the partial result from computations on the first CU to
the second one. The prefix sum is computed correctly on the second half
of the array, but it starts from the wrong value. I have printed the
interval_results array and observed that the error (the difference
between the correct value of the interval's first element and the
actual one) is not the value of any element of interval_results, nor
is it the difference between interval_results elements. On the other
hand, the difference between the actual and the expected value is
similar (i.e. in the same range) to the difference between
interval_results[4] and interval_results[3].
In the test I have run just now the error is 10724571 and
the difference is 10719275; I am not sure if this is relevant though.
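
To illustrate the symptom, here is a minimal pure-Python sketch (not
the PyOpenCL kernel). It assumes a scan that works interval by
interval and adds the sum of the preceding intervals as a carry-in.
The names interval_scan, first_mismatch and carry_error are
hypothetical, and using 4096 as the interval size is only an
assumption for the example. A wrong carry-in leaves the scan inside
the second interval correct but shifts it all by one constant:

```python
from itertools import accumulate


def interval_scan(data, interval_size, carry_error=0):
    """Inclusive prefix sum computed one interval at a time.

    carry_error (hypothetical) is added to the carry-in of every
    interval after the first, imitating a corrupted partial result.
    """
    out = []
    carry = 0  # true sum of all elements in the preceding intervals
    for start in range(0, len(data), interval_size):
        chunk = data[start:start + interval_size]
        local = list(accumulate(chunk))  # scan within this interval only
        fault = carry_error if start > 0 else 0
        out.extend(carry + fault + x for x in local)
        carry += local[-1]
    return out


def first_mismatch(expected, actual):
    """Return (index, actual - expected) of the first difference, or None."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i, a - e
    return None


data = list(range(1, 2**12 + 5 + 1))  # 2**12+5 elements, as in the failing test
reference = list(accumulate(data))
broken = interval_scan(data, 4096, carry_error=7)

# The first wrong element is the first element of the second interval.
print(first_mismatch(reference, broken))
# Every element after that point is off by the same amount.
print({a - e for e, a in zip(reference[4096:], broken[4096:])})
```

Running first_mismatch on the real output against a host-side
cumulative sum would show whether the failure starts exactly at an
interval boundary and whether the offset stays constant afterwards.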

Errors are not repeatable: sometimes they occur for small arrays
(e.g. for 2**12+5), sometimes for larger ones (the test I have run
just now failed for ExclusiveScan of size 2**24+5). The test
failures also depend on the order of tests: after changing the order
of the elements of the scan_test_counts array I got failures for
different sizes, but always for sizes larger than 2**12. It might be
some race condition, but I do not fully understand the new scan
and cannot point my finger at one place.
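
Since the failures depend on test order, one way to gather data is
to repeat the tests with shuffled sizes and count failures per size.
The sketch below is only an outline under assumptions:
collect_failures, reference_inclusive_scan and flaky_scan are
hypothetical names, scan_fn stands for whatever wraps the real scan
kernel, and flaky_scan is a deterministic stand-in that corrupts
results for sizes above 2**12 so the harness can be tried without
a device:

```python
import random
from itertools import accumulate


def reference_inclusive_scan(data):
    """Host-side reference result."""
    return list(accumulate(data))


def collect_failures(scan_fn, sizes, repeats=3, seed=0):
    """Run scan_fn over the sizes in shuffled order, several times.

    Returns a dict mapping each failing size to its failure count.
    """
    rng = random.Random(seed)
    failures = {}
    for _ in range(repeats):
        order = list(sizes)
        rng.shuffle(order)  # vary the order, since failures depend on it
        for n in order:
            data = [rng.randrange(10) for _ in range(n)]
            if scan_fn(data) != reference_inclusive_scan(data):
                failures[n] = failures.get(n, 0) + 1
    return failures


def flaky_scan(data):
    """Stand-in for a broken scan: wrong last element above 2**12."""
    out = list(accumulate(data))
    if len(data) > 2**12:
        out[-1] += 1
    return out


sizes = [10, 2**10, 2**12 - 5, 2**12, 2**12 + 5, 2**14 + 5]
failures = collect_failures(flaky_scan, sizes)
print(sorted(failures.items()))
```

With the real kernel in place of flaky_scan, the counts would show
which sizes fail and how often across different orderings.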

If there is any additional test I can perform, please let me know.
I'll try to investigate further, but I am not sure whether
I will succeed.

Best regards.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
