This revision was automatically updated to reflect the committed changes.
tra marked 2 inline comments as done.
Closed by commit rC337587: [CUDA] Provide integer SIMD functions for CUDA-9.2
(authored by tra, committed by ).
Changed prior to commit:
bkramer accepted this revision.
bkramer added a comment.
This revision is now accepted and ready to land.
lg
https://reviews.llvm.org/D49274
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
tra marked 2 inline comments as done.
tra added a comment.
Ben, PTAL.
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:1080
+ unsigned int r;
+ asm("vabsdiff2.u32.u32.u32.sat %0,%1,%2,0;" : "=r"(r) : "r"(__a), "r"(__b));
+ return r;
bkramer
tra updated this revision to Diff 156397.
tra added a comment.
Fixed the issues pointed out by bkramer@.
Apparently, .sat does not matter for the vabsdiff instruction with unsigned operands.
My tests were also missing __vabsssN.
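A small host-side C model shows why .sat is a no-op for unsigned operands. This is an illustrative sketch only. The names ref_vabsdiffu2 and ref_vabsss2 are hypothetical. They model the documented per-halfword semantics of __vabsdiffu2 and __vabsss2 and are not code from the patch. For unsigned 16-bit lanes, |a - b| is at most 0xFFFF, so the result always fits in the lane. The signed __vabsssN functions are different: they have an input (0x8000, i.e. -32768) whose absolute value overflows and has to be clamped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference models (not the patch's code) for two
   CUDA SIMD intrinsics operating on the two 16-bit lanes of a 32-bit word. */

/* Extract 16-bit lane i (0 = low half, 1 = high half). */
static uint32_t lane16(uint32_t x, unsigned i) {
  assert(i < 2);
  return (x >> (16 * i)) & 0xFFFFu;
}

/* Per-halfword |a - b| on unsigned lanes, modelling __vabsdiffu2.
   Both lanes lie in [0, 0xFFFF], so |a - b| <= 0xFFFF always fits:
   an unsigned saturating variant could never clamp anything. */
static uint32_t ref_vabsdiffu2(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 2; ++i) {
    uint32_t x = lane16(a, i), y = lane16(b, i);
    uint32_t d = x > y ? x - y : y - x;
    r |= d << (16 * i);
  }
  return r;
}

/* Per-halfword absolute value with signed saturation, modelling __vabsss2.
   |-32768| does not fit in a signed 16-bit lane, so it clamps to 0x7FFF. */
static uint32_t ref_vabsss2(uint32_t a) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 2; ++i) {
    int32_t v = (int32_t)lane16(a, i);
    if (v >= 0x8000) v -= 0x10000;   /* sign-extend the 16-bit lane */
    int32_t m = v < 0 ? -v : v;      /* now in [0, 32768] */
    if (m > 0x7FFF) m = 0x7FFF;      /* signed saturation */
    r |= (uint32_t)m << (16 * i);
  }
  return r;
}
```

For example, an input with a 0x8000 lane is the one case where __vabsss2 has to saturate.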
https://reviews.llvm.org/D49274
Files:
tra updated this revision to Diff 156386.
tra added a comment.
Fixed inline asm syntax.
Added a workaround for the bug in __vmaxs2() discovered during testing.
I've got a set of tests for these functions that I'll add to the test-suite shortly.
AFAICT this implementation matches NVIDIA's bit-for-bit.
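A host-side reference model makes a bit-for-bit claim like this easy to check. The sketch below assumes the documented semantics of __vmaxs2, which is a per-halfword signed maximum. The names ref_vmaxs2, unsigned_compare_vmax2 and count_mismatches are hypothetical. The unsigned-compare variant only illustrates a plausible mistake; it is not the actual __vmaxs2() bug mentioned above. In a real test, impl would wrap results copied back from a kernel that calls the intrinsic. That device-side part is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 16-bit lane value without implementation-defined casts. */
static int32_t sext16(uint32_t h) {
  assert(h <= 0xFFFFu);
  return h >= 0x8000u ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Hypothetical reference for __vmaxs2: per-halfword signed maximum. */
static uint32_t ref_vmaxs2(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 2; ++i) {
    uint32_t ha = (a >> (16 * i)) & 0xFFFFu;
    uint32_t hb = (b >> (16 * i)) & 0xFFFFu;
    r |= (sext16(ha) >= sext16(hb) ? ha : hb) << (16 * i);
  }
  return r;
}

/* Illustrative mistake: compares lanes as unsigned, so 0xFFFF (-1)
   wins over 0x0001 (1). */
static uint32_t unsigned_compare_vmax2(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 2; ++i) {
    uint32_t ha = (a >> (16 * i)) & 0xFFFFu;
    uint32_t hb = (b >> (16 * i)) & 0xFFFFu;
    r |= (ha >= hb ? ha : hb) << (16 * i);
  }
  return r;
}

typedef uint32_t (*simd2_binop)(uint32_t, uint32_t);

/* Lane-boundary values where sign and overflow bugs tend to show up. */
static const uint32_t edge_values[] = {
    0x00000000u, 0x00000001u, 0x00007FFFu, 0x00008000u,
    0x0000FFFFu, 0x7FFF8000u, 0x80007FFFu, 0xFFFFFFFFu};

/* Count input pairs on which impl and ref differ in any bit. */
static size_t count_mismatches(simd2_binop impl, simd2_binop ref) {
  const size_t n = sizeof edge_values / sizeof edge_values[0];
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i)
    for (size_t j = 0; j < n; ++j)
      if (impl(edge_values[i], edge_values[j]) !=
          ref(edge_values[i], edge_values[j]))
        ++bad;
  return bad;
}
```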
tra added a comment.
I'm in the middle of writing tests for these, as it's very easy to mess
things up. I'll update the patch once I've run it through the tests.
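Here is one example of how easy these are to get subtly wrong. The sketch below models __vadd2, where each 16-bit lane wraps independently. It uses hypothetical names and is not code from the patch. A plain 32-bit add is wrong whenever the low lane carries into the high one. The SWAR variant avoids this: it adds only the low 15 bits of each lane, then restores each lane's top bit with XOR. A per-lane reference catches the carry leak.

```c
#include <assert.h>
#include <stdint.h>

/* Wrong: a plain 32-bit add lets a carry out of the low lane leak into
   the high lane (0x8000 + 0x8000 sets bit 16). */
static uint32_t naive_vadd2(uint32_t a, uint32_t b) { return a + b; }

/* SWAR: add the low 15 bits of each lane (these partial sums cannot carry
   across the lane boundary), then fix each lane's top bit with XOR. */
static uint32_t swar_vadd2(uint32_t a, uint32_t b) {
  const uint32_t low15 = 0x7FFF7FFFu;
  const uint32_t top = 0x80008000u;
  return ((a & low15) + (b & low15)) ^ ((a ^ b) & top);
}

/* Extract 16-bit lane i (0 = low half, 1 = high half). */
static uint32_t lane16(uint32_t x, unsigned i) {
  assert(i < 2);
  return (x >> (16 * i)) & 0xFFFFu;
}

/* Straightforward per-lane reference to test the other versions against. */
static uint32_t ref_vadd2(uint32_t a, uint32_t b) {
  uint32_t r = 0;
  for (unsigned i = 0; i < 2; ++i)
    r |= ((lane16(a, i) + lane16(b, i)) & 0xFFFFu) << (16 * i);
  return r;
}
```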
Another problem with the patch in the current form is that these instructions
apparently do not accept immediate arguments. PTX is a
bkramer accepted this revision.
bkramer added inline comments.
This revision is now accepted and ready to land.
Comment at: clang/lib/Headers/__clang_cuda_device_functions.h:1080
+ unsigned int r;
+ asm("vabsdiff2.u32.u32.u32.sat %0,%1,%2,0;" : "=r"(r) : "r"(__a), "r"(__b));
+
tra created this revision.
tra added reviewers: jlebar, bkramer.
Herald added subscribers: bixia, sanjoy.
CUDA-9.2 made all integer SIMD functions into compiler builtins,
so clang no longer has access to the implementation of these
functions in either the headers or libdevice and has to provide
its