[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #11 from Aldy Hernandez ---
Author: aldyh
Date: Wed Sep 13 16:36:54 2017
New Revision: 252328

URL: https://gcc.gnu.org/viewcvs?rev=252328&root=gcc&view=rev
Log:
Fix diff_type in expand_oacc_for char iter_type

2017-08-07  Tom de Vries

        PR middle-end/78266
        * omp-expand.c (expand_oacc_for): Ensure diff_type is large enough.

        * testsuite/libgomp.oacc-c-c++-common/vprop-2.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/vprop.c: Remove xfail.

Added:
    branches/range-gen2/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop-2.c
Modified:
    branches/range-gen2/gcc/ChangeLog
    branches/range-gen2/gcc/omp-expand.c
    branches/range-gen2/libgomp/ChangeLog
    branches/range-gen2/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Tom de Vries ---
patch with test-case committed.

marking resolved-fixed.
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #9 from Tom de Vries ---
patch with test-suite

(In reply to cesar from comment #8)
> Because num_gangs exceeds largest unsigned value that can be represented by
> the induction variable.

I think what you're trying to say here is that the program is not correct.

I haven't found anything in the standard to suggest that this is incorrect.
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #8 from cesar at gcc dot gnu.org ---
Because num_gangs exceeds largest unsigned value that can be represented by
the induction variable.
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #7 from Tom de Vries ---
(In reply to cesar from comment #6)
> I'm not sure that solution is correct.

Why?
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #6 from cesar at gcc dot gnu.org ---
I'm not sure that solution is correct. A better solution would be to report an
error/warning stating that num_workers exceeds the size of the induction
variable.

Also, in the case that user doesn't specify num_gangs and the type of the
induction variable is less than integer_node_type, then hard-code num_gangs to
255 or something small so that the runtime doesn't assign num_gangs that
generate bogus results.
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #5 from Tom de Vries ---
Author: vries
Date: Mon Aug  7 17:06:11 2017
New Revision: 250925

URL: https://gcc.gnu.org/viewcvs?rev=250925&root=gcc&view=rev
Log:
Fix diff_type in expand_oacc_for char iter_type

2017-08-07  Tom de Vries

        PR middle-end/78266
        * omp-expand.c (expand_oacc_for): Ensure diff_type is large enough.

        * testsuite/libgomp.oacc-c-c++-common/vprop-2.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/vprop.c: Remove xfail.

Added:
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/omp-expand.c
    trunk/libgomp/ChangeLog
    trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #4 from Tom de Vries ---
F.i., we generate:
...
  _41 = GOACC_DIM_SIZE (0);
  _29 = (signed char) _41;
...
where _41 is 256.

When folding in forwprop2, we fold _29 to '0':
...
gimple_simplified to _29 = 0;
...
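
The fold to '0' in comment #4 can be checked by hand. The following is a minimal sketch, not GCC code: `truncate_to_s8` is a hypothetical helper that spells out the wrap-around truncation to an 8-bit signed type explicitly, so the result does not depend on how a particular compiler defines an out-of-range conversion to `signed char`.

```c
/* Hypothetical helper (not part of GCC): keep only the low 8 bits of V
   and reinterpret them as a two's-complement signed 8-bit value.  This
   mirrors what the cast "_29 = (signed char) _41" amounts to when _41
   holds a value that does not fit in 8 bits.  */
static int
truncate_to_s8 (int v)
{
  int low = v & 0xff;                  /* value modulo 2^8 */
  return low >= 128 ? low - 256 : low; /* map 128..255 to -128..-1 */
}
```

With a gang count of 256, the low byte is zero, so `truncate_to_s8 (256)` yields 0, matching the `_29 = 0` that forwprop2 produces. A gang count of 255 would become -1 under the same truncation.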
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #3 from Tom de Vries ---
Minimal example:
...
int
main ()
{
#pragma acc parallel num_gangs(256)
  {
#pragma acc loop gang
    for (unsigned char j = 0; j < 5; j++)
      ;
  }

  return 0;
}
...

We generate an unconditional trap, thanks to pass_isolate_erroneous_paths:
...
.entry main$_omp_fn$0 (.param .u64 %in_ar0)
{
        .reg .u64 %ar0;
        ld.param.u64 %ar0,[%in_ar0];
        .reg .pred %r23;
        {
                .reg .u32 %x;
                mov.u32 %x,%tid.x;
                setp.ne.u32 %r23,%x,0;
        }
        @ %r23 bra $L2;
        trap;
$L2:
}
...

AFAICT, the problem is that the logic that distributes the iterations over the
gangs uses signed char, while the number of gangs can be larger than what is
representable.
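
To illustrate why the width of the arithmetic type matters for the minimal example, here is a rough sketch of splitting iterations across gangs. It is an assumption-laden illustration, not the logic in expand_oacc_for: `chunk_per_gang` and `gangs_as_s8` are hypothetical names, and the ceiling division is only one plausible way to distribute iterations.

```c
#include <assert.h>

/* Hypothetical helper: number of iterations each gang handles when
   RANGE iterations are split across NUM_GANGS gangs, computed in int
   so that gang counts such as 256 or 640 are representable.  */
static int
chunk_per_gang (int range, int num_gangs)
{
  assert (num_gangs > 0);   /* a zero divisor would be undefined */
  return (range + num_gangs - 1) / num_gangs;
}

/* Hypothetical helper: what remains of the gang count if it is squeezed
   into an 8-bit signed type (low 8 bits, two's complement).  */
static int
gangs_as_s8 (int num_gangs)
{
  int low = num_gangs & 0xff;
  return low >= 128 ? low - 256 : low;
}
```

Computed in int, `chunk_per_gang (5, 256)` gives 1. Squeezed into 8 signed bits, 256 gangs become 0 and the 640 gangs from comment #2 become -128, so any division or comparison done in that narrow type works with a value unrelated to the real gang count. A zero divisor in particular would be undefined, which seems consistent with the unconditional trap shown above, though the thread does not spell out the exact expression involved.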
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #2 from Tom de Vries ---
Reproduced it by mapping the outer loop to gang, and setting num_gangs to 640.
[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

Tom de Vries changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org

--- Comment #1 from Tom de Vries ---
The test-case XPASSes for me for trunk r250889 on Quadro 1200m with 375.66
driver and cuda 7.5:
...
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 (test for excess errors)
XPASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O2 (test for excess errors)
XPASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O2 execution test
...