[Bug middle-end/78266] broken openacc loop partitioning on nvptx offloading targets

2017-09-13 aldyh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78266

--- Comment #11 from Aldy Hernandez  ---
Author: aldyh
Date: Wed Sep 13 16:36:54 2017
New Revision: 252328

URL: https://gcc.gnu.org/viewcvs?rev=252328&root=gcc&view=rev
Log:
Fix diff_type in expand_oacc_for char iter_type

2017-08-07  Tom de Vries  

PR middle-end/78266
* omp-expand.c (expand_oacc_for): Ensure diff_type is large enough.

* testsuite/libgomp.oacc-c-c++-common/vprop-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vprop.c: Remove xfail.

Added:
branches/range-gen2/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop-2.c
Modified:
branches/range-gen2/gcc/ChangeLog
branches/range-gen2/gcc/omp-expand.c
branches/range-gen2/libgomp/ChangeLog
branches/range-gen2/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c

2017-08-07 vries at gcc dot gnu.org

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Tom de Vries  ---
Patch with test-case committed.

Marking resolved-fixed.

2017-08-07 vries at gcc dot gnu.org

--- Comment #9 from Tom de Vries  ---
(In reply to cesar from comment #8)
> Because num_gangs exceeds the largest unsigned value that can be represented by
> the induction variable.

I think what you're trying to say here is that the program is not correct. I
haven't found anything in the OpenACC standard to suggest that it is incorrect.

2017-08-07 cesar at gcc dot gnu.org

--- Comment #8 from cesar at gcc dot gnu.org ---
Because num_gangs exceeds the largest unsigned value that can be represented by
the induction variable.

2017-08-07 vries at gcc dot gnu.org

--- Comment #7 from Tom de Vries  ---
(In reply to cesar from comment #6)
> I'm not sure that solution is correct.

Why?

2017-08-07 cesar at gcc dot gnu.org

--- Comment #6 from cesar at gcc dot gnu.org ---
I'm not sure that solution is correct.  A better solution would be to report an
error/warning stating that num_gangs exceeds the range of the induction
variable. Also, in the case that the user doesn't specify num_gangs and the
induction variable's type is narrower than integer_type_node, hard-code
num_gangs to 255 or something small, so that the runtime doesn't assign a
num_gangs value that generates bogus results.
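The alternative proposed here can be sketched roughly as follows. This is a hypothetical illustration only: choose_num_gangs, gang_choice and the iv_max parameter are invented names, and this is not what was committed (the committed patch instead makes diff_type large enough).

```c
#include <stdbool.h>

/* Hypothetical sketch of the comment #6 proposal, not committed code.
   IV_MAX is the largest value the loop's induction variable type can
   hold (255 for unsigned char).  */
typedef struct
{
  int num_gangs;  /* gang count to launch */
  bool warn;      /* user asked for more gangs than IV_MAX */
} gang_choice;

gang_choice
choose_num_gangs (int user_num_gangs, int runtime_default, long iv_max)
{
  gang_choice c;
  if (user_num_gangs > 0)
    {
      /* num_gangs clause present: keep the user's value, but diagnose.  */
      c.num_gangs = user_num_gangs;
      c.warn = user_num_gangs > iv_max;
    }
  else
    {
      /* No clause: cap the runtime's default gang count at IV_MAX.  */
      c.num_gangs = runtime_default > iv_max ? (int) iv_max
                                             : runtime_default;
      c.warn = false;
    }
  return c;
}
```

With an unsigned char loop variable (iv_max = 255), an explicit num_gangs(256) would be launched as written but diagnosed, while an unspecified default of 640 would be capped at 255.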

2017-08-07 vries at gcc dot gnu.org

--- Comment #5 from Tom de Vries  ---
Author: vries
Date: Mon Aug  7 17:06:11 2017
New Revision: 250925

URL: https://gcc.gnu.org/viewcvs?rev=250925&root=gcc&view=rev
Log:
Fix diff_type in expand_oacc_for char iter_type

2017-08-07  Tom de Vries  

PR middle-end/78266
* omp-expand.c (expand_oacc_for): Ensure diff_type is large enough.

* testsuite/libgomp.oacc-c-c++-common/vprop-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/vprop.c: Remove xfail.

Added:
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/omp-expand.c
trunk/libgomp/ChangeLog
trunk/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c

2017-08-07 vries at gcc dot gnu.org

--- Comment #4 from Tom de Vries  ---
For instance, we generate:
...
  _41 = GOACC_DIM_SIZE (0);
  _29 = (signed char) _41;
...
where _41 is 256.

When folding in forwprop2, we fold _29 to '0' (256 is out of range for signed
char, and reduced modulo 2^8 it is 0):
...
gimple_simplified to _29 = 0;
...
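The fold is permitted by C's conversion rules: converting an out-of-range value to a signed integer type is implementation-defined, and GCC documents that it reduces the value modulo 2^N to fit. A minimal sketch, assuming an 8-bit signed char and GCC's documented behavior (narrow_dim_size and dim_size_fits are hypothetical helpers, not GCC code), models the cast that produces _29:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of "_29 = (signed char) _41" from the dump above.
   Assumes CHAR_BIT == 8.  The out-of-range conversion is
   implementation-defined in C; GCC reduces modulo 2^8, so 256 -> 0.  */
signed char
narrow_dim_size (int dim_size)
{
  return (signed char) dim_size;
}

/* Hypothetical helper: true if DIM_SIZE survives the cast unchanged.  */
bool
dim_size_fits (int dim_size)
{
  return dim_size >= SCHAR_MIN && dim_size <= SCHAR_MAX;
}
```

Under this model, narrow_dim_size (256) is 0, and the num_gangs value of 640 used in comment #2 becomes -128, so neither gang count survives the narrowing.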

2017-08-07 vries at gcc dot gnu.org

--- Comment #3 from Tom de Vries  ---
Minimal example:
...
int
main ()
{

  #pragma acc parallel num_gangs(256)
  {
    #pragma acc loop gang
    for (unsigned char j = 0; j < 5; j++)
      ;
  }

  return 0;
}
...

We generate an unconditional trap, thanks to pass_isolate_erroneous_paths:
...
.entry main$_omp_fn$0 (.param .u64 %in_ar0)
{
  .reg .u64 %ar0;
  ld.param.u64 %ar0,[%in_ar0];
  .reg .pred %r23;
  {
    .reg .u32 %x;
    mov.u32 %x,%tid.x;
    setp.ne.u32 %r23,%x,0;
  }
  @ %r23 bra $L2;
  trap;
  $L2:
}
...

AFAICT, the problem is that the logic that distributes the iterations over the
gangs uses signed char, while the number of gangs can be larger than what a
signed char can represent (at most 127).
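To make that concrete, here is a hypothetical model (not the actual expand_oacc_for code) of splitting a loop's iterations into contiguous per-gang chunks. The narrow flag first casts the gang count to signed char, as in the GIMPLE from comment #4; clearing it keeps the arithmetic in int, standing in for a diff_type that is large enough. gang_iterations and total_iterations are illustrative names.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC's expand_oacc_for: how many of the N
   iterations gang G runs when they are split into contiguous chunks
   over NUM_GANGS gangs.  If NARROW, the gang count is first cast to
   signed char (the narrow type in this bug); otherwise it stays an
   int.  Returns -1 if the resulting gang count is not positive, where
   a chunk-size computation like this one would divide by zero or by a
   negative count.  */
int
gang_iterations (int n, int num_gangs, int g, bool narrow)
{
  int gangs = narrow ? (signed char) num_gangs : num_gangs;
  if (gangs <= 0)
    return -1;

  int chunk = (n + gangs - 1) / gangs;   /* ceil (n / gangs) */
  int lo = g * chunk;
  int hi = lo + chunk < n ? lo + chunk : n;
  return hi > lo ? hi - lo : 0;
}

/* Sum of iterations over all gangs; -1 if the partitioning is broken.  */
int
total_iterations (int n, int num_gangs, bool narrow)
{
  int total = 0;
  for (int g = 0; g < num_gangs; g++)
    {
      int k = gang_iterations (n, num_gangs, g, narrow);
      if (k < 0)
        return -1;
      total += k;
    }
  return total;
}
```

For the minimal example's five iterations and num_gangs(256), total_iterations (5, 256, true) reports a broken partitioning, while total_iterations (5, 256, false) returns 5, with gangs 0 to 4 running one iteration each.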

2017-08-07 vries at gcc dot gnu.org

--- Comment #2 from Tom de Vries  ---
Reproduced it by mapping the outer loop to gang, and setting num_gangs to 640.

2017-08-06 vries at gcc dot gnu.org

Tom de Vries  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org

--- Comment #1 from Tom de Vries  ---
The test-case XPASSes for me for trunk r250889 on Quadro 1200m with the 375.66
driver and CUDA 7.5:
...
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  (test for excess errors)
XPASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O0  execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O2  (test for excess errors)
XPASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0  -O2  execution test
...