Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-24 Thread Bernd Schmidt
On 06/19/2015 03:45 PM, Jakub Jelinek wrote: If the loop remains in the IL (isn't optimized away as unreachable or isn't removed, e.g. as a non-loop - say if it contains a noreturn call), the flags on struct loop should be still there. For the loop clauses (reduction always, and

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-24 Thread Jakub Jelinek
On Wed, Jun 24, 2015 at 03:11:04PM +0200, Bernd Schmidt wrote: On 06/19/2015 03:45 PM, Jakub Jelinek wrote: If the loop remains in the IL (isn't optimized away as unreachable or isn't removed, e.g. as a non-loop - say if it contains a noreturn call), the flags on struct loop should be still

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Nathan Sidwell
On 06/22/15 11:18, Bernd Schmidt wrote: You can have a hint that it is desirable, but not a hint that it is correct (because passes in between may invalidate that). The OpenACC directives guarantee to the compiler that the program can be transformed into a parallel form. If we lose them early

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Nathan Sidwell
On 06/22/15 12:20, Jakub Jelinek wrote: OpenMP worksharing loop is just coordination between the threads in the team, which thread takes which subset of the loop's iterations, and optionally followed by a barrier. OpenMP simd loop is a loop that has certain properties guaranteed by the user

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Mon, 22 Jun 2015 16:24:56 +0200 Jakub Jelinek ja...@redhat.com wrote: On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote: One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Bernd Schmidt
On 06/22/2015 04:24 PM, Jakub Jelinek wrote: I don't understand why lowering the way you suggest helps here at all. In the proposed scheme, you essentially have whole function in e.g. worker-single or vector-single mode, which you need to be able to handle properly in any case, because users can

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 12:08:36PM -0400, Nathan Sidwell wrote: On 06/22/15 11:18, Bernd Schmidt wrote: You can have a hint that it is desirable, but not a hint that it is correct (because passes in between may invalidate that). The OpenACC directives guarantee to the compiler that the

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Mon, 22 Jun 2015 16:24:56 +0200 Jakub Jelinek ja...@redhat.com wrote: On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote: One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 06:48:10PM +0100, Julian Brown wrote: In vector-single or worker-single mode, divergence of threads within a warp or a CTA is controlled by broadcasting the controlling expression of conditional branches to the set of inactive threads, so each of those follows along

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Bernd Schmidt
On 06/19/2015 03:45 PM, Jakub Jelinek wrote: I actually believe having some optimization passes in between the ompexp and the lowering of the IR into the form PTX wants is highly desirable, the form with the worker-single or vector-single mode lowered will contain too complex CFG for many

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 03:59:57PM +0200, Bernd Schmidt wrote: On 06/19/2015 03:45 PM, Jakub Jelinek wrote: I actually believe having some optimization passes in between the ompexp and the lowering of the IR into the form PTX wants is highly desirable, the form with the worker-single or

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Jakub Jelinek
On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote: One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying hardware executes code: it's possible to draw up a scheme where OpenACC

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-22 Thread Julian Brown
On Fri, 19 Jun 2015 14:25:57 +0200 Jakub Jelinek ja...@redhat.com wrote: On Fri, Jun 19, 2015 at 11:53:14AM +0200, Bernd Schmidt wrote: On 05/28/2015 05:08 PM, Jakub Jelinek wrote: I understand it is more work, I'd just like to ask that when designing stuff for the OpenACC offloading

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-19 Thread Bernd Schmidt
On 06/19/2015 02:25 PM, Jakub Jelinek wrote: Emitting PTX specific code from current ompexp is highly undesirable of course, but I must say I'm not a big fan of keeping the GOMP_* gimple trees around for too long either, they were never meant to be used in low gimple, and even all the early

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-19 Thread Jakub Jelinek
On Fri, Jun 19, 2015 at 03:03:38PM +0200, Bernd Schmidt wrote: they are also very much OpenMP or OpenACC specific, rather than representing language neutral behavior, so there is a problem that you'd need M x N different expansions of those constructs, which is not really maintainable (M being

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-19 Thread Jakub Jelinek
On Fri, Jun 19, 2015 at 11:53:14AM +0200, Bernd Schmidt wrote: On 05/28/2015 05:08 PM, Jakub Jelinek wrote: I understand it is more work, I'd just like to ask that when designing stuff for the OpenACC offloading you (plural) try to take the other offloading devices and host fallback into

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-19 Thread Bernd Schmidt
On 05/28/2015 05:08 PM, Jakub Jelinek wrote: I understand it is more work, I'd just like to ask that when designing stuff for the OpenACC offloading you (plural) try to take the other offloading devices and host fallback into account. The problem is that many of the transformations we need to

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-03 Thread Julian Brown
On Thu, 28 May 2015 16:37:04 +0200 Richard Biener richard.guent...@gmail.com wrote: On Thu, May 28, 2015 at 4:06 PM, Julian Brown jul...@codesourcery.com wrote: For NVPTX, it is vitally important that the divergence of threads within a warp can be controlled: in particular we must be able

Re: [gomp4] Preserve NVPTX reconvergence points

2015-05-28 Thread Richard Biener
On Thu, May 28, 2015 at 4:06 PM, Julian Brown jul...@codesourcery.com wrote: For NVPTX, it is vitally important that the divergence of threads within a warp can be controlled: in particular we must be able to generate code that we know reconverges at a particular point. Unfortunately GCC's

[gomp4] Preserve NVPTX reconvergence points

2015-05-28 Thread Julian Brown
For NVPTX, it is vitally important that the divergence of threads within a warp can be controlled: in particular we must be able to generate code that we know reconverges at a particular point. Unfortunately GCC's middle-end optimisers can cause this property to be violated, which causes problems

Re: [gomp4] Preserve NVPTX reconvergence points

2015-05-28 Thread Jakub Jelinek
On Thu, May 28, 2015 at 03:06:35PM +0100, Julian Brown wrote: For NVPTX, it is vitally important that the divergence of threads within a warp can be controlled: in particular we must be able to generate code that we know reconverges at a particular point. Unfortunately GCC's middle-end

Re: [gomp4] Preserve NVPTX reconvergence points

2015-05-28 Thread Thomas Schwinge
Hi! On Thu, 28 May 2015 16:20:11 +0200, Jakub Jelinek ja...@redhat.com wrote: On Thu, May 28, 2015 at 03:06:35PM +0100, Julian Brown wrote: [...] I think the lowering of this already at ompexp time is premature. Yes, we're aware of this wart. :-| I think much better would be to have a

Re: [gomp4] Preserve NVPTX reconvergence points

2015-05-28 Thread Jakub Jelinek
On Thu, May 28, 2015 at 04:49:43PM +0200, Thomas Schwinge wrote: I think much better would be to have a function attribute (or cgraph flag) that would be set for functions you want to compile this way (plus a targetm flag that the targets want to support it that way), plus a flag in loop