Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-13 Thread Cesar Philippidis
On 08/13/2018 08:08 AM, Tom de Vries wrote: > On 08/13/2018 04:54 PM, Cesar Philippidis wrote: >> Going forward, how would you like to proceed with the nvptx BE vector length changes. > Do you have a branch available on github containing the patch series you've submitted? Yes,

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-13 Thread Tom de Vries
On 08/13/2018 04:54 PM, Cesar Philippidis wrote: > Going forward, how would you like to proceed with the nvptx BE vector length changes. Do you have a branch available on github containing the patch series you've submitted? Thanks, - Tom

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-13 Thread Cesar Philippidis
On 08/13/2018 05:04 AM, Tom de Vries wrote: > On 08/10/2018 08:39 PM, Cesar Philippidis wrote: >> is that I modified the default value for vectors as follows >> + int vectors = default_dim_p[GOMP_DIM_VECTOR] ? 0 : dims[GOMP_DIM_VECTOR]; >> Technically, trunk only
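For readers skimming the archive, here is an illustrative reading of the fragment quoted above; the surrounding context (the loop filling default_dim_p and the wrapper function) is an assumption for illustration, not code from the patch.

  #include <stdbool.h>
  #include "gomp-constants.h"  /* GOMP_DIM_VECTOR, GOMP_DIM_MAX.  */

  /* Hypothetical helper illustrating the quoted fragment: a launch dimension
     the user left unset arrives as 0, and 0 is also what gets forwarded so
     the runtime heuristic may pick a vector length; an explicit request is
     passed through unchanged.  */
  static int
  requested_vector_len (const int dims[GOMP_DIM_MAX])
  {
    bool default_dim_p[GOMP_DIM_MAX];
    for (int i = 0; i < GOMP_DIM_MAX; i++)
      default_dim_p[i] = !dims[i];

    return default_dim_p[GOMP_DIM_VECTOR] ? 0 : dims[GOMP_DIM_VECTOR];
  }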

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-13 Thread Tom de Vries
…the code will become too convoluted. Btw, I've also noticed that we don't handle a too high GOMP_OPENACC_DIM[GOMP_DIM_WORKER]; I've added a TODO comment for this. > If you want, I can resubmit a patch without that change though. > 0001-nvptx-Use-CUDA-driver-API-to-select-default-runtime-
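A minimal sketch of what handling that case could look like (one possible approach is simply clamping, assuming the worker value parsed from GOMP_OPENACC_DIM and the device limits are already at hand; the function and variable names are hypothetical):

  /* Hypothetical sketch: clamp a user-requested worker count (e.g. from the
     GOMP_OPENACC_DIM environment variable) to what one thread block can hold,
     instead of letting an oversized request fail at launch time.  */
  static int
  clamp_workers (int requested_workers, int max_threads_per_block, int warp_size)
  {
    int max_workers = max_threads_per_block / warp_size;
    return requested_workers > max_workers ? max_workers : requested_workers;
  }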

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-10 Thread Cesar Philippidis
On 08/08/2018 08:19 AM, Tom de Vries wrote: > On Wed, Aug 08, 2018 at 07:09:16AM -0700, Cesar Philippidis wrote: >> On 08/07/2018 06:52 AM, Cesar Philippidis wrote: Thanks for the review. This version should address all of the following remarks. However, one thing to note ... >> [nvptx] Use CUDA

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-08 Thread Tom de Vries
On Wed, Aug 08, 2018 at 07:09:16AM -0700, Cesar Philippidis wrote: > On 08/07/2018 06:52 AM, Cesar Philippidis wrote: > > I attached an updated version of the CUDA driver patch, although I haven't rebased it against your changes yet. It still needs to be tested against CUDA 5.5 using

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-08 Thread Cesar Philippidis
On 08/07/2018 06:52 AM, Cesar Philippidis wrote: > I attached an updated version of the CUDA driver patch, although I haven't rebased it against your changes yet. It still needs to be tested against CUDA 5.5 using the system's/Nvidia's cuda.h. But I wanted to give you an update. > Does

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-07 Thread Cesar Philippidis
…be tested against CUDA 5.5 using the system's/Nvidia's cuda.h. But I wanted to give you an update. Does this patch look OK, at least after testing completes? I removed the tests for CUDA_ONE_CALL_MAYBE_NULL, because the newer CUDA API isn't supported in the older drivers. Cesar From 7fc093da173543b43e1

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-07 Thread Tom de Vries
On 08/01/2018 12:18 PM, Tom de Vries wrote: > I think we need to add and handle: ... CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize) ... I realized that the patch I posted introducing CUDA_ONE_CALL_MAYBE_NULL was incomplete, and needed to use the weak attribute in case of
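For context on the weak-attribute remark, a generic sketch of the technique follows; it shows the idea only and is not the plugin's actual CUDA_ONE_CALL_MAYBE_NULL machinery.

  #include <stddef.h>
  #include <cuda.h>

  /* Sketch of the weak-binding idea: when linking directly against a concrete
     libcuda.so, mark the newer entry point weak so an older driver without it
     still links and loads, then test the symbol's address before calling it.  */
  #pragma weak cuOccupancyMaxPotentialBlockSize

  static int
  have_occupancy_query (void)
  {
    /* A weak reference to a missing symbol resolves to a null address.  */
    return cuOccupancyMaxPotentialBlockSize != NULL;
  }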

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-04 Thread Tom de Vries
On 08/03/2018 05:37 PM, Cesar Philippidis wrote: >> But I still see no rationale why blocks is used here, and I wonder whether something like num_gangs = grids * 64 would give similar results. > My original intent was to keep the load proportional to the block size. So, in the case where a
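To make the trade-off concrete (with made-up numbers, not figures from the thread): if the occupancy calculator suggests blocks = 1024 threads per block on a device with warp_size = 32, then grids * (blocks / warp_size) requests grids * 32 gangs, whereas for a kernel limited to blocks = 256 it shrinks to grids * 8; a fixed num_gangs = grids * 64 would request the same oversubscription in both cases, regardless of how many workers each gang actually carries.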

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-03 Thread Cesar Philippidis
On 08/03/2018 08:22 AM, Tom de Vries wrote: > On 08/01/2018 09:11 PM, Cesar Philippidis wrote: >> On 08/01/2018 07:12 AM, Tom de Vries wrote: >> + gangs = grids * (blocks / warp_size); > So, we launch with gangs == grids * workers? Is that intentional? Yes.

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-03 Thread Tom de Vries
On 08/01/2018 09:11 PM, Cesar Philippidis wrote: > On 08/01/2018 07:12 AM, Tom de Vries wrote: > + gangs = grids * (blocks / warp_size); So, we launch with gangs == grids * workers? Is that intentional? >>> Yes. At least that's what I've been using in og8. Setting

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-01 Thread Cesar Philippidis
On 08/01/2018 07:12 AM, Tom de Vries wrote: + gangs = grids * (blocks / warp_size); >>> So, we launch with gangs == grids * workers? Is that intentional? >> Yes. At least that's what I've been using in og8. Setting num_gangs = grids alone caused significant slow downs.
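As a concrete reading of the quoted formula (illustrative numbers only): blocks is the block size in threads suggested by the occupancy calculator and grids the suggested number of blocks, so blocks / warp_size is the number of workers (warps) per gang. With grids = 24, blocks = 1024 and warp_size = 32, the formula launches gangs = 24 * 32 = 768, i.e. one gang per worker slot in the suggested grid rather than one gang per thread block.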

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-01 Thread Tom de Vries
On 08/01/2018 04:01 PM, Cesar Philippidis wrote: > On 08/01/2018 03:18 AM, Tom de Vries wrote: >> On 07/31/2018 04:58 PM, Cesar Philippidis wrote: >>> The attached patch teaches libgomp how to use the CUDA thread occupancy calculator built into the CUDA driver. Despite both being based off the

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-01 Thread Cesar Philippidis
On 08/01/2018 03:18 AM, Tom de Vries wrote: > On 07/31/2018 04:58 PM, Cesar Philippidis wrote: >> The attached patch teaches libgomp how to use the CUDA thread occupancy calculator built into the CUDA driver. Despite both being based off the CUDA thread occupancy spreadsheet distributed with

Re: [PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-08-01 Thread Tom de Vries
On 07/31/2018 04:58 PM, Cesar Philippidis wrote: > The attached patch teaches libgomp how to use the CUDA thread occupancy calculator built into the CUDA driver. Despite both being based off the CUDA thread occupancy spreadsheet distributed with CUDA, the built in occupancy calculator

[PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-07-31 Thread Cesar Philippidis
The attached patch teaches libgomp how to use the CUDA thread occupancy calculator built into the CUDA driver. Despite both being based off the CUDA thread occupancy spreadsheet distributed with CUDA, the built in occupancy calculator differs from the occupancy calculator in og8 in two key ways.
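For readers who want to see the shape of the driver-side API being discussed, here is a small, self-contained sketch of querying cuOccupancyMaxPotentialBlockSize and turning its answer into launch dimensions. It is an illustration under assumed conventions: the helper name, the fallback values, and the exact mapping to gangs/workers/vectors are not taken from the patch.

  #include <stddef.h>
  #include <cuda.h>

  /* Illustrative sketch: ask the driver's occupancy calculator for a suggested
     block size and grid size for KERNEL, then derive OpenACC-style launch
     dimensions from them.  The helper name and the mapping below are
     assumptions for illustration, not the code from the patch.  */
  static void
  choose_default_dims (CUfunction kernel, int warp_size, int fallback_gangs,
                       int *gangs, int *workers, int *vectors)
  {
    int min_grid_size = 0, block_size = 0;

    if (cuOccupancyMaxPotentialBlockSize (&min_grid_size, &block_size, kernel,
                                          /* blockSizeToDynamicSMemSize */ NULL,
                                          /* dynamicSMemSize */ 0,
                                          /* blockSizeLimit (0 = none) */ 0)
        != CUDA_SUCCESS)
      {
        /* Query unavailable or failed: fall back to fixed defaults.  */
        *gangs = fallback_gangs;
        *workers = 1;
        *vectors = warp_size;
        return;
      }

    /* One warp per vector, the rest of the suggested block becomes workers,
       and the suggested grid size seeds the gang count (the messages above
       discuss scaling this further by the worker count).  */
    *vectors = warp_size;
    *workers = block_size / warp_size;
    *gangs = min_grid_size;
  }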