> On Aug 28, 2020, at 5:35 AM, Karl Rupp <[email protected]> wrote:
> 
> Hi,
> 
>>   Since we cannot post issues (reported here 
>> https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith)
>>  here is my issue so I don't forget it.
>>   I think
>>  err  = WaitForCUDA();CHKERRCUDA(err);
>>  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>> should be changed so that the WaitForCUDA() (actually WaitForDevice()) call 
>> happens inside PetscLogGpuTimeEnd().
>> Currently the WaitForCUDA() is missing in a few places, resulting in bad 
>> timings.
>> Also, some _SeqCUDA() routines don't call PetscLogGpuTimeEnd() and need to 
>> be fixed.
>> The current model is a maintenance nightmare.
>> Does anyone see a problem with making this change?
> 
> I'm fine with this change, as the maintenance benefits outweigh the 
> performance cost for typical use cases.
> 
> I propose to also add a WaitForDevice() call to PetscLogGpuTimeBegin(). This 
> will ensure that no previous GPU kernel executions spill over into the timed 
> section.

  Might this add overhead from checking the device? Or is it always true that, 
when there are no outstanding kernels, the check returns immediately without 
going to the GPU?
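
  For concreteness, here is a minimal sketch of how I read the combined 
proposal: the device wait moves inside both the Begin and End calls, so call 
sites no longer carry their own wait. Every Sketch* name is hypothetical, the 
wait is a stub that only counts calls (with CUDA it would presumably be 
something like cudaDeviceSynchronize()), and the clock() timer is a 
placeholder. This is not the actual PETSc implementation.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for WaitForDevice(): a real version would block
   until all outstanding device work has finished. This stub only counts
   how often it is called. */
static int sketch_sync_calls = 0;
static int SketchWaitForDevice(void)
{
  sketch_sync_calls++;
  return 0; /* 0 means success */
}

static double sketch_gpu_time      = 0.0; /* accumulated timed seconds */
static double sketch_t0            = 0.0; /* start of the current section */
static int    sketch_timing_active = 0;   /* guards Begin/End pairing */

/* Placeholder timer: clock() reports processor time, which is enough to
   show the control flow but is not a wall-clock measurement. */
static double SketchNow(void) { return (double)clock() / CLOCKS_PER_SEC; }

/* Wait first, so earlier kernels cannot spill into the timed section. */
static int SketchLogGpuTimeBegin(void)
{
  int err = SketchWaitForDevice();
  if (err) return err;
  sketch_timing_active = 1;
  sketch_t0 = SketchNow();
  return 0;
}

/* Wait before stopping the timer, so the kernels launched inside the
   section are fully accounted for. */
static int SketchLogGpuTimeEnd(void)
{
  int err;
  assert(sketch_timing_active); /* End without a matching Begin is a bug */
  err = SketchWaitForDevice();
  if (err) return err;
  sketch_gpu_time += SketchNow() - sketch_t0;
  sketch_timing_active = 0;
  return 0;
}

/* A call site after the change: no explicit wait is needed here. */
static int SketchKernelWrapper(void)
{
  int err;
  err = SketchLogGpuTimeBegin(); if (err) return err;
  /* ... launch GPU kernels here ... */
  err = SketchLogGpuTimeEnd();   if (err) return err;
  return 0;
}
```

  In this sketch each timed section costs two waits. Whether the one in Begin 
is effectively free when nothing is outstanding is exactly what I am asking 
above.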

Barry
  
> 
> Best regards,
> Karli
