> On Aug 28, 2020, at 5:18 PM, Barry Smith <[email protected]> wrote:
> 
> 
> 
>> On Aug 28, 2020, at 5:35 AM, Karl Rupp <[email protected]> wrote:
>> 
>> Hi,
>> 
>>>  Since we cannot post issues (reported here 
>>> https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith)
>>>  here is my issue so I don't forget it.
>>>  I think
>>> err  = WaitForCUDA();CHKERRCUDA(err);
>>> ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>>> should be changed so that the WaitForCUDA() call (or rather WaitForDevice()) 
>>> is made inside PetscLogGpuTimeEnd().
>>> Currently the WaitForCUDA() call is missing in a few places, resulting 
>>> in bad timings.
>>> Also, some _SeqCUDA() routines lack the PetscLogGpuTimeEnd() call and need 
>>> to be fixed.
>>> The current model is a maintenance nightmare.
>>> Does anyone see a problem with making this change?
>> 
>> I'm fine with this change, as the maintenance benefits outweigh the 
>> performance cost for typical use cases.
>> 
>> I propose to also add the WaitForDevice(); at PetscLogGpuTimeBegin(). This 
>> will ensure that no previous GPU kernel executions spill over into the timed 
>> section.
> 
>  Might this incur an extra overhead checking the device? Or will it always be 
> true that if there are no outstanding kernels it will not go to the GPU and 
> the check will return immediately?

If we want a two-barrier model, I propose we log the time spent waiting at the 
first barrier separately from the timed section.
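
A rough sketch of the two-barrier idea in plain C, with the wait at the first 
barrier accumulated in its own counter. All names here are made up for 
illustration (GpuTimer, LogGpuTimeBegin(), LogGpuTimeEnd(), now_seconds()), and 
WaitForDevice() is only a stub that counts calls; a real version would 
presumably synchronize with the device there. This is not the PETSc 
implementation, just an outline under those assumptions.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for a device barrier. It only counts calls so the
   sketch compiles and runs without a GPU. */
static int device_wait_calls = 0;
static void WaitForDevice(void) { device_wait_calls++; }

/* Hypothetical timer record; not a PETSc type. */
typedef struct {
  double gpu_time;      /* accumulated time of the timed sections         */
  double pre_wait_time; /* accumulated time spent at the first barrier    */
  double start;         /* start of the current timed section             */
  int    active;        /* nonzero between Begin and End                  */
} GpuTimer;

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

static void LogGpuTimeBegin(GpuTimer *t)
{
  double t0 = now_seconds();
  WaitForDevice();             /* first barrier: drain earlier kernels */
  double t1 = now_seconds();
  t->pre_wait_time += t1 - t0; /* logged separately from the timed section */
  t->start  = t1;
  t->active = 1;
}

static void LogGpuTimeEnd(GpuTimer *t)
{
  assert(t->active);           /* End must follow a matching Begin */
  WaitForDevice();             /* second barrier: wait for the timed kernels */
  t->gpu_time += now_seconds() - t->start;
  t->active = 0;
}
```

With this layout a caller only brackets the kernel launch with 
LogGpuTimeBegin()/LogGpuTimeEnd() and can no longer forget the wait, and any 
spill-over from earlier kernels shows up in pre_wait_time instead of being 
charged to the timed section.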
> 
> Barry
> 
>> 
>> Best regards,
>> Karli
