> On Aug 28, 2020, at 10:26 AM, Stefano Zampini <[email protected]> wrote:
>
>
>> On Aug 28, 2020, at 5:18 PM, Barry Smith <[email protected]> wrote:
>>
>>
>>> On Aug 28, 2020, at 5:35 AM, Karl Rupp <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>>> Since we cannot post issues (reported here
>>>> https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith),
>>>> here is my issue so I don't forget it.
>>>>
>>>> I think
>>>>
>>>>     err = WaitForCUDA();CHKERRCUDA(err);
>>>>     ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>>>>
>>>> should be changed to include WaitForCUDA() (actually WaitForDevice()) inside
>>>> PetscLogGpuTimeEnd(). Currently the WaitForCUDA() is missing in a few places,
>>>> resulting in bad timings. Also, some _SeqCUDA() functions do not have the
>>>> PetscLogGpuTimeEnd() and need to be fixed. The current model is a maintenance
>>>> nightmare. Does anyone see a problem with making this change?
>>>
>>> I'm fine with this change, as the maintenance benefits outweigh the
>>> performance cost for typical use cases.
>>>
>>> I propose to also add the WaitForDevice(); at PetscLogGpuTimeBegin(). This
>>> will ensure that no previous GPU kernel executions spill over into the
>>> timed section.
Karl,

   When synchronization is turned on, the previous GPU kernels should always have
their own WaitForDevice(), so are you concerned about buggy code that does not
include WaitForDevice()?

>>
>> Might this incur an extra overhead checking the device? Or will it always
>> be true that if there are no outstanding kernels it will not go to the GPU
>> and the check will return immediately?
>
> If we want to have a two-barrier model, I propose we log the timing for
> waiting at the first barrier separately.
>>
>> Barry
>>
>>>
>>> Best regards,
>>> Karli
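
For concreteness, here is a minimal caller-side sketch of what the proposal would
change. The wrapper functions and the kernel-launch placeholder are hypothetical;
WaitForCUDA()/WaitForDevice() are the PETSc-internal helpers named in the thread
(assumed to be declared by the file's existing private includes), and the extra
synchronization in PetscLogGpuTimeBegin() reflects Karl's suggestion rather than
settled behavior.

    #include <petscsys.h>
    #include <cuda_runtime.h>

    /* Today: every timed GPU section must remember its own synchronization
       before stopping the timer. */
    static PetscErrorCode TimedGpuSection_Current(void)
    {
      PetscErrorCode ierr;
      cudaError_t    cerr;

      PetscFunctionBegin;
      ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);
      /* ... launch kernels asynchronously ... */
      cerr = WaitForCUDA();CHKERRCUDA(cerr);   /* forgetting this line gives bad GPU timings */
      ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* Under the proposal: the synchronization is folded into the logging
       calls themselves, so it cannot be forgotten at the call sites. */
    static PetscErrorCode TimedGpuSection_Proposed(void)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr); /* would also WaitForDevice() per Karl's suggestion */
      /* ... launch kernels asynchronously ... */
      ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);   /* WaitForDevice() happens inside */
      PetscFunctionReturn(0);
    }

The open questions raised above are whether the extra synchronization in
PetscLogGpuTimeBegin() costs anything when no kernels are outstanding, and, if a
two-barrier model is used, whether the wait at the first barrier should be logged
separately.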
