> On Aug 28, 2020, at 10:26 AM, Stefano Zampini <[email protected]> wrote:
>
>
>> On Aug 28, 2020, at 5:18 PM, Barry Smith <[email protected]> wrote:
>>
>>
>>> On Aug 28, 2020, at 5:35 AM, Karl Rupp <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>>> Since we cannot post issues (reported here
>>>> https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith),
>>>> here is my issue so I don't forget it.
>>>>
>>>> I think
>>>>
>>>>     err = WaitForCUDA();CHKERRCUDA(err);
>>>>     ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>>>>
>>>> should be changed to include WaitForCUDA() (actually WaitForDevice()) inside
>>>> PetscLogGpuTimeEnd(). Currently the WaitForCUDA() is missing in a few places,
>>>> resulting in bad timings. Also, some _SeqCUDA() functions do not have the
>>>> PetscLogGpuTimeEnd() and need to be fixed. The current model is a maintenance
>>>> nightmare. Does anyone see a problem with making this change?
>>>
>>> I'm fine with this change, as the maintenance benefits outweigh the
>>> performance cost for typical use cases.
>>>
>>> I propose to also add the WaitForDevice(); at PetscLogGpuTimeBegin(). This
>>> will ensure that no previous GPU kernel executions spill over into the
>>> timed section.
Karl,

   When synchronization is turned on, the previous GPU kernels should always have
their own WaitForDevice(), so are you concerned about buggy code that does not
include WaitForDevice()?

>>
>> Might this incur an extra overhead checking the device? Or will it always
>> be true that if there are no outstanding kernels it will not go to the GPU
>> and the check will return immediately?
>
> If we want to have a two-barrier model, I propose we log the timing for
> waiting at the first barrier separately.
>>
>> Barry
>>
>>>
>>> Best regards,
>>> Karli
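
For concreteness, here is a minimal caller-side sketch of what the proposal would
change. The wrapper functions and the kernel-launch placeholder are hypothetical;
WaitForCUDA()/WaitForDevice() are the PETSc-internal helpers named in the thread
(assumed to be declared by the file's existing private includes), and the extra
synchronization in PetscLogGpuTimeBegin() reflects Karl's suggestion rather than
settled behavior.

    #include <petscsys.h>
    #include <cuda_runtime.h>

    /* Today: every timed GPU section must remember its own synchronization
       before stopping the timer. */
    static PetscErrorCode TimedGpuSection_Current(void)
    {
      PetscErrorCode ierr;
      cudaError_t    cerr;

      PetscFunctionBegin;
      ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);
      /* ... launch kernels asynchronously ... */
      cerr = WaitForCUDA();CHKERRCUDA(cerr);   /* forgetting this line gives bad GPU timings */
      ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* Under the proposal: the synchronization is folded into the logging
       calls themselves, so it cannot be forgotten at the call sites. */
    static PetscErrorCode TimedGpuSection_Proposed(void)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr); /* would also WaitForDevice() per Karl's suggestion */
      /* ... launch kernels asynchronously ... */
      ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);   /* WaitForDevice() happens inside */
      PetscFunctionReturn(0);
    }

The open questions raised above are whether the extra synchronization in
PetscLogGpuTimeBegin() costs anything when no kernels are outstanding, and, if a
two-barrier model is used, whether the wait at the first barrier should be logged
separately.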
