> On Aug 28, 2020, at 5:18 PM, Barry Smith <[email protected]> wrote:
>
>> On Aug 28, 2020, at 5:35 AM, Karl Rupp <[email protected]> wrote:
>>
>> Hi,
>>
>>> Since we cannot post issues (reported here
>>> https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith)
>>> here is my issue so I don't forget it.
>>> I think
>>> err = WaitForCUDA();CHKERRCUDA(err);
>>> ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>>> should be changed to include WaitForCUDA() actually WaitForDevice() inside
>>> the PetscLogGpuTimeEnd().
>>> Currently sometimes the WaitForCUDA() is missing in a few places resulting
>>> in bad timing.
>>> Also some _SeqCUDA() don't have the PetscLogGpuTimeEnd() and need to be
>>> fixed.
>>> The current model is a maintenance nightmare.
>>> Does anyone see a problem with making this change?
>>
>> I'm fine with this change, as the maintenance benefits outweigh the
>> performance cost for typical use cases.
>>
>> I propose to also add the WaitForDevice(); at PetscLogGpuTimeBegin(). This
>> will ensure that no previous GPU kernel executions spill over into the timed
>> section.
>
> Might this incur an extra overhead checking the device? Or will it always be
> true that if there are no outstanding kernels it will not go to the GPU and
> the check will return immediately?

If we want a two-barrier model, I propose we log the time spent waiting at the
first barrier separately.

>
> Barry
>
>>
>> Best regards,
>> Karli
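
To make the two-barrier proposal concrete, here is a minimal sketch in plain C. It is not PETSc code: every name in it (SketchDevice, SketchTimer, SketchLaunch, SketchWaitForDevice, SketchGpuTimeBegin, SketchGpuTimeEnd) is hypothetical, and the "device" is a simulated clock rather than a real GPU. It only illustrates the bookkeeping: the wait at the first barrier is accumulated in its own counter, so work left over from earlier kernels is not charged to the timed section.

```c
#include <assert.h>

/* Hypothetical stand-in for a GPU. Work is queued asynchronously and only
   completes when the host waits. Times are simulated microseconds. */
typedef struct {
  double now;     /* simulated host clock */
  double pending; /* queued device work that nobody has waited for yet */
} SketchDevice;

typedef struct {
  double entry_wait; /* time spent draining earlier kernels at the first barrier */
  double gpu_time;   /* time attributed to the timed section itself */
  double start;      /* clock value recorded after the first barrier */
} SketchTimer;

/* Queue `cost` microseconds of asynchronous device work. */
static void SketchLaunch(SketchDevice *d, double cost)
{
  assert(cost >= 0.0);
  d->pending += cost;
}

/* Block until the device is idle; return how long the host waited.
   With nothing pending this returns 0 and the clock does not advance. */
static double SketchWaitForDevice(SketchDevice *d)
{
  double waited = d->pending;
  d->now += waited;
  d->pending = 0.0;
  return waited;
}

/* First barrier: drain earlier work and log that wait separately. */
static void SketchGpuTimeBegin(SketchDevice *d, SketchTimer *t)
{
  t->entry_wait += SketchWaitForDevice(d);
  t->start = d->now;
}

/* Second barrier: the wait lives inside the End call, so callers cannot
   forget it, and the section is charged only for its own work. */
static void SketchGpuTimeEnd(SketchDevice *d, SketchTimer *t)
{
  (void)SketchWaitForDevice(d);
  t->gpu_time += d->now - t->start;
}
```

For example, queueing 30 us of earlier work, calling SketchGpuTimeBegin(), queueing 50 us inside the section, and calling SketchGpuTimeEnd() records 30 us under entry_wait and 50 us under gpu_time. Whether a real device wait returns immediately when nothing is outstanding is exactly the open question raised above, and this sketch does not answer it.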
