Hi,

The CUDA tests are hanging or timing out more often now. For example:
http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/06/examples_next_arch-cuda-double_es.log

I did see some builds where they didn't get killed due to timeout. For example:
http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/05/examples_next_arch-cuda-double_es.log

This is on an M2090. I can see them getting stuck on es.mcs [when I run them
manually and check with nvidia-smi].

When I run these tests manually on a GTX1050 (frog.mcs), they zip through.
Any idea why they get stuck on the M2090 [more frequently than random hangs]?

No, I don't know why this is the case. All my local tests finish quickly, too. I noticed last summer that the startup overhead on the M2090 is higher than on more recent GPUs, but that was on the order of seconds, not minutes.

Are the tests run in parallel? If so, then maybe the parallel initialization of GPUs is slowing things down.
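
One way to check this would be to time a single test run on its own and then several copies started at the same time. Below is a minimal sketch, assuming Python 3 is available on the test machine; the helper names `time_command` and `compare_serial_vs_parallel` are made up for illustration and are not part of the PETSc test harness.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def time_command(cmd, timeout=600):
    """Run cmd once; return wall-clock seconds, or None if it timed out."""
    start = time.perf_counter()
    try:
        # subprocess.run kills the child process if the timeout expires.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


def compare_serial_vs_parallel(cmd, copies=4, timeout=600):
    """Time one run of cmd alone, then `copies` runs started together."""
    serial = time_command(cmd, timeout)
    with ThreadPoolExecutor(max_workers=copies) as pool:
        parallel = list(pool.map(lambda _: time_command(cmd, timeout),
                                 range(copies)))
    return serial, parallel
```

Calling `compare_serial_vs_parallel` with the command line of one of the slow CUDA tests (as a list of arguments) returns the time for the lone run and a list of times for the simultaneous runs. If the simultaneous runs are much slower, or come back as `None` because they hit the timeout, that would point towards parallel GPU initialization rather than the tests themselves.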

Best regards,
Karli
