The CUDA tests are hanging/timing out more often now. For example:
And I did see some builds where they didn't get killed by the timeout. For example:
This is on the M2090. I can see them getting stuck on es.mcs [when I run
manually and check with nvidia-smi].
When I run these tests manually on a GTX 1050 (frog.mcs), they zip through.
Any idea why they get stuck on the M2090? [more frequently than random hangs]
No, I don't know why this is the case. All my local tests finish
quickly, too. I noticed last summer that there is higher startup
overhead on the M2090 than on more recent GPUs, but that was in the
seconds regime, not in minutes.
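If it would help to compare the two cards directly, here is a minimal sketch for
measuring that startup overhead in isolation: the first CUDA runtime call
(`cudaFree(0)` is the conventional trigger) forces lazy context creation, so
timing it approximates the per-process initialization cost on a given GPU. This
is just an illustrative standalone program, not part of the test suite.

```
// Sketch: time CUDA context creation to quantify per-GPU startup
// overhead. cudaFree(0) triggers lazy context initialization, so the
// elapsed time here approximates the startup cost on this card.
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    auto t0 = std::chrono::steady_clock::now();
    cudaFree(0);  // forces context creation on device 0
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("context init took %.1f ms\n", ms);
    return 0;
}
```

Running this on es.mcs vs. frog.mcs would show whether the M2090's overhead has
grown from seconds into minutes, or whether the hang comes from somewhere else.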
Are the tests run in parallel? If so, then maybe the parallel
initialization of GPUs is slowing things down.
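If concurrent access to the same GPU is the culprit, one cheap way to rule it
out is to force the CUDA tests to run one at a time. A sketch, assuming the
suite is driven by CTest (the test names below are placeholders for the actual
ones): tests that lock the same named resource never run concurrently, even
under `ctest -j N`.

```
# Hypothetical sketch: serialize the CUDA tests by locking a shared
# "gpu" resource; substitute the real test names from the suite.
set_tests_properties(cuda_test_a cuda_test_b cuda_test_c
                     PROPERTIES RESOURCE_LOCK gpu)
```

If the hangs disappear with this in place, parallel context initialization on
the M2090 is the likely cause; if they persist, the problem is in the tests or
the driver on that machine.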