Sai Teja Ranuva created MESOS-8038:
--------------------------------------

             Summary: Collect failed: Requested 1 but 0 available
                 Key: MESOS-8038
                 URL: https://issues.apache.org/jira/browse/MESOS-8038
             Project: Mesos
          Issue Type: Bug
          Components: allocation, gpu
    Affects Versions: 1.4.0
            Reporter: Sai Teja Ranuva

I was running a job that uses GPUs. It runs fine most of the time, but occasionally I see the following message in the Mesos log:

"Collect failed: Requested 1 but only 0 available"

This is followed by the executor getting killed and the tasks getting lost. It happens even before the job starts. A quick search of the code base suggests that GPU resource handling is the probable cause.

I have no deterministic way to reproduce this; it happens only occasionally. I have attached the slave log for the issue.

Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)