Sai Teja Ranuva created MESOS-8038:
--------------------------------------

             Summary: Collect failed: Requested 1 but 0 available
                 Key: MESOS-8038
                 URL: https://issues.apache.org/jira/browse/MESOS-8038
             Project: Mesos
          Issue Type: Bug
          Components: allocation, gpu
    Affects Versions: 1.4.0
            Reporter: Sai Teja Ranuva


I was running a job that uses GPUs. It runs fine most of the time, 
but occasionally I see the following message in the Mesos log:
"Collect failed: Requested 1 but only 0 available"
After that, the executor gets killed and the tasks are lost. This happens 
even before the job starts. A quick search of the code base suggests that 
GPU resource handling is the probable cause.

I have not found a deterministic way to reproduce this; it happens only 
occasionally.
I have attached the slave log for the issue.

Using 1.4.0 Mesos Master and 1.4.0 Mesos Slave.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
