I have a machine with 4 GPUs and want to use Mesos+Marathon to schedule the 
jobs that run on it. Each job will use at most 1 GPU, and sharing 1 
GPU between small jobs would be OK.
I know Mesos does not directly support GPUs, but it seems I might be able to 
use custom resources or attributes to do what I want. How exactly should this 
be done?

If I use --attributes="hasGpu:true", would a job be sent to the machine when 
another job is already running on it (and using only 1 GPU)? I would 
expect all jobs requesting a machine with the hasGpu attribute to be sent to 
the machine (as long as it has free CPU and memory resources). Then, if a job 
is sent to the machine when all 4 GPUs are already busy, the job will fail to 
start, right? Could Marathon then be used to re-send the job after some time, 
until it is accepted by the machine?
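
To make this concrete, here is a rough sketch of the app definition I have in 
mind, built as a Python dict and dumped to JSON for posting to Marathon. The 
app id, cmd and resource numbers are placeholders I made up. The constraint 
assumes the agent was started with --attributes="hasGpu:true", and I am 
assuming the backoff fields are what control how soon Marathon relaunches a 
task that failed to start.

```python
import json

# Hypothetical app definition; "id", "cmd", "cpus" and "mem" are placeholders.
gpu_job = {
    "id": "/gpu-job",
    "cmd": "./run_job.sh",
    "cpus": 1,
    "mem": 1024,
    "instances": 1,
    # Only accept offers from agents whose hasGpu attribute equals "true".
    "constraints": [["hasGpu", "CLUSTER", "true"]],
    # Assumption: these fields govern the delay between relaunch attempts
    # after a task fails (for example because all 4 GPUs were busy).
    "backoffSeconds": 30,
    "backoffFactor": 2,
    "maxLaunchDelaySeconds": 600,
}

# JSON body that would be sent to Marathon.
payload = json.dumps(gpu_job, indent=2)
print(payload)
```

Note that nothing in this definition tells Mesos how many GPUs are free, which 
is exactly why I expect the job to be placed and then fail when all 4 are busy.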

If I specify --resources="gpu(*):4", my understanding is that once a job is 
sent to the machine, all 4 GPUs will appear busy to Mesos (even if 
this is not really true). If that is right, would this workaround work: 
specify 4 different resources, gpu:A, gpu:B, gpu:C and gpu:D, and use 
constraints in Marathon like this: "constraints": [["gpu", "LIKE", "[A-D]"]]?
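
To spell out the two behaviours I am trying to tell apart, here is a toy model 
in Python. It is not Mesos code, and the function and parameter names are my 
own invention. It only counts how many 1-GPU jobs would fit on the machine if 
(a) the first job makes the whole gpu resource look busy, which is what I 
fear, or (b) each job only subtracts what it requested from the scalar 
resource.

```python
def tasks_that_fit(total_gpus, per_task_request, whole_resource_per_task):
    """Count how many tasks fit on one agent under a given accounting model.

    Toy model only, not Mesos code. whole_resource_per_task=True is the
    behaviour I fear (the first task takes every GPU); False subtracts
    only the requested amount from the scalar resource.
    """
    if per_task_request <= 0:
        raise ValueError("per_task_request must be positive")
    free = total_gpus
    launched = 0
    while free >= per_task_request:
        # Under the feared model a single task consumes everything offered.
        free -= total_gpus if whole_resource_per_task else per_task_request
        launched += 1
    return launched


print(tasks_that_fit(4, 1, whole_resource_per_task=True))
print(tasks_that_fit(4, 1, whole_resource_per_task=False))
```

If (b) is how custom scalar resources are actually accounted for, then 
gpu(*):4 alone might be enough, and sharing a GPU between small jobs could 
perhaps be expressed with a fractional request such as 0.5. I have not 
verified either point.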

Cheers
