> Then, if a job is sent to the machine when the 4 GPUs are already busy, the job will fail to start, right?

I am not sure about this. But if the job fails, Marathon would retry it, as you said.

> a job is sent to the machine, all 4 GPUs will become busy

If you specify in the resources field that your task uses only 1 gpu, I think Mesos could continue to provide offers that include gpu. And as far as I remember, Marathon constraints only work with --attributes.

On Fri, Jan 15, 2016 at 1:02 AM, <[email protected]> wrote:

> I have a machine with 4 GPUs and want to use Mesos+Marathon to schedule
> the jobs to be run in the machine. Each job will use maximum 1 GPU and
> sharing 1 GPU between small jobs would be ok.
> I know Mesos does not directly support GPUs, but it seems I might use
> custom resources or attributes to do what I want. But how exactly should
> this be done?
>
> If I use --attributes="hasGpu:true", would a job be sent to the machine
> when another job is already running in the machine (and only using 1 GPU)?
> I would say all jobs requesting a machine with a hasGpu attribute would be
> sent to the machine (as long as it has free CPU and memory resources).
> Then, if a job is sent to the machine when the 4 GPUs are already busy, the
> job will fail to start, right? Could then Marathon be used to re-send the
> job after some time, until it is accepted by the machine?
>
> If I specify --resources="gpu(*):4", it is my understanding that once a
> job is sent to the machine, all 4 GPUs will become busy to the eyes of
> Mesos (even if this is not really true). If that is right, would this
> work-around work: specify 4 different resources: gpu:A, gpu:B, gpu:C and
> gpu:D; and use constraints in Marathon like this "constraints": [["gpu",
> "LIKE", " [A-D]"]]?
>
> Cheers
>

--
Best Regards,
Haosdent Huang
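
To illustrate the point about the resources field: a scalar resource such as gpu(*):4 is accounted for by amount, so a task that asks for 1 only takes 1 out of the pool and the rest can still be offered. Below is a toy Python model of that bookkeeping. It is only a sketch and not Mesos code; the Agent class and its method names are made up for this example, and it says nothing about which physical GPU a task actually ends up using.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an agent started with --resources="gpu(*):4"."""
    total_gpus: float = 4.0
    allocated: dict = field(default_factory=dict)  # task name -> gpus held

    def offered_gpus(self) -> float:
        # Whatever is not allocated can still show up in a later offer.
        return self.total_gpus - sum(self.allocated.values())

    def launch(self, task: str, gpus: float) -> bool:
        # Accept the task only if the remaining amount covers its request.
        if gpus > self.offered_gpus():
            return False
        self.allocated[task] = gpus
        return True

    def finish(self, task: str) -> None:
        # A finished task returns its share to the pool.
        self.allocated.pop(task, None)


agent = Agent()
# Five jobs each ask for 1 gpu; only the first four fit.
results = [agent.launch(f"job-{i}", 1.0) for i in range(5)]
remaining_when_full = agent.offered_gpus()

# Once one job finishes, the fifth job can be placed on a retry.
agent.finish("job-0")
retry_ok = agent.launch("job-4", 1.0)
```

In this model the fifth job is simply not placed until a gpu is released, which is roughly the behaviour one would hope for when Marathon keeps waiting for a matching offer.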

