Hi Xinyu,
The problem you are seeing is that Mesos requires frameworks that want to
consume GPU resources to have the GPU_RESOURCES framework capability set.
Without this, the master will not send an offer to a framework if it
contains GPUs.
The choice to make frameworks explicitly opt in to this GPU_RESOURCES
capability was to keep legacy frameworks from accidentally consuming a
bunch of non-GPU resources on any GPU-capable machines in a cluster (and
thus blocking GPU jobs from running). It's not that big a deal if all of
your nodes have GPUs, but in a mixed-node environment, it can be a big
problem.
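To make the rule concrete, here is a simplified sketch of the filtering behavior in plain C++. The Agent and Framework types and the offerableAgents() function are hypothetical illustrations, not Mesos's actual allocator code. The point it shows is that a framework without GPU_RESOURCES is skipped for the whole GPU-capable agent, not just for its GPUs, which is what keeps legacy frameworks off those machines.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only (not Mesos classes).
struct Agent {
  std::string id;
  double cpus;
  double mem;   // MB
  double gpus;  // 0 on agents without GPUs.
};

struct Framework {
  std::string name;
  bool gpuResourcesCapability;  // True if GPU_RESOURCES was set.
};

// Returns the agents whose resources may be offered to `framework`.
// A framework without the GPU_RESOURCES capability sees no resources at
// all from a GPU agent, including its CPUs and memory.
std::vector<Agent> offerableAgents(const Framework& framework,
                                   const std::vector<Agent>& agents) {
  std::vector<Agent> result;
  for (const Agent& agent : agents) {
    if (agent.gpus > 0 && !framework.gpuResourcesCapability) {
      continue;  // Skip the entire GPU agent for frameworks that did not opt in.
    }
    result.push_back(agent);
  }
  return result;
}
```

With one CPU-only agent and one GPU agent, a legacy framework is offered only the CPU-only agent, while a framework that set GPU_RESOURCES is offered both.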
If you run your job with mesos-execute, you can add the flag
--framework_capabilities=GPU_RESOURCES to the command line.
If you wrote your own framework in C++, you can set it via something like:
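#include <mesos/scheduler.hpp>
using namespace mesos;

FrameworkInfo framework;
framework.set_user("");  // Let Mesos fill in the current user.
framework.set_name("GPU framework");
framework.add_capabilities()->set_type(
    FrameworkInfo::Capability::GPU_RESOURCES);  // Opt in to GPU offers.

GpuScheduler scheduler;  // Your own Scheduler subclass.
MesosSchedulerDriver* driver = new MesosSchedulerDriver(
    &scheduler,
    framework,
    "127.0.0.1:5050");  // Master address, passed as a string.
driver->run();
delete driver;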
If you launch your job via Marathon, GPU support exists in the 1.3 release,
which just came out yesterday. You can download it here:
https://github.com/mesosphere/marathon/releases/tag/v1.1.3
Hope this helps!
Kevin