Hi Xinyu,

The problem you are seeing is that Mesos requires frameworks that want to
consume GPU resources to have the GPU_RESOURCES framework capability set.
Without this, the master will not send the framework any offers that
contain GPUs.

The choice to make frameworks explicitly opt in to this GPU_RESOURCES
capability was to keep legacy frameworks from accidentally consuming a
bunch of non-GPU resources on any GPU-capable machines in a cluster (and
thus blocking GPU jobs from running). It's not that big a deal if all of
your nodes have GPUs, but in a mixed-node environment, it can be a big
problem.
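
To make that rule concrete, here is a toy model of the filtering. It is
only a sketch under my own assumptions: ToyFramework, ToyAgent, shouldOffer
and offerableAgents are made-up names for illustration, not Mesos types or
functions, and the real allocator logic is more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Mesos types.
struct ToyFramework {
  std::string name;
  bool gpuResourcesCapability;  // Models the GPU_RESOURCES capability.
};

struct ToyAgent {
  std::string hostname;
  double cpus;
  double gpus;
};

// Returns true if the toy master would offer this agent's resources to
// the framework. Agents with GPUs are withheld entirely from frameworks
// that have not opted in, so their CPUs and memory stay free for GPU jobs.
bool shouldOffer(const ToyFramework& framework, const ToyAgent& agent) {
  assert(agent.cpus >= 0 && agent.gpus >= 0);
  const bool agentHasGpus = agent.gpus > 0;
  return !agentHasGpus || framework.gpuResourcesCapability;
}

// Collects the hostnames of all agents the framework would get offers from.
std::vector<std::string> offerableAgents(
    const ToyFramework& framework,
    const std::vector<ToyAgent>& agents) {
  std::vector<std::string> hostnames;
  for (const ToyAgent& agent : agents) {
    if (shouldOffer(framework, agent)) {
      hostnames.push_back(agent.hostname);
    }
  }
  return hostnames;
}
```

With one CPU-only agent and one GPU agent, a framework without the
capability would only ever see the CPU-only agent in this model, while a
framework with it would see both.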

If you run your job with mesos-execute, you can add the flag
--framework_capabilities=GPU_RESOURCES to the command line.

If you wrote your own framework in C++, you can set it via something like:

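  FrameworkInfo framework;
  framework.set_user("");  // Empty string: let Mesos fill in the current user.
  framework.set_name("GPU Framework");  // Placeholder name; pick your own.

  // Opt in to receiving offers that contain GPUs.
  framework.add_capabilities()->set_type(
      FrameworkInfo::Capability::GPU_RESOURCES);

  GpuScheduler scheduler;  // Your own Scheduler subclass.

  // The master address is passed as a string.
  MesosSchedulerDriver* driver = new MesosSchedulerDriver(
      &scheduler,
      framework,
      "127.0.0.1:5050");

  driver->run();
  delete driver;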


If you launch your job via Marathon, support exists in the 1.3 release that
just came out yesterday. You can download it here:
https://github.com/mesosphere/marathon/releases/tag/v1.3.0

Hope this helps!

Kevin
