| Hi all,

I found the gpu-support page in the docs (1) and tried the following command:

# mesos-execute --master=129.26.78.161:5050 --name=gpu-test --command="nvidia-smi" --framework_capabilities="GPU_RESOURCES" --resources="gpus:1"

That gave the following output:

I0607 14:57:41.897706 56361 scheduler.cpp:189] Version: 1.9.0
I0607 14:57:41.913520 56361 scheduler.cpp:342] Using default 'basic' HTTP authenticatee
I0607 14:57:41.913813 56367 scheduler.cpp:525] New master detected at [email protected]:5050
Subscribed with ID f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
Submitted task 'gpu-test' to agent 'f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0'
Received status update TASK_STARTING for task 'gpu-test'
  source: SOURCE_EXECUTOR
Received status update TASK_RUNNING for task 'gpu-test'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'gpu-test'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR

I did not see the output of nvidia-smi, although the documentation says I should. I have attached the logs of the master and the agent.

Thanks,
Ben

I0607 14:57:41.926017 54652 http.cpp:1115] HTTP POST for
/master/api/v1/scheduler from 129.26.78.161:45512
I0607 14:57:41.927448 54652 master.cpp:2670] Received subscription request for
HTTP framework 'mesos-execute instance'
I0607 14:57:41.927587 54652 master.cpp:2742] Subscribing framework
'mesos-execute instance' with checkpointing disabled and capabilities [
RESERVATION_REFINEMENT, TASK_KILLING_STATE, REVOCABLE_RESOURCES,
PARTITION_AWARE, GPU_RESOURCES ]
I0607 14:57:41.928453 54652 master.cpp:10847] Adding framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) with roles {
} suppressed
I0607 14:57:41.928616 54651 hierarchical.cpp:605] Added framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:41.929093 54655 master.cpp:10432] Sending offers [
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-O4 ] to framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:41.930851 54657 http.cpp:1115] HTTP POST for
/master/api/v1/scheduler from 129.26.78.161:45510
I0607 14:57:41.931084 54657 master.cpp:12724] Removing offer
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-O4
I0607 14:57:41.931234 54657 master.cpp:4741] Processing ACCEPT call for offers:
[ f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-O4 ] on agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
for framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:41.931506 54657 master.cpp:4302] Adding task gpu-test with
resources gpus(allocated: *):1 of framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) on agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:41.931610 54657 master.cpp:5720] Launching task gpu-test of
framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
with resources
[{"allocation_info":{"role":"*"},"name":"gpus","scalar":{"value":1.0},"type":"SCALAR"}]
on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051
(node-01) on new executor
I0607 14:57:42.169665 54640 master.cpp:8985] Status update TASK_STARTING
(Status UUID: e483ba5a-afe1-4306-b994-accbfedaae52) for task gpu-test of
framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 from agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.169793 54640 master.cpp:9042] Forwarding status update
TASK_STARTING (Status UUID: e483ba5a-afe1-4306-b994-accbfedaae52) for task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.169997 54640 master.cpp:12073] Updating the state of task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state:
TASK_STARTING, status update state: TASK_STARTING)
I0607 14:57:42.211457 54642 http.cpp:1115] HTTP POST for
/master/api/v1/scheduler from 129.26.78.161:45510
I0607 14:57:42.211583 54642 master.cpp:6695] Processing ACKNOWLEDGE call for
status e483ba5a-afe1-4306-b994-accbfedaae52 for task gpu-test of framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) on agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0
I0607 14:57:42.212878 54659 master.cpp:8985] Status update TASK_RUNNING (Status
UUID: a870507f-82a6-4389-ac3c-064b386abfcf) for task gpu-test of framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 from agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.212954 54659 master.cpp:9042] Forwarding status update
TASK_RUNNING (Status UUID: a870507f-82a6-4389-ac3c-064b386abfcf) for task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.213131 54659 master.cpp:12073] Updating the state of task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state:
TASK_RUNNING, status update state: TASK_RUNNING)
I0607 14:57:42.254524 54651 http.cpp:1115] HTTP POST for
/master/api/v1/scheduler from 129.26.78.161:45510
I0607 14:57:42.254655 54651 master.cpp:6695] Processing ACKNOWLEDGE call for
status a870507f-82a6-4389-ac3c-064b386abfcf for task gpu-test of framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) on agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0
I0607 14:57:42.465921 54644 master.cpp:8985] Status update TASK_FINISHED
(Status UUID: 49c81e1b-9a8f-42a5-8629-edfc89e466bf) for task gpu-test of
framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 from agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.466002 54644 master.cpp:9042] Forwarding status update
TASK_FINISHED (Status UUID: 49c81e1b-9a8f-42a5-8629-edfc89e466bf) for task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.466174 54644 master.cpp:12073] Updating the state of task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state:
TASK_FINISHED, status update state: TASK_FINISHED)
I0607 14:57:42.467525 54643 master.cpp:1412] Framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) disconnected
I0607 14:57:42.467586 54643 master.cpp:3362] Deactivating framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.467626 54643 master.cpp:3339] Disconnecting framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.467648 54643 master.cpp:1427] Giving framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) 0ns to
failover
I0607 14:57:42.467701 54658 hierarchical.cpp:711] Deactivated framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.467837 54643 master.cpp:10224] Framework failover timeout,
removing framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute
instance)
I0607 14:57:42.467862 54643 master.cpp:11223] Removing framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.468037 54643 master.cpp:12073] Updating the state of task
gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state:
TASK_FINISHED, status update state: TASK_KILLED)
I0607 14:57:42.468077 54643 master.cpp:12171] Removing task gpu-test with
resources gpus(allocated: *):1 of framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 on agent
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.468312 54645 hierarchical.cpp:655] Removed framework
f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:44.343035 54650 http.cpp:1115] HTTP GET for
/master/state?jsonp=angular.callbacks._65 from 10.116.60.121:57573 with
User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:76.0)
Gecko/20100101 Firefox/76.0'
Attachment: mesos-slave.node-01.root.log.INFO.2.log (binary data)
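
For anyone reading the master log above, the task's state transitions are easier to follow once the wrapped lines are rejoined. The following is only a rough sketch, assuming the log uses the glog prefix seen above (severity letter, MMDD, time, thread id, file:line]) and that lines without that prefix are continuations wrapped by the mail client. The names `join_wrapped` and `status_updates` are made up for illustration and are not part of Mesos.

```python
import re

# Assumed glog record prefix, e.g.
# "I0607 14:57:42.212878 54659 master.cpp:8985] message..."
RECORD_START = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([\w.]+):(\d+)\] (.*)$"
)
# Matches only the master's "Status update TASK_X ... for task NAME of" records,
# not the "Forwarding status update" or "Updating the state" records.
STATUS_UPDATE = re.compile(r"^Status update (TASK_\w+) .*? for task (\S+) of")


def join_wrapped(lines):
    """Merge wrapped continuation lines back into one string per log record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def status_updates(lines):
    """Return (timestamp, task, state) for each 'Status update' record."""
    out = []
    for record in join_wrapped(lines):
        m = RECORD_START.match(record)
        if not m:
            continue
        s = STATUS_UPDATE.match(m.group(7))
        if s:
            out.append((m.group(3), s.group(2), s.group(1)))
    return out
```

Passing the lines of the master log to `status_updates` (for example via `open(path).read().splitlines()`, with `path` being wherever the log was saved) should list one tuple per status update the master received from the agent, in log order.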
|

