Hi all,

I found the gpu-support page in the docs (1) and tried the following command:

# mesos-execute --master=129.26.78.161:5050 --name=gpu-test --command="nvidia-smi" --framework_capabilities="GPU_RESOURCES" --resources="gpus:1"

That gave the following output:

I0607 14:57:41.897706 56361 scheduler.cpp:189] Version: 1.9.0
I0607 14:57:41.913520 56361 scheduler.cpp:342] Using default 'basic' HTTP authenticatee
I0607 14:57:41.913813 56367 scheduler.cpp:525] New master detected at master@129.26.78.161:5050
Subscribed with ID f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
Submitted task 'gpu-test' to agent 'f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0'
Received status update TASK_STARTING for task 'gpu-test'
  source: SOURCE_EXECUTOR
Received status update TASK_RUNNING for task 'gpu-test'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'gpu-test'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR

The task finished with exit status 0, but I did not see the output of nvidia-smi, which I should have according to the documentation. I have attached the master and agent logs.

Thanks,
Ben

I0607 14:57:41.926017 54652 http.cpp:1115] HTTP POST for /master/api/v1/scheduler from 129.26.78.161:45512
I0607 14:57:41.927448 54652 master.cpp:2670] Received subscription request for HTTP framework 'mesos-execute instance'
I0607 14:57:41.927587 54652 master.cpp:2742] Subscribing framework 'mesos-execute instance' with checkpointing disabled and capabilities [ RESERVATION_REFINEMENT, TASK_KILLING_STATE, REVOCABLE_RESOURCES, PARTITION_AWARE, GPU_RESOURCES ]
I0607 14:57:41.928453 54652 master.cpp:10847] Adding framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) with roles { } suppressed
I0607 14:57:41.928616 54651 hierarchical.cpp:605] Added framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:41.929093 54655 master.cpp:10432] Sending offers [ f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-O4 ] to framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:41.930851 54657 http.cpp:1115] HTTP POST for /master/api/v1/scheduler from 129.26.78.161:45510
I0607 14:57:41.931084 54657 master.cpp:12724] Removing offer f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-O4
I0607 14:57:41.931234 54657 master.cpp:4741] Processing ACCEPT call for offers: [ f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-O4 ] on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01) for framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:41.931506 54657 master.cpp:4302] Adding task gpu-test with resources gpus(allocated: *):1 of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:41.931610 54657 master.cpp:5720] Launching task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) with resources [{"allocation_info":{"role":"*"},"name":"gpus","scalar":{"value":1.0},"type":"SCALAR"}] on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01) on new executor
I0607 14:57:42.169665 54640 master.cpp:8985] Status update TASK_STARTING (Status UUID: e483ba5a-afe1-4306-b994-accbfedaae52) for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 from agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.169793 54640 master.cpp:9042] Forwarding status update TASK_STARTING (Status UUID: e483ba5a-afe1-4306-b994-accbfedaae52) for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.169997 54640 master.cpp:12073] Updating the state of task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state: TASK_STARTING, status update state: TASK_STARTING)
I0607 14:57:42.211457 54642 http.cpp:1115] HTTP POST for /master/api/v1/scheduler from 129.26.78.161:45510
I0607 14:57:42.211583 54642 master.cpp:6695] Processing ACKNOWLEDGE call for status e483ba5a-afe1-4306-b994-accbfedaae52 for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0
I0607 14:57:42.212878 54659 master.cpp:8985] Status update TASK_RUNNING (Status UUID: a870507f-82a6-4389-ac3c-064b386abfcf) for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 from agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.212954 54659 master.cpp:9042] Forwarding status update TASK_RUNNING (Status UUID: a870507f-82a6-4389-ac3c-064b386abfcf) for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.213131 54659 master.cpp:12073] Updating the state of task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state: TASK_RUNNING, status update state: TASK_RUNNING)
I0607 14:57:42.254524 54651 http.cpp:1115] HTTP POST for /master/api/v1/scheduler from 129.26.78.161:45510
I0607 14:57:42.254655 54651 master.cpp:6695] Processing ACKNOWLEDGE call for status a870507f-82a6-4389-ac3c-064b386abfcf for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0
I0607 14:57:42.465921 54644 master.cpp:8985] Status update TASK_FINISHED (Status UUID: 49c81e1b-9a8f-42a5-8629-edfc89e466bf) for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 from agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.466002 54644 master.cpp:9042] Forwarding status update TASK_FINISHED (Status UUID: 49c81e1b-9a8f-42a5-8629-edfc89e466bf) for task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.466174 54644 master.cpp:12073] Updating the state of task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0607 14:57:42.467525 54643 master.cpp:1412] Framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) disconnected
I0607 14:57:42.467586 54643 master.cpp:3362] Deactivating framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.467626 54643 master.cpp:3339] Disconnecting framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.467648 54643 master.cpp:1427] Giving framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance) 0ns to failover
I0607 14:57:42.467701 54658 hierarchical.cpp:711] Deactivated framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:42.467837 54643 master.cpp:10224] Framework failover timeout, removing framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.467862 54643 master.cpp:11223] Removing framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (mesos-execute instance)
I0607 14:57:42.468037 54643 master.cpp:12073] Updating the state of task gpu-test of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 (latest state: TASK_FINISHED, status update state: TASK_KILLED)
I0607 14:57:42.468077 54643 master.cpp:12171] Removing task gpu-test with resources gpus(allocated: *):1 of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 on agent f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0 at slave(1)@10.116.24.18:5051 (node-01)
I0607 14:57:42.468312 54645 hierarchical.cpp:655] Removed framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
I0607 14:57:44.343035 54650 http.cpp:1115] HTTP GET for /master/state?jsonp=angular.callbacks._65 from 10.116.60.121:57573 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:76.0) Gecko/20100101 Firefox/76.0'
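
In case it helps anyone reading the master log above, here is a rough sketch of how the task state transitions could be pulled out of it. It only assumes the "Updating the state of task ..." wording seen in this log; the function name `task_transitions` and the `sample` string are made up for illustration and are not part of Mesos.

```python
import re

# Matches the master's "Updating the state of task ..." entries as they appear
# in the log above. \s+ is used between words so that entries wrapped across
# lines by a mail client still match.
STATE_RE = re.compile(
    r"Updating\s+the\s+state\s+of\s+task\s+(?P<task>\S+)\s+of\s+framework\s+\S+\s+"
    r"\(latest\s+state:\s+(?P<latest>\w+),\s+status\s+update\s+state:\s+(?P<update>\w+)\)"
)


def task_transitions(log_text):
    """Return (task, latest_state, status_update_state) tuples in log order."""
    return [(m["task"], m["latest"], m["update"]) for m in STATE_RE.finditer(log_text)]


# Two entries copied from the master log in this message.
sample = (
    "I0607 14:57:42.169997 54640 master.cpp:12073] Updating the state of task gpu-test "
    "of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 "
    "(latest state: TASK_STARTING, status update state: TASK_STARTING)\n"
    "I0607 14:57:42.468037 54643 master.cpp:12073] Updating the state of task gpu-test "
    "of framework f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005 "
    "(latest state: TASK_FINISHED, status update state: TASK_KILLED)\n"
)

if __name__ == "__main__":
    for task, latest, update in task_transitions(sample):
        print(task, latest, update)
```

Run against the full master log, this should list the STARTING, RUNNING and FINISHED updates, plus the final entry logged when the framework is removed.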
Attachment: mesos-slave.node-01.root.log.INFO.2.log (binary data)