Hi all,

A correction:

I did find the correct output of nvidia-smi: it is in the stdout file in the 
task’s work dir on the agent (that was the piece I didn’t get, reading helps!).

So I have to see why the framework doesn’t receive any offers.
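
For anyone following along, here is a minimal Python sketch of how the first cause in Benjamin Mahler's list (quoted below) could be checked against the master's /state endpoint he asked for. The JSON field names used here ("slaves", "resources" with a "gpus" entry, "frameworks", "capabilities") are my assumptions about the /state response layout and are not confirmed anywhere in this thread. The function names are made up for illustration.

```python
import json
import urllib.request


def gpu_capability_gaps(state):
    """List frameworks that lack GPU_RESOURCES while some agent reports GPUs.

    `state` is the parsed JSON of the master's /state endpoint. The keys
    used below are assumptions about that layout, not verified in the thread.
    """
    agents_with_gpus = [
        agent for agent in state.get("slaves", [])
        if agent.get("resources", {}).get("gpus", 0) > 0
    ]
    if not agents_with_gpus:
        # No GPU agents, so a missing GPU capability cannot be the cause.
        return []
    return [
        framework.get("name", framework.get("id", "?"))
        for framework in state.get("frameworks", [])
        if "GPU_RESOURCES" not in framework.get("capabilities", [])
    ]


def fetch_state(master):
    """Fetch and parse /state from the master; `master` is 'host:port'."""
    with urllib.request.urlopen(f"http://{master}/state") as response:
        return json.load(response)
```

Calling gpu_capability_gaps(fetch_state("129.26.78.161:5050")) would run the check against the master address from the mesos-execute command quoted below. The master log quoted further down shows 'Go-Docker' subscribing with capabilities [  ], so I would expect it to be flagged if /state reports the same empty list. This only covers the capability cause, not suppressed roles, quota, or reservations.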

Thanks,
Ben
 

> On 7. Jun 2020, at 15:01, Benjamin Wulff <benjamin.wulff...@ieee.org> wrote:
> 
> Hi all,
> 
> I found the gpu-support page in the docs (1) and tried the following command:
> 
> # mesos-execute --master=129.26.78.161:5050 --name=gpu-test 
> --command="nvidia-smi" --framework_capabilities="GPU_RESOURCES" 
> --resources="gpus:1"
> 
> ..and that gave the following output:
> 
> I0607 14:57:41.897706 56361 scheduler.cpp:189] Version: 1.9.0
> I0607 14:57:41.913520 56361 scheduler.cpp:342] Using default 'basic' HTTP 
> authenticatee
> I0607 14:57:41.913813 56367 scheduler.cpp:525] New master detected at 
> master@129.26.78.161:5050
> Subscribed with ID f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
> Submitted task 'gpu-test' to agent 'f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0'
> Received status update TASK_STARTING for task 'gpu-test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_RUNNING for task 'gpu-test'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FINISHED for task 'gpu-test'
>   message: 'Command exited with status 0'
>   source: SOURCE_EXECUTOR
> 
> I did not see the output of nvidia-smi as I should have according to the 
> documentation.
> 
> I have attached the logs of master and agent.
> 
> Thanks,
> Ben
> <mesos-master.log.INFO.txt>
> <mesos-slave.node-01.root.log.INFO.2.log>
> 
> 
>> On 7. Jun 2020, at 02:43, Benjamin Mahler <bmah...@apache.org> wrote:
>> 
>> Don't worry about that "Ignoring" message on the agent. When the framework 
>> information is updated, the master broadcasts it to the agents, and in this 
>> case the agent doesn't know about the framework since it has no tasks for 
>> it, and so it ignores the updated information.
>> 
>> I can't quite tell from the log snippet you provided. Assuming this is the 
>> only scheduler registered, it should receive offers for all the agents for 
>> the scheduler's roles (in this case, should just be the '*' role).
>> 
>> Some reasons offers might not be sent:
>> 
>> - The framework doesn't have a capability required to be offered the agent 
>>   (e.g. the scheduler doesn't have GPU_RESOURCES when the agent has GPUs).
>> - The framework suppressed its role(s) (doesn't seem to be the case from 
>>   the log snippet).
>> - The role has insufficient quota (e.g. if you have set a quota limit for 
>>   that role, or if other roles have quota guarantees overcommitting the 
>>   cluster).
>> - The agent's resources are reserved to a role.
>> 
>> Can you show us the scheduler code? Can you give us complete logs, along 
>> with results of the agent and master /state endpoints?
>> 
>> 
>> On Sat, Jun 6, 2020 at 8:00 AM Benjamin Wulff <benjamin.wulff...@ieee.org> 
>> wrote:
>> So with logging_level set to INFO (and master and slave restarted) I noticed 
>> in /var/log/mesos.INFO on the agent the following line:
>> 
>> I0606 13:46:41.393455 206117 slave.cpp:4222] Ignoring info update for 
>> framework 2777de92-bc91-4e48-9960-bbab05694665-0000 because it does not exist
>> 
>> That is indeed the ID of the framework I’d like to run my task in. In the 
>> web UI the framework is listed. So why is the agent saying that it doesn’t 
>> exist? What does this message mean?
>> 
>> On the master, the last relevant log lines in /var/log/mesos-master.INFO 
>> (after that come only HTTP requests) are:
>> 
>> I0606 13:50:11.710996 52025 http.cpp:1115] HTTP POST for 
>> /master/api/v1/scheduler from 172.30.0.8:41378 
>> with User-Agent='python-requests/2.23.0'
>> I0606 13:50:11.720903 52025 master.cpp:2670] Received subscription request 
>> for HTTP framework 'Go-Docker'
>> I0606 13:50:11.721153 52025 master.cpp:2742] Subscribing framework 
>> 'Go-Docker' with checkpointing disabled and capabilities [  ]
>> I0606 13:50:11.721993 52025 master.cpp:10847] Adding framework 
>> 2777de92-bc91-4e48-9960-bbab05694665-0000 (Go-Docker) with roles {  } 
>> suppressed
>> I0606 13:50:11.722084 52025 master.cpp:8300] Updating framework 
>> 2777de92-bc91-4e48-9960-bbab05694665-0000 (Go-Docker) with roles {  } 
>> suppressed
>> I0606 13:50:11.722514 52030 hierarchical.cpp:605] Added framework 
>> 2777de92-bc91-4e48-9960-bbab05694665-0000
>> I0606 13:50:11.722573 52030 hierarchical.cpp:711] Deactivated framework 
>> 2777de92-bc91-4e48-9960-bbab05694665-0000
>> I0606 13:50:11.722625 52030 hierarchical.cpp:1552] Suppressed offers for 
>> roles {  } of framework 2777de92-bc91-4e48-9960-bbab05694665-0000
>> I0606 13:50:11.722657 52030 hierarchical.cpp:1592] Unsuppressed offers and 
>> cleared filters for roles {  } of framework 
>> 2777de92-bc91-4e48-9960-bbab05694665-0000
>> I0606 13:50:11.722703 52030 hierarchical.cpp:681] Activated framework 
>> 2777de92-bc91-4e48-9960-bbab05694665-0000
>> 
>> From my novice perspective it seems the framework is registered.
>> 
>> Thanks,
>> Ben
>> 
>> 
>> > On 6. Jun 2020, at 13:40, Marc Roos <m.r...@f1-outsourcing.eu> wrote:
>> > 
>> > 
>> > 
>> > 
>> > You already put these on debug?
>> > 
>> > [@ ]# cat /etc/mesos-master/logging_level
>> > WARNING
>> > [@ ]# cat /etc/mesos-slave/logging_level
>> > WARNING
>> > 
>> > 
>> > 
>> > 
>> > -----Original Message-----
>> > From: Benjamin Wulff [mailto:benjamin.wulff...@ieee.org] 
>> > Sent: Saturday, 6 June 2020 13:36
>> > To: user@mesos.apache.org
>> > Subject: No offers are being made -- how to debug Mesos?
>> > 
>> > Hi all,
>> > 
>> > I’m in the process of setting up my first Mesos cluster with 1x master 
>> > and 3x slaves on CentOS 8.
>> > 
>> > So far I have set up ZooKeeper and mesos-master on the master and 
>> > mesos-slave on one of the compute nodes. mesos-master communicates with 
>> > ZK and becomes leader. Then I started mesos-slave on the compute node and 
>> > can see in the log that it registers at the master with the correct 
>> > resources reported. The agent and its resources are also displayed in the 
>> > web UI of the master. So is the framework that I want to use.
>> > 
>> > The crux is that no tasks I schedule in the framework are executed. And 
>> > I suppose this is because the framework never receives an offer. I can 
>> > see in the web UI that no offers are made and that all resources remain 
>> > idle.
>> > 
>> > Now, I’m new to Mesos and I don’t really have an idea how to debug my 
>> > setup at this point. 
>> > 
>> > There is a page called ‘Debugging with the new CLI’ in the 
>> > documentation but it only explains how to configure the CLI command. 
>> > 
>> > Any directions on how to debug my situation in general, or on how to use 
>> > the CLI for debugging, would be highly welcome! :)
>> > 
>> > Thanks and best regards,
>> > Ben
>> > 
>> > 
>> > 
>> 
> 
