Re: Threading model of mesos API (C++)
Thanks for the responses, guys. That link to the 'detailed description' will be handy; I hadn't come across it before.

I do now have another question, though. Aren't these two statements contradictory?

Alex: you launch a task and, before the method returns (say you do some blocking work afterwards, like a synchronous ZooKeeper update), you might get a statusUpdate() callback.

Ben: Methods will not be invoked concurrently, and each method must complete before the next is called.

??

Jim

On 10 June 2015 at 02:22, Benjamin Mahler benjamin.mah...@gmail.com wrote:

> If that's really what you're seeing, it is a bug and a very surprising one, so please provide evidence :)
>
> See the detailed description here: http://mesos.apache.org/api/latest/c++/classmesos_1_1Scheduler.html
>
> The scheduler driver will serially invoke methods on your Scheduler implementation. Methods will not be invoked concurrently, and each method must complete before the next is called.
>
> So, we recommend that you don't block inside the callbacks. Otherwise, you're blocking the driver as well and your own ability to continue processing callbacks.
>
> On Tue, Jun 9, 2015 at 8:58 AM, James Vanns jvanns@gmail.com wrote:
>
>> Hi. I'm toying with the mesos scheduler (C++) API and running into unexpected race conditions. I have *not* synchronised access to attributes of my Scheduler-derived class. Is the mesos library code threaded and network communication asynchronous?
>>
>> What it *looks like* I'm seeing is my statusUpdate() callback being executed before the return of resourceOffers(). Naturally I call driver->launchTasks() inside resourceOffers(). This is intermittent but generally triggered by tasks that report status changes very quickly; e.g. a task that fails instantly.
>>
>> Can anyone point me in the right direction of any online API docs that explain how callbacks are invoked? Distributed over a pool of worker threads? Also, are the state transitions documented? E.g. mesos::TASK_STAGING -> mesos::TASK_STARTING -> etc.
>>
>> Cheers,
>> Jim
>>
>> --
>> Senior Code Pig
>> Industrial Light & Magic
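The recommendation in this thread not to block inside scheduler callbacks can be sketched roughly as follows. This is only an illustration under stated assumptions: `BackgroundWorker` is a hypothetical helper written for this example, not part of the Mesos API, and it uses nothing but the C++ standard library. The idea is that a callback such as statusUpdate() hands slow work (for example a synchronous ZooKeeper update) to another thread and returns straight away, so the driver can keep invoking callbacks serially.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper (NOT a Mesos class): runs queued jobs on a single
// background thread so that a scheduler callback can return immediately.
class BackgroundWorker {
public:
  BackgroundWorker() : thread_([this] { run(); }) {}

  ~BackgroundWorker() { stop(); }

  // Intended to be called from a callback: enqueue the slow job and
  // return without waiting for it to run.
  void post(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      jobs_.push(std::move(job));
    }
    ready_.notify_one();
  }

  // Run every job that is already queued, then join the thread.
  // Jobs posted after stop() are never executed.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    ready_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
        if (jobs_.empty()) return;  // stopping_ is set and nothing is left
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();  // executed outside the lock so post() never waits on a slow job
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<std::function<void()>> jobs_;
  bool stopping_ = false;
  std::thread thread_;  // declared last so the members above exist before it starts
};
```

Inside a callback one would then write something like `worker.post([update] { /* slow, blocking work */ });`. Note the trade-off: once work runs on a second thread, any state it shares with the Scheduler object does need synchronisation, even though the driver itself invokes the callbacks one at a time.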
Re: Threading model of mesos API (C++)
Jim,

Let me prototype something small today. After reading my scheduler (in C++), I do have comments and synchronization on some state vars, but that might have to do with a more complex async code base I manage. I'll get back to you.

- alex

On Wed, Jun 10, 2015 at 6:15 AM, James Vanns jvanns@gmail.com wrote:

> Thanks for the responses, guys. That link of the 'detailed description' will be handy - I've not come across that before. I do now have another question though! Aren't these two a contradiction;
>
> Alex; you launch a task, before the method returns (say you do some blocking stuff after, like sync update zookeeper), you might get a statusUpdate() callback.
>
> Ben; Methods will not be invoked concurrently, and each method must complete before the next is called.
>
> ??
>
> Jim
Re: Can Mesos master offer resources to multiple frameworks simultaneously?
I'll try to answer these questions.

1. Currently, the only language you can use is C++. You can work around this by writing a proxy in C++ that delegates the calls to, say, Python scripts. See http://mesos.apache.org/documentation/latest/allocation-module/ for more details.

2. The default allocator is called dominant resource fairness (DRF) because it tries to distribute resources fairly between active frameworks. This means it will offer all available resources to all frameworks, but each framework will get only a certain share. For more information I encourage you to take a look at the DRF paper.

3. Resources that have been offered and not declined are considered to be in use, so they can't be re-offered until freed.

Hope this helps.

On 10 Jun 2015 7:53 am, Qian Zhang zhq527...@gmail.com wrote:

> Thanks Adam, this is very helpful! I have a few more questions:
>
> 1. For the pluggable allocator modules, can I write my own allocator in any programming language (e.g., Python, Go, etc.)?
>
> 2. For the default DRF allocator, when it offers resources to a framework, will it offer all the available resources (resources not being used by any framework) to it? Or just part of the available resources?
>
> 3. If there are multiple frameworks and the default DRF allocator will only offer resources to a single framework at a time, does that mean framework 2 has to wait until framework 1 makes its placement decision?
Re: Threading model of mesos API (C++)
You are a star, Alex. Thank you :)

Jim

On 10 June 2015 at 15:15, Alexander Gallego agall...@concord.io wrote:

> Jim, Let me prototype something small today. After reading my scheduler (in c++) i do have comments and synchronization on some state vars, but it might have to do with a more complex async code base I manage. I'll get back to you.
>
> - alex

--
Senior Code Pig
Industrial Light & Magic
Re: Debugging framework registration from inside docker
On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

> Hi. When attempting to run my scheduler inside a docker container in --net=bridge mode it never receives acknowledgement or a reply to that request. However, it works fine in --net=host mode. It does not listen on any port as a service so does not expose any.
>
> The scheduler receives the mesos master (leader) from zookeeper fine but fails to register the framework with that master. It just loops trying to do so - the master sees the registration but deactivates it immediately as apparently it disconnects. It doesn't disconnect but is obviously unreachable. I see the reason for this in the sendto() and the master log file -- because the internal docker bridge IP is included in the POST and perhaps that is how the master is trying to talk back to the requesting framework??
>
> Inside the container is this;
>
> tcp 0 0 0.0.0.0:44431 0.0.0.0:* LISTEN 1/scheduler
>
> This is not my code! I'm at a loss where to go from here. Anyone got any further suggestions to fix this?

You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the fact that you are on a virtual Docker interface.
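One way to try the LIBPROCESS_IP / LIBPROCESS_PORT suggestion from inside the scheduler binary itself is sketched below. Treat it as an assumption-laden illustration: `advertiseAddress` is a hypothetical helper, the address and port in the usage note are placeholders, and it relies on the variables being set before the scheduler driver (and therefore libprocess) is initialised, since they are read at start-up. Setting the same variables in the container's environment (for example via `docker run -e`) should have the same effect.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Mesos): exports the address and port that
// libprocess should use. Must run BEFORE the scheduler driver is created.
// Uses POSIX setenv(); returns true only if both variables were set.
bool advertiseAddress(const std::string& ip, const std::string& port) {
  return setenv("LIBPROCESS_IP", ip.c_str(), 1) == 0 &&
         setenv("LIBPROCESS_PORT", port.c_str(), 1) == 0;
}
```

A call such as `advertiseAddress("192.0.2.10", "9090")` at the top of `main()` would be the intended use, with the chosen port also published by Docker so the master can connect back. As far as I know libprocess also binds to the address given in LIBPROCESS_IP, so whether this works in --net=bridge mode depends on that address being usable inside the container.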
Debugging framework registration from inside docker
Hi. When attempting to run my scheduler inside a docker container in --net=bridge mode it never receives acknowledgement or a reply to that request. However, it works fine in --net=host mode. It does not listen on any port as a service so does not expose any.

The scheduler receives the mesos master (leader) from zookeeper fine but fails to register the framework with that master. It just loops trying to do so - the master sees the registration but deactivates it immediately as apparently it disconnects. It doesn't disconnect but is obviously unreachable. I see the reason for this in the sendto() and the master log file -- because the internal docker bridge IP is included in the POST and perhaps that is how the master is trying to talk back to the requesting framework??

Inside the container is this;

tcp 0 0 0.0.0.0:44431 0.0.0.0:* LISTEN 1/scheduler

This is not my code! I'm at a loss where to go from here. Anyone got any further suggestions to fix this?

Cheers,
Jim

--
Senior Code Pig
Industrial Light & Magic
Apply Now #MesosCon Conference Diversity Scholarship
Hi Mesos friends,

We need your help promoting the #MesosCon diversity scholarship.

#MesosCon, the annual open source #ApacheMesos developers conference, is now accepting applications for its diversity scholarship. It provides financial assistance for women (cis and trans), genderqueer people, people of color, and people with disabilities. Scholarship recipients will receive a free registration ticket, can request support for travel and hotel, and will automatically be enrolled in our buddy system program.

To apply and learn more, click here: http://events.linuxfoundation.org/events/mesoscon/attend/scholarship

To help promote via Twitter, click here: https://twitter.com/apachemesos/status/608327682569433088

Thank you for your support,

Kiersten Gaffney
Planning Committee Member, #MesosCon
Manager of Events, Mesosphere

--
Kiersten Gaffney
Manager of Events
kiers...@mesosphere.io
415-559-3771
MesosCon 2015 Lightning Talk CFP now open
Good news, everyone:

We’ve expanded the MesosCon program (http://mesoscon.org) to add lightning talks: 5-minute presentations for speakers to introduce a project they’re working on, or share an idea related to Mesos. Lightning talks will be held at lunchtime during the conference, which runs August 20-21, 2015 in Seattle, WA.

The form to propose a lightning talk is available here: https://docs.google.com/forms/d/1raB-IqA4gi0elYPBHmh5lB17jQdCQWrEeMslQGeihbs/viewform

When preparing your proposal, keep in mind:

* The 5-minute limit will be enforced by a time-keeper. The 5 minutes include any time you may wish to leave for Q&A, so use your time wisely.
* Slides are allowed, but not required. We will have a laptop on stage with slides queued up for those who submit them in advance; if you use your own laptop, the time taken to switch over to it counts toward your 5 minutes.
* We encourage submissions that may have previously been shared as full proposals.
* Only one lightning talk may be submitted per person.
* Lightning talk speakers will be expected to purchase full tickets to the conference.

The CFP opens Wednesday, June 10th 2015 and will close July 15th; speakers will be selected by members of the program committee and contacted by July 22nd regarding the status of their proposal.

Good luck with your proposals! Hope to see you all at MesosCon.

Dave
Re: Can Mesos master offer resources to multiple frameworks simultaneously?
Thanks Alex.

For 1, I understand that currently the only choice is C++. However, as Adam mentioned, true pluggable allocator modules (MESOS-2160, https://issues.apache.org/jira/browse/MESOS-2160) are landing in Mesos 0.23, so at that point I assume we will have more choices, right?

For 2 and 3, my understanding is that the Mesos allocator partitions all the available resources into multiple subsets with no overlap between them (i.e., a single resource can only be in one subset), and then offers these subsets to multiple frameworks (e.g., subset1 to framework1, subset2 to framework2, and so on). It is then up to each framework's scheduler to decide whether to accept the resources and launch tasks or to reject them. In this way, each framework's scheduler can make its scheduling decisions independently, since the schedulers never compete for the same resource.

If my understanding is correct, then I have one more question:

4. What if it takes a very long time (e.g., minutes or hours) for a framework's scheduler to make its scheduling decision? Does that mean that during this period the resources offered to this framework cannot be used by any other framework? Is there a timeout for the framework's scheduler to make its decision, so that when the timeout is reached, the resources offered to it are revoked by the Mesos allocator and can be offered to another framework?