Re: Debugging framework registration from inside docker
For what exactly? I thought that was for slave-master communication? There is no problem there. Or are you suggesting that from inside the running container I set at least LIBPROCESS_IP to the host IP rather than the IP of eth0 that the container sees? Won't that screw with the docker bridge routing?

This doesn't quite make sense. I have other network connections inside this container and those channels are established and communicating fine. It's just the one with the mesos master, for some reason. Just to be clear:

* The running process is a scheduling framework.
* It does not listen for any inbound connection requests.
* It does, of course, attempt an outbound connection to the zookeeper to get the MM (this works).
* It then attempts to establish a connection with the MM (this also works).
* When the MM sends a response, it fails - it effectively tries to send the response back to the private/internal docker IP where my scheduler is running.
* This problem disappears when run with --net=host.

tcpdump never shows any inbound traffic:

IP 172.17.1.197.55182 > 172.20.121.193.5050 ...

Therefore there is never any ACK# that corresponds with the SEQ#, and these are just re-transmissions. I think!

Jim

On 10 June 2015 at 18:16, Steven Schlansker sschlans...@opentable.com wrote:

On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

Hi. When attempting to run my scheduler inside a docker container in --net=bridge mode it never receives an acknowledgement or a reply to that request. However, it works fine in --net=host mode. It does not listen on any port as a service, so does not expose any. The scheduler receives the mesos master (leader) from zookeeper fine but fails to register the framework with that master. It just loops trying to do so - the master sees the registration but deactivates it immediately as apparently it disconnects. It doesn't disconnect but is obviously unreachable.

I see the reason for this in the sendto() and the master log file -- because the internal docker bridge IP is included in the POST, and perhaps that is how the master is trying to talk back to the requesting framework?? Inside the container is this:

tcp 0 0 0.0.0.0:44431 0.0.0.0:* LISTEN 1/scheduler

This is not my code! I'm at a loss where to go from here. Anyone got any further suggestions to fix this?

You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the fact that you are on a virtual Docker interface.

--
Senior Code Pig
Industrial Light & Magic
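Steven's suggestion above could be sketched roughly as follows. All values here (host IP, port, image name) are hypothetical placeholders, not from the thread: the idea is to pin libprocess to a fixed port, publish that port on the host, and advertise the host's IP so the master's callback can be routed back in.

```shell
# Hedged sketch of Steven's LIBPROCESS_IP/LIBPROCESS_PORT suggestion.
# HOST_IP, the port, and the image name are illustrative placeholders.
HOST_IP=192.0.2.10          # placeholder for the docker host's routable IP
LIBPROCESS_PORT=9000        # fixed port so we know exactly what to publish

cmd="docker run --net=bridge \
  -e LIBPROCESS_IP=${HOST_IP} \
  -e LIBPROCESS_PORT=${LIBPROCESS_PORT} \
  -p ${LIBPROCESS_PORT}:${LIBPROCESS_PORT} \
  my-scheduler-image"

# Printed rather than executed, since the image and cluster are hypothetical:
echo "$cmd"
```

As the rest of the thread goes on to establish, this exact combination fails in bridge mode, because libprocess also tries to bind() to LIBPROCESS_IP, which does not exist inside the container - so treat this as the experiment under discussion, not a working fix.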
Re: Debugging framework registration from inside docker
Looks like I share the same symptoms as this 'marathon inside container' problem: https://groups.google.com/d/topic/marathon-framework/aFIlv-VnF58/discussion

I guess that sheds some light on the subject ;)

On 11 June 2015 at 09:43, James Vanns jvanns@gmail.com wrote: [...]

--
Senior Code Pig
Industrial Light & Magic
Re: Debugging framework registration from inside docker
I think I can conclude then that this just won't work; one cannot run a framework as a docker container using bridged networking. This is because the POST to the MM that libprocess makes on your framework's behalf includes the non-routable private docker IP, and that is what the MM will then try to communicate with?

Setting LIBPROCESS_IP to the host IP will of course not work, because then libprocess (or somewhere else in the mesos framework code) attempts to bind() to that interface and fails, because it does not exist in bridge mode.

*If* the above is correct, then the question I suppose is: why does the communication channel get established in that way? Why off the back of some data in a POST, rather than via the already-connected endpoint (which presumably docker would manage/forward, as it would with a regular web service, for example)? Is this some caveat of using zookeeper?

I'm sure someone will correct me where I'm wrong ;)

Cheers,

Jim

On 11 June 2015 at 10:00, James Vanns jvanns@gmail.com wrote: [...]

--
Senior Code Pig
Industrial Light & Magic
Re: Can Mesos master offer resources to multiple frameworks simultaneously?
Hi Qian Zhang,

I can answer the fourth question: if a framework has not responded to an offer for a sufficiently long time, Mesos rescinds the offer and re-offers the resources to other frameworks, so you can't get it back. I am not clear, though, on how Mesos divides all resources into multiple subsets.

陈宗志
Blog: baotiao.github.io

On Jun 11, 2015, at 08:35, Qian Zhang zhq527...@gmail.com wrote:

Thanks Alex. For 1, I understand currently the only choice is C++. However, as Adam mentioned, true pluggable allocator modules (MESOS-2160, https://issues.apache.org/jira/browse/MESOS-2160) are landing in Mesos 0.23, so at that time I assume we will have more choices, right?

For 2 and 3, my understanding is that the Mesos allocator will partition all the available resources into multiple subsets with no overlap between them (i.e., a single resource can only be in one subset), and then offer these subsets to multiple frameworks (e.g., offer subset1 to framework1, offer subset2 to framework2, and so on); it is up to each framework's scheduler to determine whether it accepts the resources to launch tasks or rejects them. In this way, each framework's scheduler can make scheduling decisions independently, since they will never compete for the same resource.

If my understanding is correct, then I have one more question:

4. What if it takes a very long time (e.g., minutes or hours) for a framework's scheduler to make the scheduling decision? Does that mean that during this long period the resources offered to this framework cannot be used by any other framework? Is there a timeout for the framework's scheduler to make the scheduling decision, so that when the timeout is reached, the resources offered to it will be revoked by the Mesos allocator and can be offered to another framework?
Re: Threading model of mesos API (C++)
Excellent. Thank you both for your time and efforts - and most importantly clarifying behavior :) Jim
Re: slave work_dir filling up
Thanks! This is exactly what I was looking for.

On 9 June 2015 at 23:18, zhou weitao zhouwtl...@gmail.com wrote:

BTW, we also configured spark.shuffle.consolidateFiles=true to optimize it.

2015-06-10 8:16 GMT+08:00 Jeff Schroeder jeffschroe...@computer.org:

http://mesos.apache.org/documentation/latest/configuration/#slave-options

Look under Slave Options at --gc_delay and --gc_disk_headroom.

On Tuesday, June 9, 2015, Gary Ogden gog...@gmail.com wrote:

We're using spark 1.2 on mesos 0.21.1. We have a simple data collection job that runs every hour. But for some reason the executor files remain in the work_dir (/tmp/mesos in our case) and eventually fill up the disk. An example path of all these executors is:

/tmp/mesos/slaves/20150408-134250-239470090-5050-15979-S1/frameworks

After a while we end up with a folder for every hourly run and it fills up the disk. Shouldn't these be deleted when the job completes?

In the mesos web UI, on the main page, I noticed that the completed tasks show FINISHED, but when I go to the frameworks page and pick one of the completed frameworks, the completed tasks there show a state of KILLED. That seems weird. Is the app coded wrong? Is there a setting in mesos I need to add to clean these up?

--
Text by Jeff, typos by iPhone
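The two slave flags Jeff points at could be applied roughly as below. The flag values and the ZooKeeper address are illustrative placeholders, not recommendations: `--gc_delay` bounds how long completed executor sandboxes under the work_dir are kept, and `--gc_disk_headroom` scales that delay down as the disk fills.

```shell
# Hedged sketch of a slave invocation using the gc flags under discussion.
# The master address and values are hypothetical; tune them to your cluster.
cmd="mesos-slave \
  --master=zk://zk1:2181/mesos \
  --work_dir=/tmp/mesos \
  --gc_delay=1days \
  --gc_disk_headroom=0.1"

# Printed rather than executed, since this assumes a live cluster:
echo "$cmd"
```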
Mesos-Kafka, any plans to support Consumers?
Hi,

I just started to look into the new Kafka Consumer, where it's easier to assign TopicPartitions to individual consumer instances. Ideally we would like the assignment of TopicPartitions to be smart. That is, be able to apply similar scheduling, operations, and constraints as are implemented by Mesos-Kafka today when it manages Kafka Brokers, and be able to assign partitions based on the state of the consumers and consumer groups in the cluster.

Are there any plans to extend mesos-kafka to manage Consumers and not only Brokers? If so, have the goals of such an effort been discussed? (I understand that due to what's currently available in Kafka it's not easy to do. But it doesn't hurt to ask. :-))

Thanks!
Olof
Re: Debugging framework registration from inside docker
On Thu, Jun 11, 2015 at 4:00 AM, James Vanns jvanns@gmail.com wrote:

> I think I can conclude then that this just won't work; one cannot run a framework as a docker container using bridged networking. This is because the POST to the MM that libprocess makes on your framework's behalf includes the non-routable private docker IP, and that is what the MM will then try to communicate with? Setting LIBPROCESS_IP to the host IP will of course not work, because then libprocess (or somewhere in the mesos framework code) attempts to bind() to that interface and fails, because it does not exist in bridge mode.

You are right on track. This is a known issue: https://issues.apache.org/jira/browse/MESOS-809. Anindya has submitted a short-term fix, which unfortunately never landed. I'll shepherd and commit this.

> *If* the above is correct, then the question I suppose is: why does the communication channel get established in that way? Why off the back of some data in a POST, rather than via the already-connected endpoint (which presumably docker would manage/forward, as it would with a regular web service, for example)? Is this some caveat of using zookeeper?

Longer term, the plan is for the master to reuse the connection opened by the scheduler and not open a new one, as you mentioned. See https://issues.apache.org/jira/browse/MESOS-2289

> I'm sure someone will correct me where I'm wrong ;)

You are not!
Re: Mesos-Kafka, any plans to support Consumers?
Hello Olof,

The new consumer on trunk has a feature for subscribing to a specific partition (https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L200) for use when launching on Mesos. In this case the rebalancing doesn't happen, since every instance is on a partition. So if you have 100 partitions you would have 100 instances running, or fewer with some basic business logic (e.g. 10 instances that each own 10 partitions, etc.). This is great because if you lose 1 instance, EVERY instance doesn't have to rebalance - they all just keep running. Awesome!

Making mesos/kafka able to launch consumers as well is an interesting idea. Since each consumer is a custom application, things like configuration management would be specific and different, so there are a lot of nuances to consider. I am not saying it can't be done, but we need to make sure the implementation is useful and generic enough. Perhaps we could provide the ability to supply a tgz of folks' custom consumers so we can launch them (via the kafka scheduler) for you. I think, though, that to do this there would be a lot more than just scheduler changes - we would likely have to write a new executor and some API so your consumer can get data from the executor, or something. We can noodle on it.

It would be great to hear more about what use cases you (or anyone else) have in mind, so we can see how it might work based on other implementations we work with, see, and know about. What we do now on Mesos (for producers and consumers) is either run them as custom frameworks (because they do special things, e.g. Storm, Spark) or via Marathon.
~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Jun 11, 2015 at 12:54 PM, Johansson, Olof olof.johans...@thingworx.com wrote: [...]
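Joe's "basic business logic" for spreading 100 partitions over 10 instances can be sketched as a static round-robin assignment. The numbers come from his example; the modulo scheme itself is illustrative, not something mesos/kafka implements.

```shell
# Sketch of statically assigning 100 partitions to 10 consumer instances,
# round-robin: partition p goes to instance (p mod INSTANCES). Because the
# assignment is fixed, no consumer-group rebalance is ever triggered.
PARTITIONS=100
INSTANCES=10

assignments=$(for p in $(seq 0 $((PARTITIONS - 1))); do
  echo "partition ${p} -> instance $((p % INSTANCES))"
done)

# First few assignments, and a sanity check that the spread is even:
echo "$assignments" | head -n 3
echo "instance 0 owns $(echo "$assignments" | grep -c 'instance 0$') partitions"
```

If an instance dies, only its own fixed set of partitions is affected, which is exactly the property Joe highlights: the other 90 partitions keep being consumed without a rebalance.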
Re: Debugging framework registration from inside docker
I believe you're correct Jim. If you set LIBPROCESS_IP=$HOST_IP, libprocess will try to bind to that address as well as announce it, which won't work inside a bridged container. We've been having a similar discussion on https://github.com/wickman/pesos/issues/25.

--
Tom Arnfeld
Developer // DueDil

On Thursday, Jun 11, 2015 at 10:00 am, James Vanns jvanns@gmail.com wrote: [...]
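Tom's point (that libprocess both binds to and announces LIBPROCESS_IP) can be reproduced in miniature outside libprocess: binding a socket to an address that no local interface owns fails with EADDRNOTAVAIL. This is a standalone demonstration, not libprocess itself; 192.0.2.1 is a TEST-NET-1 address assumed not to be configured locally, standing in for "the host's IP as seen from inside a bridged container".

```shell
# Demonstration: bind() to a non-local IP is refused by the kernel with
# EADDRNOTAVAIL -- the same failure LIBPROCESS_IP=$HOST_IP hits inside a
# bridged container, where only the private bridge IP exists.
out=$(python3 - <<'PY'
import errno, socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # 192.0.2.1 (TEST-NET-1) is assumed not to be local on this machine.
    s.bind(("192.0.2.1", 0))
    print("bind succeeded (192.0.2.1 is local on this host)")
except OSError as e:
    print("bind failed:", errno.errorcode.get(e.errno, str(e.errno)))
finally:
    s.close()
PY
)
echo "$out"
```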
Re: Mesos-Kafka, any plans to support Consumers?
Thanks Joe! Yeah, the new Kafka Consumer really hits the sweet spot with the option to manually manage which partitions a consumer subscribes to, without having to pay the rebalancing cost when the consumer processes scale horizontally. And all this without the headaches of implementing a SimpleConsumer. Awesome indeed!

A summary of the use case I have in mind is pretty typical for any internet event-ingestion scenario with low-latency, high-throughput requirements:

* Kafka used as a durable shock-absorber. (Brokers will scale elastically.)
* Consumers should be able to scale horizontally based on a somewhat unpredictable load.
* Consumers process messages from Kafka, raise new events, and persist a (large) subset of processed messages to Cassandra.
* Consumers have an in-memory cache of metadata related to the aggregates the messages belong to. Populating the in-memory cache is fairly expensive. Semantic partitioning is done on the producer side, so one consumer always gets messages for the same aggregate (given no rebalancing).
* On scale out/in, the consumer partition assignment strategy should move as few partitions as possible, to limit the cost of re-populating the in-memory cache for those partitions. (Only range and round-robin assignment strategies are possible in Kafka today, but it looks like they plan to add the possibility of user-defined strategies.)
* The non-affected consumers should still continue reading messages, to keep the overall latency as low as possible.

Letting Kafka handle the consumer rebalancing itself cannot solve this use case, as it requires knowledge of:

* Consumer location relative to the other systems it depends on for reads and writes. That is, the consumer partition assignment should be location-aware, to reduce latency and increase throughput (e.g. same rack as the related Cassandra shard, Postgres shard, etc.).
* That all partitions are served by one consumer at any given time, without depending on the Kafka consumer rebalancing protocol, since the partition ownership will now be managed externally.

In short: a Kafka ConsumerCoordinator running inside a cloud resource-management system such as Mesos.

From: Joe Stein joe.st...@stealth.ly, Thursday, 11 June 2015 15:26: [...]
Re: Can Mesos master offer resources to multiple frameworks simultaneously?
4. By default, Mesos will not revoke (rescind) an *un*used offer being held by a framework, but you can enable such a timeout by specifying the `--offer_timeout` flag on the master.

On Thu, Jun 11, 2015 at 4:48 PM, Adam Bordelon a...@mesosphere.io wrote:

1. The modularized allocator will still be a C++ interface, but you could just create a C++ wrapper around whatever Python/Go/Java/etc. implementation you prefer. Your assessment of 2 and 3 sounds correct.

4. By default, Mesos will not revoke (rescind) an used offer being held by a framework, but you can enable such a timeout by specifying the `--offer_timeout` flag on the master.

On Thu, Jun 11, 2015 at 1:41 AM, baotiao baot...@gmail.com wrote: [...]
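Adam's `--offer_timeout` suggestion could be applied roughly as below. The timeout value and the ZooKeeper/quorum settings are illustrative placeholders, not recommendations: with the flag set, an offer a framework sits on for longer than the timeout is rescinded and re-offered to other frameworks, which answers Qian's question 4.

```shell
# Hedged sketch of a master invocation with an offer timeout enabled.
# The zk address, quorum, and 5mins value are hypothetical placeholders.
cmd="mesos-master \
  --zk=zk://zk1:2181/mesos \
  --quorum=1 \
  --offer_timeout=5mins"

# Printed rather than executed, since this assumes a live cluster:
echo "$cmd"
```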
Re: Can Mesos master offer resources to multiple frameworks simultaneously?
Thanks Adam! It is clear to me now :-)

2015-06-12 7:49 GMT+08:00 Adam Bordelon a...@mesosphere.io: [...]