RE: Can I consider other framework tasks as a resource? Does it make sense?
It is very helpful. I will take a deeper look at Fenzo.

Isn’t pretty much everything external knowledge to a scheduler? CPU, mem, net, storage… all of this information has to get into the scheduler somehow. But for these there is built-in support in Mesos via resource offers, and that is what I think you mean by internal vs. external.

What I’m thinking is that Mesos already has a mechanism for getting information into the scheduler, but it is not extensible with custom resource types. Thinking about offered resources, I have also realized that they share a common trait: they are consumable. When one task accepts some resources, they are not available to other tasks. Hence, if I wanted to represent other constraints as resources, they would probably have to have this property. Then, in theory, they could be plugged into the Mesos resource mechanism. Possibly not all constraints can be modelled as consumables, and the approach through a pluggable scheduling library like Fenzo might be more flexible.

My original question was basically about what counts as scheduling, so that when I need to model a constraint on how to place a task I know where it belongs in my framework’s code. It seems to be answered. Thanks a lot.

From: Sharma Podila [mailto:spod...@netflix.com]
Sent: 15 December 2016 1:59
To: user@mesos.apache.org
Subject: Re: Can I consider other framework tasks as a resource? Does it make sense?

In general, placing a task based on certain constraints (e.g., locality with other tasks) is a scheduling concern. The complexity in your scenario is that the constraint specification requires knowledge external to your scheduler. If you are able to route that external information (on what and where other frameworks' tasks are running) into your scheduler, then you should be able to achieve the locality constraints in your scheduler.

If your scheduler happens to be running on the JVM, our open source Fenzo scheduling library can be useful, or at least provide one idea on how you could write a scheduler that deals with such constraints. In Fenzo, for example, you'd write a custom plugin to handle the locality by using the external information that I refer to above to "score" agents that fit your task better. Fenzo will then pick the best agent to launch your task for locality.

One limitation is that you'd have little to no control over ensuring that the agents on which those other frameworks' tasks are running will have additional resources available to fit your tasks, and that offers from those agents will arrive at your scheduler. Some variation of "delay scheduling" can help the latter by rejecting offers from agents that do not contain the tasks of interest from other frameworks.

On Wed, Dec 14, 2016 at 10:33 AM, Petr Novak wrote:

Thanks a lot for the input.

“Y scheduler can accept a rule how to check readiness on startup”

Based on that, it seems like a +1 that I can consider it a responsibility of the scheduler.

Cheers,
Petr

From: Alex Rukletsov [mailto:a...@mesosphere.com]
Sent: 14 December 2016 13:01
To: user
Subject: Re: Can I consider other framework tasks as a resource? Does it make sense?

Task dependency is probably too vague to discuss specifically. Mesos currently does not explicitly support arbitrary task dependencies. You mentioned colocation, one type of dependency, so let's look at it.

If I understood you correctly, you would like to colocate a task from framework B on the same node where a task from framework A is running. The first problem is to get a list of such nodes (and keep it updated, because tasks may crash, migrate and so on). This can be done, say, by using Mesos DNS or the like. The second problem is to ensure that the framework gets enough resources from those nodes. A possible solution here is to put both frameworks A and B into the same role and use dynamic reservations to ensure enough resources are set aside for both tasks. Disadvantages: you should know about all dependencies upfront, and the frameworks should be in the same role.

Now the question is, why would you need to colocate workloads? I would say this is something you should avoid if possible, like any extra constraint that complicates the system. Probably the only 100% legitimate use case for colocation is data locality. Solving this particular problem seems easier than addressing arbitrary task dependencies.

If all you are trying to achieve is making sure that a specific service represented by a framework X is running and ready in the cluster, you can do that by running specific checks before starting a dependent framework Y or launching a new task in this framework. If your question is about whether Y should know about X and know how to check readiness of X in the cluster, I'd say you'd better keep that abstracted: Y scheduler can accept a rule how to check readiness on startup.
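Editor's sketch: the locality "scoring" and "delay scheduling" ideas discussed in this thread can be illustrated in a few lines of Python. This is not Fenzo's API and not the Mesos scheduler API. The names `Offer`, `locality_score`, `pick_offer`, `agents_with_peer_tasks`, and `max_wait_rounds` are hypothetical, and the sketch assumes the scheduler has already obtained, by some external means, the set of agents that run the other framework's tasks.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Offer:
    """Hypothetical, simplified stand-in for a resource offer."""
    agent: str
    cpus: float
    mem_mb: float


def locality_score(offer: Offer, agents_with_peer_tasks: Set[str]) -> float:
    """Score 1.0 if the agent already runs a task we want to be next to."""
    return 1.0 if offer.agent in agents_with_peer_tasks else 0.0


def pick_offer(
    offers: Iterable[Offer],
    need_cpus: float,
    need_mem_mb: float,
    agents_with_peer_tasks: Set[str],
    waited_rounds: int,
    max_wait_rounds: int = 3,
) -> Optional[Offer]:
    """Return the offer to use, or None to decline everything this round."""
    # Keep only offers with enough resources for the task.
    fitting = [o for o in offers
               if o.cpus >= need_cpus and o.mem_mb >= need_mem_mb]
    if not fitting:
        return None
    # Prefer the agent with the best locality score.
    best = max(fitting, key=lambda o: locality_score(o, agents_with_peer_tasks))
    # Delay scheduling: while we are still willing to wait, decline offers
    # from agents that do not host the tasks of interest.
    if (locality_score(best, agents_with_peer_tasks) == 0.0
            and waited_rounds < max_wait_rounds):
        return None
    return best
```

The wait budget reflects the limitation raised in the thread: nothing guarantees that the preferred agents will ever offer enough resources, so after `max_wait_rounds` the sketch falls back to any agent that fits.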
Re: mesos cpuset isolator module available
thanks for your sharing. v5 -- Deshi Xiao Twitter: xds2000 E-mail: xiaods(AT)gmail.com
mesos cpuset isolator module available
I've completed a mesos module to support cgroups cpusets. This work is related to a JIRA ticket that I posted last spring (MESOS-5342). Apologies for the long delay wrapping up the implementation. https://github.com/ct-clmsn/mesos-cpusets If you test it out, have issues, or want to make improvements, please post to github - I've done some very simple/trivial testing. Chris
Re: mesos cpuset isolator module available
I'll add in BUILD instructions tonight/this weekend. I'll be releasing some performance counter tools to use in a mesos system (for container applications) very soon.
Re: mesos cpuset isolator module available
Thanks for sharing. This is very interesting to us because we are also looking for a solution for latency-sensitive CPU isolation. -- Cheers, Zhitao Li
Re: mesos cpuset isolator module available
Super cool! Does this also support limiting which memory controller the tasks can use?
Re: Mesos 1.1 web ui issues
Hi @haripriya, what value do you pass to the hostname flag when starting the master? According to the screenshot you posted before, I think you need to set it to something like `socrates-nid000xxx.us.cray.com`. However, according to the error log you posted, you set the hostname flag to nid00016, which could not be resolved.
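Editor's sketch: `net::ERR_NAME_NOT_RESOLVED` means the browser could not turn the name in the URL into an address. A quick way to test a candidate `--hostname` value is to try resolving it on the machine where the browser runs. The helper name `resolves` is hypothetical; `socket.getaddrinfo` is the standard-library call used for the lookup.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if this machine can resolve `hostname` to an address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name lookup failed (unknown host, no DNS server reachable, ...).
        return False
    return True


if __name__ == "__main__":
    # Hostnames taken from the thread; results depend on where this runs.
    for name in ("nid00016", "socrates-nid000xxx.us.cray.com"):
        print(name, "resolves" if resolves(name) else "does NOT resolve")
```

The check is only meaningful from the client side: a short name such as nid00016 may resolve inside the cluster and still fail on the workstation that opens the web UI.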
Re: Mesos 1.1 web ui issues
Hello @Haosdent,

After I tried to use hostname, I still see the error. This is the output I see in developer tools for chrome:

Failed to load resource: the server responded with a status of 404 (Not Found)
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._2 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._3 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._4 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._5 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._6 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._7 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._8 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._9 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._a Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._b Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._c Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._d Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._e Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._f Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/master/state?jsonp=angular.callbacks._g Failed to load resource: net::ERR_NAME_NOT_RESOLVED
http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._h Failed to load resource: net::ERR_NAME_NOT_RESOLVED

angular-1.2.3.min.js:70 GET http://nid00016:5050/master/state?jsonp=angular.callbacks._i net::ERR_NAME_NOT_RESOLVED
g @ angular-1.2.3.min.js:70
(anonymous function) @ angular-1.2.3.min.js:71
D @ angular-1.2.3.min.js:68
h @ angular-1.2.3.min.js:66
D @ angular-1.2.3.min.js:91
D @ angular-1.2.3.min.js:91
(anonymous function) @ angular-1.2.3.min.js:93
$eval @ angular-1.2.3.min.js:101
$digest @ angular-1.2.3.min.js:98
$apply @ angular-1.2.3.min.js:101
(anonymous function) @ angular-1.2.3.min.js:111
e @ angular-1.2.3.min.js:33
(anonymous function) @ angular-1.2.3.min.js:37

angular-1.2.3.min.js:70 GET http://nid00016:5050/metrics/snapshot?jsonp=angular.callbacks._j net::ERR_NAME_NOT_RESOLVED
g @ angular-1.2.3.min.js:70
(anonymous function) @ angular-1.2.3.min.js:71
D @ angular-1.2.3.min.js:68
h @ angular-1.2.3.min.js:66
D @ angular-1.2.3.min.js:91
D @ angular-1.2.3.min.js:91
(anonymous function) @ angular-1.2.3.min.js:93
$eval @ angular-1.2.3.min.js:101
$digest @ angular-1.2.3.min.js:98
$apply @ angular-1.2.3.min.js:101
(anonymous function) @ angular-1.2.3.min.js:111
e @ angular-1.2.3.min.js:33
(anonymous function) @ angular-1.2.3.min.js:37

Also, regarding the "cluster flag", here is my output:

nid00016: root 14940 2.5 0.0 2080192 85012 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00016 --quorum=2 --work_dir=/var/lib/mesos
nid00016: root 14965 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[14940]
nid00016: root 14966 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[14940]
nid00016: root 15892 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
nid00016: root 15959 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
nid00032: root 30018 2.5 0.0 2670032 26480 ? Ssl 16:44 0:08 /usr/sbin/mesos-master --zk=zk://192.168.0.1:2181,192.168.0.17:2181,192.168.0.33:2181/mesos --port=5050 --log_dir=/var/log/mesos --acls=/etc/mesos_acls.json --authenticate_frameworks=true --cluster="socrates" --credentials=/etc/marathon-auth/credentials --hostname=nid00032 --quorum=2 --work_dir=/var/lib/mesos
nid00032: root 30043 0.0 0.0 107892 612 ? S 16:44 0:00 logger -p user.info -t mesos-master[30018]
nid00032: root 30044 0.0 0.0 107892 692 ? S 16:44 0:00 logger -p user.err -t mesos-master[30018]
nid00032: root 31091 0.0 0.0 113116 1604 ? Ss 16:50 0:00 bash -c ps -aux | grep mesos-master
nid00032: root 31158 0.0 0.0 112644 948 ? S 16:50 0:00 grep mesos-master
nid0: root 49753 3.7 0.0 3259912 27584 ? Ssl 16:44 0:13 /usr/sbin/mesos-master
Re: Can I consider other framework tasks as a resource? Does it make sense?
Response below:

On Thu, Dec 15, 2016 at 5:22 AM, Petr Novak wrote:

> It is very helpful. I will take a deeper look at Fenzo.
>
> Isn’t pretty much everything external knowledge to a scheduler? CPU, mem, net, storage… all of this information has to get into the scheduler somehow. But for these there is built-in support in Mesos via resource offers, and that is what I think you mean by internal vs. external.

Yes, that's right.

> What I’m thinking is that Mesos already has a mechanism for getting information into the scheduler, but it is not extensible with custom resource types. Thinking about offered resources, I have also realized that they share a common trait: they are consumable. When one task accepts some resources, they are not available to other tasks. Hence, if I wanted to represent other constraints as resources, they would probably have to have this property. Then, in theory, they could be plugged into the Mesos resource mechanism. Possibly not all constraints can be modelled as consumables, and the approach through a pluggable scheduling library like Fenzo might be more flexible.

Constraints can be on non-consumables. For example, we have constraints on custom attributes, not just resources. The trick is to get the information on other tasks on the agent back into the scheduler. Today we do this only among tasks of the same framework, so Fenzo knows about all of them. If there is a way to get the dynamic task scheduling info for the other frameworks, you could, for example, add those into Fenzo and let it maintain state and do the constraints.

> My original question was basically about what counts as scheduling, so that when I need to model a constraint on how to place a task I know where it belongs in my framework’s code. It seems to be answered. Thanks a lot.
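Editor's sketch: the difference discussed in this thread between consumable resources and non-consumable constraints (such as custom attributes) can be shown with a toy placement check. Everything here is hypothetical and unrelated to the real Mesos or Fenzo data structures: `Agent`, `satisfies`, `place`, and the `zone` attribute are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Agent:
    """Hypothetical agent: one consumable resource plus static attributes."""
    name: str
    cpus: float
    attributes: Dict[str, str] = field(default_factory=dict)


def satisfies(agent: Agent, cpus_needed: float,
              required_attrs: Dict[str, str]) -> bool:
    """Check the attribute constraints and the resource requirement."""
    has_attrs = all(agent.attributes.get(key) == value
                    for key, value in required_attrs.items())
    return has_attrs and agent.cpus >= cpus_needed


def place(agent: Agent, cpus_needed: float,
          required_attrs: Dict[str, str]) -> bool:
    """Place a task if possible; only the consumable resource shrinks."""
    if not satisfies(agent, cpus_needed, required_attrs):
        return False
    agent.cpus -= cpus_needed  # consumed: no longer available to other tasks
    # Attributes are left untouched: any number of tasks can match them.
    return True
```

Placing a task reduces `cpus` for every later task, while the attribute check can be passed any number of times; that is the property that separates resources from the constraints described in the reply.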