Re: forcing framework to re-schedule?
Hi, @Victor

The taskId is the one you specify in `TaskInfo` when you launch the task.

On Wed, Sep 14, 2016 at 6:22 AM, Victor L wrote:

> how can i get taskId to call "killTask"?
>
> On Tue, Sep 13, 2016 at 9:59 AM, haosdent wrote:
>
>> If you want to kill the task from the scheduler, you just need to call
>> `killTask` (https://github.com/apache/mesos/blob/1.0.x/include/mesos/scheduler.hpp#L257).
>> If you want to kill the task by health check, you could try to set the
>> correct `consecutive_failures` number
>> (https://github.com/apache/mesos/blob/1.0.x/include/mesos/v1/mesos.proto#L358).
>>
>> On Tue, Sep 13, 2016 at 2:53 AM, Victor L wrote:
>>
>>> How can i explicitly kill the task from my class?
>>>
>>> On Mon, Sep 12, 2016 at 2:10 PM, haosdent wrote:
>>>
>>>> If the target of your health check is your task, Mesos supports health
>>>> checks by a command. When your task reaches the health check failure
>>>> limit, the task is killed, and your framework can launch the task again
>>>> when it receives `TASK_KILLED` in `statusUpdate`.
>>>>
>>>> On Tue, Sep 13, 2016 at 2:03 AM, Victor L wrote:
>>>>
>>>>> It checks if process is functional. I don't think standard
>>>>> healthchecks would be sufficient for my purpose and my question still
>>>>> stands: how to use result...
>>>>>
>>>>> On Mon, Sep 12, 2016 at 1:48 PM, haosdent wrote:
>>>>>
>>>>>> Hi, @victor What's your health check agent used for? Because Mesos
>>>>>> supports health checks now.
>>>>>>
>>>>>> On Tue, Sep 13, 2016 at 1:46 AM, Victor L wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I am writing "healthcheck agent" for mesos deployment framework as
>>>>>>> independent thread periodically checking if main process (started by
>>>>>>> framework) is running...
>>>>>>> What would be the mechanism to "communicate" failure to the
>>>>>>> framework to cause specific outcome? For example: how can i use
>>>>>>> failure to cause framework to reschedule deployment on different node?
>>>>>>> Thanks,

--
Best Regards,
Haosdent Huang
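A minimal sketch of the point made in this reply: the framework already knows the task ID, because it chose the ID itself when it filled in `TaskInfo`, so it only has to remember it. Everything in the block is hypothetical. `FakeDriver`, `TaskRegistry`, and the plain dict standing in for `TaskInfo` are illustrative stand-ins, not the Mesos scheduler driver API.

```python
import uuid


class FakeDriver:
    """Hypothetical stand-in for a scheduler driver; it only records calls."""

    def __init__(self):
        self.launched = []
        self.killed = []

    def launch_task(self, task_info):
        self.launched.append(task_info)

    def kill_task(self, task_id):
        self.killed.append(task_id)


class TaskRegistry:
    """Remembers the task ID the framework assigned to each logical job."""

    def __init__(self, driver):
        self.driver = driver
        self.task_ids = {}  # job name -> task ID chosen by the framework

    def launch(self, job_name):
        # The framework picks the ID and places it in the task description.
        task_id = f"{job_name}.{uuid.uuid4()}"
        task_info = {"task_id": task_id, "name": job_name}
        self.driver.launch_task(task_info)
        self.task_ids[job_name] = task_id
        return task_id

    def kill(self, job_name):
        # Reuse the stored ID; nothing has to be looked up elsewhere.
        task_id = self.task_ids.pop(job_name)
        self.driver.kill_task(task_id)
        return task_id


driver = FakeDriver()
registry = TaskRegistry(driver)
launched_id = registry.launch("web")
killed_id = registry.kill("web")
print(launched_id == killed_id)
```

In a real framework the same bookkeeping would sit next to the code that builds `TaskInfo`, and the stored ID would be passed to `killTask`.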
Unable to run latest windows mesos-agent
Hi,

I built a new Windows mesos-agent.exe a few minutes ago and I am getting the following error. I don’t think it is a port issue.

I0913 18:07:22.359485 12992 slave.cpp:1692] Launching task 'windows-hello-world.83d04669-7a17-11e6-ae4e-0021f6964572' for framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026
W0913 18:07:22.359485 12992 slave.cpp:1760] Ignoring running task 'windows-hello-world.83d04669-7a17-11e6-ae4e-0021f6964572' of framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026 because the framework is terminating
I0913 18:07:22.359485 12992 slave.cpp:4660] Cleaning up framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026
I0913 18:07:22.359485 3600 status_update_manager.cpp:285] Closing status update streams for framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026
E0913 18:07:22.359485 12992 slave.cpp:5350] Failed to find the mtime of 'C:\mesos\slaves\1eb74be0-b996-4da5-a3a7-ee8ea7728dc5-S20\frameworks\03b7ca4b-1a97-4044-ba65-9a92458987ad-0026': Error invoking stat for 'C:\mesos\slaves\1eb74be0-b996-4da5-a3a7-ee8ea7728dc5-S20\frameworks\03b7ca4b-1a97-4044-ba65-9a92458987ad-0026': No such file or directory
I0913 18:07:22.359485 8988 gc.cpp:55] Scheduling 'C:\mesos\meta\slaves\1eb74be0-b996-4da5-a3a7-ee8ea7728dc5-S20\frameworks\03b7ca4b-1a97-4044-ba65-9a92458987ad-0026' for gc 6.9583928296days in the future
I0913 18:07:22.359485 12992 slave.cpp:783] Agent terminating
ABORT: (C:\jenkins\workspace\mesos-windows-build\3rdparty\stout\include\stout/os/windows/socket.hpp:136): Not expecting 'getsockopt' to fail when passed a valid socket
Re: Unified cgroups isolator
Thanks @haosdent's awesome work and @Jie's great shepherding and guidance on this project!

Thanks,
Qian Zhang

On Wed, Sep 14, 2016 at 7:56 AM, Gilbert Song wrote:

> Awesome!
>
> Kudos to @haosdent and @qianzhang!
Re: Unified cgroups isolator
Awesome!

Kudos to @haosdent and @qianzhang!

On Tue, Sep 13, 2016 at 11:22 AM, haosdent wrote:

> Really appreciate @qian and @jie's great helps on this! It makes us easier
> to add cgroups isolation for rest subsystem.
>
> Additionally, if you find any changes about unified cgroups isolator break
> your environment, please let us know. I would try to fix asap.
Re: forcing framework to re-schedule?
how can i get taskId to call "killTask"?

On Tue, Sep 13, 2016 at 9:59 AM, haosdent wrote:

> If you want to kill the task from the scheduler, you just need to call
> `killTask` (https://github.com/apache/mesos/blob/1.0.x/include/mesos/scheduler.hpp#L257).
> If you want to kill the task by health check, you could try to set the
> correct `consecutive_failures` number
> (https://github.com/apache/mesos/blob/1.0.x/include/mesos/v1/mesos.proto#L358).
Re: what is the status on this?
@Alex Rukletsov

I am sorry I took some time to respond. I have been excited from the beginning to have an opportunity to work on this task, but I wanted to take my time to see whether I can really commit to it, and it looks like I might be able to. However, I have not contributed to open source before, so I would need help from someone who can point me to the right parts of the code and help me navigate the process. If that is feasible, I will be happy to commit some time every week to work on this. Please let me know if that works.

On Tue, Sep 6, 2016 11:59 AM, Dario Rexin dre...@apple.com wrote:

Frameworks would use the redirect mechanism of the HTTP API and, in case of unreachable nodes, could do round robin on the list of master nodes.

On Sep 6, 2016, at 11:52 AM, Joseph Wu wrote:

And for discovery of other nodes in the Paxos group. The work on modularizing/decoupling Zookeeper is a prerequisite for having the replicated log perform leader election itself. <- That would merely be another implementation of the interface we will introduce in the process: https://issues.apache.org/jira/browse/MESOS-3574

On Tue, Sep 6, 2016 at 11:31 AM, Avinash Sridharan wrote:

Also, I think, the replicated log itself uses Zookeeper for leader election.

On Tue, Sep 6, 2016 at 12:15 PM, Zameer Manji wrote:

If we use the replicated log for leader election, how will frameworks detect the leading master? Right now the scheduler driver uses the MasterInfo in ZK to discover the leader and detect leadership changes.

On Mon, Sep 5, 2016 at 10:18 AM, Dario Rexin wrote:

If we go and change this, why not simply remove any dependencies on external systems and use the replicated log for leader election?

On Sep 5, 2016, at 9:02 AM, Alex Rukletsov wrote:

Kant, thanks a lot for the feedback! Are you interested in helping out with the Consul module once Jay and Joseph are done with the modularizing patches?

On Mon, Sep 5, 2016 at 8:50 AM, Jay JN Guo wrote:

Patches are currently under review by @Joseph and can be found at the links provided by @haosdent.

I took a quick look at the Consul key/value HTTP APIs and they look very similar to the Etcd APIs. You could actually reuse our Etcd module implementation once we manage to push the module into the Mesos community.

The only technical problem I can see for now is that Consul does not support `POST` with an incremental key index. We may need to leverage the `?cas=` operation in Consul to emulate the behaviour of joining a key group.

We could have a discussion on how to implement a Consul HA module.

cheers,
/J

- Original message -
From: haosdent
To: user
Cc: Jay JN Guo/China/IBM@IBMCN
Subject: Re: what is the status on this?
Date: Sun, Sep 4, 2016 6:10 PM

Jay has some patches for decoupling Mesos from Zookeeper:
https://issues.apache.org/jira/browse/MESOS-5828
https://issues.apache.org/jira/browse/MESOS-5829

I think it should be possible to support Consul through custom modules after Jay's work is done.

On Sun, Sep 4, 2016 at 6:02 PM, kant kodali wrote:

Hi Alex,

We have some experienced devops people here and they all have one thing in common: they find Zookeeper a pain to maintain. In fact, we have refused to bring in new tech stacks that require Zookeeper, such as Kafka. So we are desperately searching for an alternative, preferably one using Consul. I hear a lot of positive responses when it comes to Consul. It would be great to see Mesos and Consul working together, in which case we would be ready to jump at it and make the switch from YARN to Mesos.

Thanks,
Kant

On Wed, Aug 31, 2016 1:03 AM, Alex Rukletsov a...@mesosphere.com wrote:

Kant, mind telling us what your use case is and why this ticket is important for you? It will help us prioritize work.

On Fri, Aug 26, 2016 at 2:46 AM, tommy xiao <xia...@gmail.com> wrote:

Hi guys, I have always followed this case. The good news is that etcd already has patches, so adding Consul should be easy; it just needs some time for coding. If you are interested in it, let us collaborate on it.

2016-08-26 8:11 GMT+08:00 Joseph Wu:

There is no timeline as no one has done any work on the issue.

On Thu, Aug 25, 2016 at 4:54 PM, kant kodali <kanth...@gmail.com> wrote:

Hi Guys,

I see that this ticket and other related tickets were supposed to be part of sprints in 2015, and it is still not resolved. Can we have a timeline on this? This would be really helpful.
https://issues.apache.org/jira/browse/MESOS-3797

Thanks!

--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com

--
Best Regards,
Haosdent Huang

--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
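Jay's note in this thread about using `?cas=` to emulate joining a key group can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `InMemoryKV` and `join_group` are hypothetical names, the store imitates a check-and-set write where index 0 means "create only if the key is absent", and nothing here talks to a real Consul agent or reflects the Etcd module's actual implementation.

```python
class InMemoryKV:
    """Hypothetical stand-in for a key/value store with check-and-set writes."""

    def __init__(self):
        self._data = {}   # key -> (value, modify index)
        self._index = 0   # store-wide counter used as the modify index

    def put_cas(self, key, value, cas):
        """Write only if the key's modify index equals `cas` (0 means absent)."""
        current = self._data.get(key, (None, 0))[1]
        if current != cas:
            return False
        self._index += 1
        self._data[key] = (value, self._index)
        return True

    def keys(self, prefix):
        return sorted(k for k in self._data if k.startswith(prefix))


def join_group(kv, prefix, member, max_attempts=1000):
    """Claim the lowest free sequence number under `prefix` for `member`."""
    for seq in range(max_attempts):
        key = f"{prefix}/{seq:010d}"
        # cas=0 succeeds only if nobody has created this slot yet.
        if kv.put_cas(key, member, cas=0):
            return key
    raise RuntimeError("no free sequence number found")


kv = InMemoryKV()
a = join_group(kv, "masters", "master-a")
b = join_group(kv, "masters", "master-b")
print(a, b)  # masters/0000000000 masters/0000000001
```

One difference from a server-assigned incremental index is worth noting: scanning from zero would reuse a slot if keys were ever deleted, so a real module would need its own rule for that case.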
Re: Unified cgroups isolator
Really appreciate @qian and @jie's great help on this! It makes it easier for us to add cgroups isolation for the rest of the subsystems.

Additionally, if you find that any change to the unified cgroups isolator breaks your environment, please let us know. I will try to fix it ASAP.

On Wed, Sep 14, 2016 at 1:59 AM, Jie Yu wrote:

> Hi,
>
> We just merged the unified cgroups isolator. Huge shout out to @haosdent
> and @qianzhang to make this happen!
> https://issues.apache.org/jira/browse/MESOS-4697
>
> Just to give you some context. Previously, it's a huge pain to add a new
> cgroups subsystem to Mesos because it requires creating a new isolator (a
> lot of code duplication). Now, we merge all the subsystems into one single
> isolator, that makes adding a new subsystem very easy.
>
> More importantly, the new cgroups isolator supports cgroups v2!
>
> - Jie

--
Best Regards,
Haosdent Huang
Re: Unified cgroups isolator
Awesome !!

On Tue, Sep 13, 2016 at 10:59 AM, Jie Yu wrote:

> Hi,
>
> We just merged the unified cgroups isolator. Huge shout out to @haosdent
> and @qianzhang to make this happen!
> https://issues.apache.org/jira/browse/MESOS-4697
>
> Just to give you some context. Previously, it's a huge pain to add a new
> cgroups subsystem to Mesos because it requires creating a new isolator (a
> lot of code duplication). Now, we merge all the subsystems into one single
> isolator, that makes adding a new subsystem very easy.
>
> More importantly, the new cgroups isolator supports cgroups v2!
>
> - Jie

--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
Unified cgroups isolator
Hi,

We just merged the unified cgroups isolator. Huge shout out to @haosdent and @qianzhang for making this happen!
https://issues.apache.org/jira/browse/MESOS-4697

Just to give you some context: previously, it was a huge pain to add a new cgroups subsystem to Mesos because it required creating a new isolator (a lot of code duplication). Now we have merged all the subsystems into a single isolator, which makes adding a new subsystem very easy.

More importantly, the new cgroups isolator supports cgroups v2!

- Jie
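The design change described here (one isolator that owns the shared logic, with each subsystem contributing only its own part) can be pictured with a toy model. This is a rough sketch, not the Mesos code: the class names, the `prepare` method, and the string results are all hypothetical and exist only to show why adding a subsystem becomes a small change.

```python
class Subsystem:
    """Hypothetical per-subsystem hook; only the subsystem-specific part differs."""

    name = "base"

    def prepare(self, container_id):
        return f"{self.name}:{container_id}"


class CpuSubsystem(Subsystem):
    name = "cpu"


class MemorySubsystem(Subsystem):
    name = "memory"


class CgroupsIsolator:
    """One isolator that holds the shared logic and delegates to subsystems."""

    def __init__(self, subsystems):
        self.subsystems = {s.name: s for s in subsystems}

    def prepare(self, container_id):
        # Shared bookkeeping lives here once instead of in every isolator.
        return [s.prepare(container_id) for s in self.subsystems.values()]


class BlkioSubsystem(Subsystem):
    """Adding a subsystem is now one small class, not a whole new isolator."""

    name = "blkio"


isolator = CgroupsIsolator([CpuSubsystem(), MemorySubsystem(), BlkioSubsystem()])
prepared = isolator.prepare("c1")
print(prepared)
```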
Re: forcing framework to re-schedule?
If you want to kill the task from the scheduler, you just need to call `killTask` (https://github.com/apache/mesos/blob/1.0.x/include/mesos/scheduler.hpp#L257).

If you want to kill the task by health check, you could try to set the correct `consecutive_failures` number (https://github.com/apache/mesos/blob/1.0.x/include/mesos/v1/mesos.proto#L358).

On Tue, Sep 13, 2016 at 2:53 AM, Victor L wrote:

> How can i explicitly kill the task from my class?

--
Best Regards,
Haosdent Huang
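To connect this back to the original question (rescheduling on a different node), here is a minimal sketch of the framework-side logic once a task has been killed, whether by `killTask` or by a failing health check. It is an illustration only: `ReschedulingScheduler` and its methods are hypothetical and do not implement the Mesos `Scheduler` interface, and treating `TASK_FAILED` and `TASK_LOST` the same way as `TASK_KILLED` is an assumption.

```python
class ReschedulingScheduler:
    """Hypothetical framework logic; not the Mesos Scheduler interface."""

    RELAUNCH_STATES = {"TASK_KILLED", "TASK_FAILED", "TASK_LOST"}

    def __init__(self):
        self.pending = []   # job names waiting for a suitable offer
        self.running = {}   # task ID -> (job name, agent ID)
        self.avoid = {}     # job name -> agent IDs where it already failed
        self.attempts = 0

    def submit(self, job):
        self.pending.append(job)

    def resource_offer(self, agent_id):
        """Launch the first pending job that has not failed on this agent."""
        for job in self.pending:
            if agent_id in self.avoid.get(job, set()):
                continue
            self.pending.remove(job)
            self.attempts += 1
            task_id = f"{job}-{self.attempts}"
            self.running[task_id] = (job, agent_id)
            return task_id
        return None  # nothing suitable; a real framework would decline the offer

    def status_update(self, task_id, state):
        """Requeue the job when its task ends in a state treated as a failure."""
        if state not in self.RELAUNCH_STATES or task_id not in self.running:
            return
        job, agent_id = self.running.pop(task_id)
        self.avoid.setdefault(job, set()).add(agent_id)
        self.pending.append(job)


scheduler = ReschedulingScheduler()
scheduler.submit("web")
first = scheduler.resource_offer("agent-1")
scheduler.status_update(first, "TASK_KILLED")
declined = scheduler.resource_offer("agent-1")
second = scheduler.resource_offer("agent-2")
print(first, declined, second)  # web-1 None web-2
```

The point of the `avoid` set is that the relaunch waits for an offer from another agent instead of landing on the node where the task was just killed.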