Re: forcing framework to re-schedule?

2016-09-13 Thread haosdent
Hi @Victor, the taskId is specified in the `TaskInfo` that you pass when you
launch the task.
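
A minimal sketch of that idea, assuming nothing about the real driver API: the
scheduler picks the task ID itself when it builds the task description, so it
can simply remember it for a later kill. `FakeDriver`, `TrackingScheduler`, and
the dictionary-shaped task description are hypothetical stand-ins for
illustration, not Mesos classes.

```python
import uuid


class FakeDriver:
    """Hypothetical stand-in for a scheduler driver (not the real Mesos API)."""

    def __init__(self):
        self.launched = []
        self.killed = []

    def launch_task(self, offer_id, task_info):
        self.launched.append((offer_id, task_info))

    def kill_task(self, task_id):
        self.killed.append(task_id)


class TrackingScheduler:
    """Remembers the task ID it put into each task description at launch."""

    def __init__(self, driver):
        self.driver = driver
        self.task_ids = {}  # logical task name -> task ID chosen at launch

    def launch(self, name, offer_id, command):
        # The scheduler generates the ID, so there is nothing to look up later.
        task_id = f"{name}.{uuid.uuid4()}"
        task_info = {"task_id": task_id, "name": name, "command": command}
        self.task_ids[name] = task_id
        self.driver.launch_task(offer_id, task_info)
        return task_id

    def kill(self, name):
        # Reuse the ID recorded at launch time.
        task_id = self.task_ids[name]
        self.driver.kill_task(task_id)
        return task_id
```

In a real framework the same bookkeeping would sit next to whatever code
builds the `TaskInfo`.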

On Wed, Sep 14, 2016 at 6:22 AM, Victor L  wrote:

> how can i get taskId to call "killTask"?
>
> On Tue, Sep 13, 2016 at 9:59 AM, haosdent  wrote:
>
>> If you want to kill the task from the scheduler, you just need to call
>> `killTask`
>> (https://github.com/apache/mesos/blob/1.0.x/include/mesos/scheduler.hpp#L257).
>> If you want to kill the task by health check, you could try to set the
>> correct `consecutive_failures` number
>> (https://github.com/apache/mesos/blob/1.0.x/include/mesos/v1/mesos.proto#L358).
>>
>> On Tue, Sep 13, 2016 at 2:53 AM, Victor L  wrote:
>>
>>> How can i explicitly kill the task from my class?
>>>
>>> On Mon, Sep 12, 2016 at 2:10 PM, haosdent  wrote:
>>>
>>>> If the target you perform health check is your task, Mesos support
>>>> health check by a command. When your task reaches the health task failure
>>>> limit, the task would be killed and then your framework could launch the
>>>> task again when receives the `TASK_KILLED` in `statusUpdate`.
>>>>
>>>> On Tue, Sep 13, 2016 at 2:03 AM, Victor L  wrote:
>>>>
>>>>> It checks if process is functional. I don't think standard
>>>>> healthchecks wouldn't be sufficient for my purpose and my question still
>>>>> stands: how  to use result...
>>>>>
>>>>> On Mon, Sep 12, 2016 at 1:48 PM, haosdent  wrote:
>>>>>
>>>>>> Hi, @victor What's your health check agent used for? Because Mesos
>>>>>> supports health checks now.
>>>>>>
>>>>>> On Tue, Sep 13, 2016 at 1:46 AM, Victor L  wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> I am writing "healthcheck agent" for mesos deployment framework as
>>>>>>> independent thread periodically checking if main process ( started by
>>>>>>> framework) is running...
>>>>>>> What would be the mechanism to "communicate" failure to the
>>>>>>> framework  to cause specific outcome? For example: how can i use
>>>>>>> failure to cause framework to reschedule deployment on different node?
>>>>>>> Thanks,
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang

>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang


Unable to run latest Windows mesos-agent

2016-09-13 Thread Rinaldo Digiorgio
Hi,

I built a new Windows mesos-agent.exe a few minutes ago and I am getting the
following error. I don't think it is a port issue.

I0913 18:07:22.359485 12992 slave.cpp:1692] Launching task 'windows-hello-world.83d04669-7a17-11e6-ae4e-0021f6964572' for framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026
W0913 18:07:22.359485 12992 slave.cpp:1760] Ignoring running task 'windows-hello-world.83d04669-7a17-11e6-ae4e-0021f6964572' of framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026 because the framework is terminating
I0913 18:07:22.359485 12992 slave.cpp:4660] Cleaning up framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026
I0913 18:07:22.359485  3600 status_update_manager.cpp:285] Closing status update streams for framework 03b7ca4b-1a97-4044-ba65-9a92458987ad-0026
E0913 18:07:22.359485 12992 slave.cpp:5350] Failed to find the mtime of 'C:\mesos\slaves\1eb74be0-b996-4da5-a3a7-ee8ea7728dc5-S20\frameworks\03b7ca4b-1a97-4044-ba65-9a92458987ad-0026': Error invoking stat for 'C:\mesos\slaves\1eb74be0-b996-4da5-a3a7-ee8ea7728dc5-S20\frameworks\03b7ca4b-1a97-4044-ba65-9a92458987ad-0026': No such file or directory
I0913 18:07:22.359485  8988 gc.cpp:55] Scheduling 'C:\mesos\meta\slaves\1eb74be0-b996-4da5-a3a7-ee8ea7728dc5-S20\frameworks\03b7ca4b-1a97-4044-ba65-9a92458987ad-0026' for gc 6.9583928296days in the future
I0913 18:07:22.359485 12992 slave.cpp:783] Agent terminating
ABORT: (C:\jenkins\workspace\mesos-windows-build\3rdparty\stout\include\stout/os/windows/socket.hpp:136): Not expecting 'getsockopt' to fail when passed a valid socket

Re: Unified cgroups isolator

2016-09-13 Thread Qian Zhang
Thanks for @haosdent's awesome work and @Jie's great shepherding and guidance
on this project!


Thanks,
Qian Zhang

On Wed, Sep 14, 2016 at 7:56 AM, Gilbert Song  wrote:

> Awesome!
>
> Kudos to @haosdent and @qianzhang!
>
> On Tue, Sep 13, 2016 at 11:22 AM, haosdent  wrote:
>
>> Really appreciate @qian and @jie's great helps on this! It makes us
>> easier to add cgroups isolation for rest subsystem.
>>
>> Additionally, if you find any changes about unified cgroups isolator
>> break your environment, please let us know. I would
>> try to fix asap.
>>
>> On Wed, Sep 14, 2016 at 1:59 AM, Jie Yu  wrote:
>>
>>> Hi,
>>>
>>> We just merged the unified cgroups isolator. Huge shout out to @haosdent
>>> and @qianzhang to make this happen!
>>> https://issues.apache.org/jira/browse/MESOS-4697
>>>
>>> Just to give you some context. Previously, it's a huge pain to add a new
>>> cgroups subsystem to Mesos because it requires creating a new isolator (a
>>> lot of code duplication). Now, we merge all the subsystems into one single
>>> isolator, that makes adding a new subsystem very easy.
>>>
>>> More importantly, the new cgroups isolator supports cgroups v2!
>>>
>>> - Jie
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


Re: Unified cgroups isolator

2016-09-13 Thread Gilbert Song
Awesome!

Kudos to @haosdent and @qianzhang!

On Tue, Sep 13, 2016 at 11:22 AM, haosdent  wrote:

> Really appreciate @qian and @jie's great helps on this! It makes us easier
> to add cgroups isolation for rest subsystem.
>
> Additionally, if you find any changes about unified cgroups isolator break
> your environment, please let us know. I would
> try to fix asap.
>
> On Wed, Sep 14, 2016 at 1:59 AM, Jie Yu  wrote:
>
>> Hi,
>>
>> We just merged the unified cgroups isolator. Huge shout out to @haosdent
>> and @qianzhang to make this happen!
>> https://issues.apache.org/jira/browse/MESOS-4697
>>
>> Just to give you some context. Previously, it's a huge pain to add a new
>> cgroups subsystem to Mesos because it requires creating a new isolator (a
>> lot of code duplication). Now, we merge all the subsystems into one single
>> isolator, that makes adding a new subsystem very easy.
>>
>> More importantly, the new cgroups isolator supports cgroups v2!
>>
>> - Jie
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: forcing framework to re-schedule?

2016-09-13 Thread Victor L
How can I get the taskId to call `killTask`?

On Tue, Sep 13, 2016 at 9:59 AM, haosdent  wrote:

> If you want to kill the task from the scheduler, you just need to call
> `killTask`
> (https://github.com/apache/mesos/blob/1.0.x/include/mesos/scheduler.hpp#L257).
> If you want to kill the task by health check, you could try to set the
> correct `consecutive_failures` number
> (https://github.com/apache/mesos/blob/1.0.x/include/mesos/v1/mesos.proto#L358).
>
> On Tue, Sep 13, 2016 at 2:53 AM, Victor L  wrote:
>
>> How can i explicitly kill the task from my class?
>>
>> On Mon, Sep 12, 2016 at 2:10 PM, haosdent  wrote:
>>
>>> If the target you perform health check is your task, Mesos support
>>> health check by a command. When your task reaches the health task failure
>>> limit, the task would be killed and then your framework could launch the
>>> task again when receives the `TASK_KILLED` in `statusUpdate`.
>>>
>>> On Tue, Sep 13, 2016 at 2:03 AM, Victor L  wrote:
>>>
>>>> It checks if process is functional. I don't think standard healthchecks
>>>> wouldn't be sufficient for my purpose and my question still stands: how  to
>>>> use result...
>>>>
>>>> On Mon, Sep 12, 2016 at 1:48 PM, haosdent  wrote:
>>>>
>>>>> Hi, @victor What's your health check agent used for? Because Mesos
>>>>> supports health checks now.
>>>>>
>>>>> On Tue, Sep 13, 2016 at 1:46 AM, Victor L  wrote:
>>>>>
>>>>>> Hello,
>>>>>> I am writing "healthcheck agent" for mesos deployment framework as
>>>>>> independent thread periodically checking if main process ( started by
>>>>>> framework) is running...
>>>>>> What would be the mechanism to "communicate" failure to the
>>>>>> framework  to cause specific outcome? For example: how can i use failure
>>>>>> to cause framework to reschedule deployment on different node?
>>>>>> Thanks,
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>


>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>


Re: what is the status on this?

2016-09-13 Thread kant kodali
@Alex Rukletsov I am sorry I took some time to respond. I have been excited
from the beginning about the opportunity to work on this task, but I wanted to
take my time to see whether I can really commit to it, and it looks like I
might be able to. However, I have not contributed to open source before, so I
would need help from someone who can point me to the right parts of the code
and help me navigate the process. If that is feasible, I will be happy to
commit some time every week to work on this. Please let me know if that works.
 





On Tue, Sep 6, 2016 11:59 AM, Dario Rexin dre...@apple.com wrote:

Frameworks would use the redirect mechanism of the HTTP API and, in case of
unreachable nodes, could do round robin on the list of master nodes.
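
A rough sketch of that lookup, under stated assumptions: `find_leader` and the
`fetch` callable are hypothetical names for illustration, and `fetch` stands in
for an HTTP request that either succeeds, answers with a redirect to the
leading master, or raises `ConnectionError` when the node cannot be reached.
It makes one pass over the master list rather than looping forever.

```python
def find_leader(masters, fetch, max_redirects=3):
    """Try each master in turn and follow redirects to the leading master.

    `fetch(address)` is a hypothetical callable that returns ("ok", None),
    ("redirect", leader_address), or raises ConnectionError.
    Returns the address that answered "ok", or None if none did.
    """
    for address in masters:
        current = address
        for _ in range(max_redirects + 1):
            try:
                status, location = fetch(current)
            except ConnectionError:
                break  # unreachable: move on to the next master in the list
            if status == "ok":
                return current
            if status == "redirect" and location:
                current = location  # follow the redirect to the leader
                continue
            break  # unexpected answer: try the next master
    return None
```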
On Sep 6, 2016, at 11:52 AM, Joseph Wu wrote:

And for discovery of other nodes in the Paxos group.

The work on modularizing/decoupling Zookeeper is a prerequisite for having the
replicated log perform leader election itself.  <- That would merely be another
implementation of the interface we will introduce in the process:

https://issues.apache.org/jira/browse/MESOS-3574

On Tue, Sep 6, 2016 at 11:31 AM, Avinash Sridharan wrote:

Also, I think, the replicated log itself uses Zookeeper for leader election.

On Tue, Sep 6, 2016 at 12:15 PM, Zameer Manji wrote:

If we use the replicated log for leader election, how will frameworks detect the
leading master? Right now the scheduler driver uses the MasterInfo in ZK to
discover the leader and detect leadership changes.

On Mon, Sep 5, 2016 at 10:18 AM, Dario Rexin wrote:

If we go and change this, why not simply remove any dependencies to external
systems and simply use the replicated log for leader election?

On Sep 5, 2016, at 9:02 AM, Alex Rukletsov wrote:

Kant,
thanks a lot for the feedback! Are you interested in helping out with Consul
module once Jay and Joseph are done with modularizing patches?

On Mon, Sep 5, 2016 at 8:50 AM, Jay JN Guo wrote:

Patches are currently under review by @Joseph and can be found at the links
provided by @haosdent.

I took a quick look at the Consul key/value HTTP APIs and they look very
similar to the Etcd APIs. You could actually reuse our Etcd module
implementation once we manage to push the module into the Mesos community.

The only technical problem I can see for now is that Consul does not support
`POST` with an incremental key index. We may need to leverage the `?cas=`
operation in Consul to emulate the behaviour of joining a key group.

We could have a discussion on how to implement a Consul HA module.

cheers,
/J

----- Original message -----
From: haosdent
To: user
Cc: Jay JN Guo/China/IBM@IBMCN
Subject: Re: what is the status on this?
Date: Sun, Sep 4, 2016 6:10 PM

Jay has some patches for decoupling Mesos from Zookeeper:
https://issues.apache.org/jira/browse/MESOS-5828
https://issues.apache.org/jira/browse/MESOS-5829

I think it should be possible to support Consul through custom modules after
Jay's work is done.

On Sun, Sep 4, 2016 at 6:02 PM, kant kodali wrote:

Hi Alex,

We have some experienced devops people here and they all have one thing in
common: they find Zookeeper a pain to maintain. In fact, we have refused to
bring in new tech stacks that require Zookeeper, such as Kafka. So we are
desperately searching for an alternative, preferably one using Consul. I hear
a lot of positive responses when it comes to Consul. It would be great to see
Mesos and Consul working together, in which case we would be ready to jump at
it and make the switch from YARN to Mesos.

Thanks,
Kant

On Wed, Aug 31, 2016 1:03 AM, Alex Rukletsov a...@mesosphere.com wrote:

Kant,
mind telling us what your use case is and why this ticket is important for
you? It will help us prioritize work.

On Fri, Aug 26, 2016 at 2:46 AM, tommy xiao <xia...@gmail.com> wrote:

Hi guys, I have been following this case all along. The good news is that
etcd already has patches, so the coming Consul support should be easy; it just
needs some time for coding. If you are interested, let us collaborate on it.

2016-08-26 8:11 GMT+08:00 Joseph Wu:

There is no timeline as no one has done any work on the issue.

On Thu, Aug 25, 2016 at 4:54 PM, kant kodali <kanth...@gmail.com> wrote:

Hi Guys,

I see that this ticket and other related tickets were supposed to be part of
sprints in 2015, and they are still not resolved. Can we have a timeline on
this? This would be really helpful.

https://issues.apache.org/jira/browse/MESOS-3797

Thanks!

--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com

--
Best Regards,
Haosdent Huang





-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245

Re: Unified cgroups isolator

2016-09-13 Thread haosdent
Really appreciate @qian and @jie's great help on this! It makes it easier for
us to add cgroups isolation for the remaining subsystems.

Additionally, if you find that any change to the unified cgroups isolator
breaks your environment, please let us know. I will try to fix it ASAP.

On Wed, Sep 14, 2016 at 1:59 AM, Jie Yu  wrote:

> Hi,
>
> We just merged the unified cgroups isolator. Huge shout out to @haosdent
> and @qianzhang to make this happen!
> https://issues.apache.org/jira/browse/MESOS-4697
>
> Just to give you some context. Previously, it's a huge pain to add a new
> cgroups subsystem to Mesos because it requires creating a new isolator (a
> lot of code duplication). Now, we merge all the subsystems into one single
> isolator, that makes adding a new subsystem very easy.
>
> More importantly, the new cgroups isolator supports cgroups v2!
>
> - Jie
>



-- 
Best Regards,
Haosdent Huang


Re: Unified cgroups isolator

2016-09-13 Thread Avinash Sridharan
Awesome !!


On Tue, Sep 13, 2016 at 10:59 AM, Jie Yu  wrote:

> Hi,
>
> We just merged the unified cgroups isolator. Huge shout out to @haosdent
> and @qianzhang to make this happen!
> https://issues.apache.org/jira/browse/MESOS-4697
>
> Just to give you some context. Previously, it's a huge pain to add a new
> cgroups subsystem to Mesos because it requires creating a new isolator (a
> lot of code duplication). Now, we merge all the subsystems into one single
> isolator, that makes adding a new subsystem very easy.
>
> More importantly, the new cgroups isolator supports cgroups v2!
>
> - Jie
>



-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245


Unified cgroups isolator

2016-09-13 Thread Jie Yu
Hi,

We just merged the unified cgroups isolator. Huge shout out to @haosdent
and @qianzhang for making this happen!
https://issues.apache.org/jira/browse/MESOS-4697

Just to give you some context: previously, it was a huge pain to add a new
cgroups subsystem to Mesos because it required creating a new isolator (a
lot of code duplication). Now we have merged all the subsystems into one
single isolator, which makes adding a new subsystem very easy.

More importantly, the new cgroups isolator supports cgroups v2!

- Jie
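
To illustrate the design idea only (this is not the Mesos code, which is C++),
here is a small sketch of one isolator that delegates to pluggable subsystems.
All class and method names below are hypothetical. Adding a subsystem then
means adding one small class instead of a whole new isolator.

```python
class Subsystem:
    """Hypothetical per-subsystem hook; not the real Mesos interface."""

    name = ""

    def prepare(self, container_id):
        raise NotImplementedError


class CpuSubsystem(Subsystem):
    name = "cpu"

    def prepare(self, container_id):
        return f"cpu cgroup prepared for {container_id}"


class MemorySubsystem(Subsystem):
    name = "memory"

    def prepare(self, container_id):
        return f"memory cgroup prepared for {container_id}"


class UnifiedCgroupsIsolator:
    """One isolator that delegates to every registered subsystem."""

    def __init__(self, subsystems):
        self.subsystems = {s.name: s for s in subsystems}

    def prepare(self, container_id):
        # Shared logic lives here once; each subsystem adds only its own step.
        return {
            name: subsystem.prepare(container_id)
            for name, subsystem in self.subsystems.items()
        }
```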


Re: forcing framework to re-schedule?

2016-09-13 Thread haosdent
If you want to kill the task from the scheduler, you just need to call
`killTask`
(https://github.com/apache/mesos/blob/1.0.x/include/mesos/scheduler.hpp#L257).
If you want the task to be killed by a health check, you could set an
appropriate `consecutive_failures` number
(https://github.com/apache/mesos/blob/1.0.x/include/mesos/v1/mesos.proto#L358).
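
As a rough illustration of what a consecutive-failure limit means, here is a
sketch of the counting logic only. The class name and the default of 3 are
assumptions for the example; this is not how Mesos implements its health
checker.

```python
class ConsecutiveFailureTracker:
    """Counts failed checks in a row; a healthy result resets the count."""

    def __init__(self, consecutive_failures=3):
        self.limit = consecutive_failures
        self.failures = 0

    def record(self, healthy):
        """Record one check result; return True when the task should be killed."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.limit
```

With a limit of 3, two failures followed by a success do not trigger a kill;
only three failures in a row do.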

On Tue, Sep 13, 2016 at 2:53 AM, Victor L  wrote:

> How can i explicitly kill the task from my class?
>
> On Mon, Sep 12, 2016 at 2:10 PM, haosdent  wrote:
>
>> If the target you perform health check is your task, Mesos support health
>> check by a command. When your task reaches the health task failure limit,
>> the task would be killed and then your framework could launch the task
>> again when receives the `TASK_KILLED` in `statusUpdate`.
>>
>> On Tue, Sep 13, 2016 at 2:03 AM, Victor L  wrote:
>>
>>> It checks if process is functional. I don't think standard healthchecks
>>> wouldn't be sufficient for my purpose and my question still stands: how  to
>>> use result...
>>>
>>> On Mon, Sep 12, 2016 at 1:48 PM, haosdent  wrote:
>>>
>>>> Hi, @victor What's your health check agent used for? Because Mesos
>>>> supports health checks now.
>>>>
>>>> On Tue, Sep 13, 2016 at 1:46 AM, Victor L  wrote:
>>>>
>>>>> Hello,
>>>>> I am writing "healthcheck agent" for mesos deployment framework as
>>>>> independent thread periodically checking if main process ( started by
>>>>> framework) is running...
>>>>> What would be the mechanism to "communicate" failure to the framework
>>>>> to cause specific outcome? For example: how can i use failure to cause
>>>>> framework to reschedule deployment on different node?
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang

>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>
>


-- 
Best Regards,
Haosdent Huang