Re: Mesos 1.5.0 Release

2017-12-21 Thread Jie Yu
Yeah, I am grooming them right now.

> On Dec 21, 2017, at 7:25 PM, Benjamin Mahler wrote:
> 
> Meng is working on https://issues.apache.org/jira/browse/MESOS-8352 and we
> should land it tonight if not tomorrow. I can cherry pick if it's after
> your cut, and worst case it can go in 1.5.1.
> 
> Have you guys gone over the unresolved items targeted for 1.5.0? I see a
> lot of stuff, might be good to start adjusting / removing their target
> versions to give folks a chance to respond on the ticket?
> 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reviewable%2C%20Accepted)%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0
> 
> For example, https://issues.apache.org/jira/browse/MESOS-8337 looks pretty
> bad to me (master crash).
> 
>> On Thu, Dec 21, 2017 at 7:00 PM, Jie Yu wrote:
>> 
>> Hi,
>> 
>> We're about to cut 1.5.0-rc1 tomorrow. If you have anything that needs to
>> go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
>> Thanks!
>> 
>> - Jie
>> 
>>> On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song wrote:
>>> 
>>> Folks,
>>> 
>>> It is time for the Mesos 1.5.0 release. I am the release manager.
>>> 
>>> We plan to cut rc1 in the next couple of weeks. Please start to wrap up
>>> patches if you are contributing to or shepherding any issue. If you expect
>>> any particular JIRA issue to land in this release, please set its
>>> *Target Version* to "*1.5.0*" and mark it as "*Blocker*" priority.
>>> 
>>> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>>> 
>>> Cheers,
>>> Gilbert
>>> 
>> 
>> 


Re: Duplicate task ID for same framework on different agents

2017-12-21 Thread Benjamin Mahler
It's a known issue:
https://issues.apache.org/jira/browse/MESOS-3070

Putting in place a protection mechanism sounds good, but is rather
complicated. See the comment in this ticket:
https://issues.apache.org/jira/browse/MESOS-6785

On Wed, Dec 20, 2017 at 8:26 PM, Zhitao Li wrote:

> Hi all,
>
> We have seen a Mesos master crash loop after a leader failover. After more
> investigation, it seems that the same task ID ended up being created on
> multiple Mesos agents in the cluster.
>
> One possible sequence of events that can lead to this problem:
>
> 1. Task T1 was launched through master M1 on agent A1 for framework F;
> 2. Master M1 failed over to M2;
> 3. Before A1 reregistered with M2, the same T1 was launched on agent A2:
> M2 did not know about the previous T1 yet, so it accepted it and sent it to A2;
> 4. A1 reregistered: this probably crashed M2 (because the same task cannot be
> added twice);
> 5. When M3 tried to come up after M2, it crashed as well because both A1
> and A2 tried to add a T1 to the framework.
>
> (I only have logs to prove the last step right now)
>
> This happened on 1.4.0 masters.
>
> Although this is probably triggered by incorrect retry logic on the framework
> side, I wonder whether the Mesos master should add extra protection to prevent
> such an issue from causing a master crash loop. Some possible ideas are to
> instruct one of the agents carrying tasks with a duplicate ID to terminate the
> corresponding tasks, or just to refuse to reregister such agents and instruct
> them to shut down.
>
> I also filed MESOS-8353 to track this potential bug. Thanks!
>
>
> --
>
> Cheers,
>
> Zhitao Li
>
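
To make the sequence in the report easier to follow, here is a minimal
sketch of the kind of protection being suggested. It is a toy model, not
Mesos code: ToyMaster, launch() and reregister() are hypothetical names, and
the real master keeps far more state. The idea shown is only that a
reregistering agent's duplicate task could be reported as a conflict rather
than tripping a fatal check.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical, heavily simplified stand-in for a master's task registry."""

    # Maps (framework_id, task_id) -> agent_id that currently owns the task.
    tasks: dict = field(default_factory=dict)

    def launch(self, framework_id, task_id, agent_id):
        """Accept a launch only if this master does not already know the task ID."""
        key = (framework_id, task_id)
        if key in self.tasks:
            return False
        self.tasks[key] = agent_id
        return True

    def reregister(self, agent_id, agent_tasks):
        """Re-add an agent's tasks and return the conflicting ones.

        agent_tasks is an iterable of (framework_id, task_id) pairs. A task
        already owned by a different agent is reported, not added twice.
        """
        conflicts = []
        for key in agent_tasks:
            owner = self.tasks.get(key)
            if owner is not None and owner != agent_id:
                # Candidate for a kill request or for rejecting the agent.
                conflicts.append(key)
            else:
                self.tasks[key] = agent_id
        return conflicts


# Replay steps 3 and 4 of the report against a freshly elected master (M2).
m2 = ToyMaster()
launched = m2.launch("F", "T1", "A2")  # M2 has not heard from A1 yet
conflicts = m2.reregister("A1", [("F", "T1")])  # A1 returns with its own T1
print(launched, conflicts)
```

What to do with the reported conflicts (kill one copy, or refuse the agent)
is exactly the policy question raised in the report, and as noted above it
is the complicated part.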


Re: Mesos 1.5.0 Release

2017-12-21 Thread Benjamin Mahler
Meng is working on https://issues.apache.org/jira/browse/MESOS-8352 and we
should land it tonight if not tomorrow. I can cherry pick if it's after
your cut, and worst case it can go in 1.5.1.

Have you guys gone over the unresolved items targeted for 1.5.0? I see a
lot of stuff, might be good to start adjusting / removing their target
versions to give folks a chance to respond on the ticket?

https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reviewable%2C%20Accepted)%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0

For example, https://issues.apache.org/jira/browse/MESOS-8337 looks pretty
bad to me (master crash).
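
For anyone who would rather read the filter than click it, the link above
can be decoded into its JQL query with Python's standard library. This is
only a convenience sketch; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The JIRA filter link from this message, copied verbatim.
url = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20"
    "status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reviewable%2C%20Accepted)"
    "%20AND%20%22Target%20Version%2Fs%22%20%3D%201.5.0"
)

# parse_qs percent-decodes the value of the "jql" query parameter.
jql = parse_qs(urlsplit(url).query)["jql"][0]
print(jql)
```

The decoded query selects MESOS issues that are Open, In Progress,
Reviewable or Accepted and whose "Target Version/s" is 1.5.0.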

On Thu, Dec 21, 2017 at 7:00 PM, Jie Yu wrote:

> Hi,
>
> We're about to cut 1.5.0-rc1 tomorrow. If you have anything that needs to
> go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
> Thanks!
>
> - Jie
>
> On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song wrote:
>
>> Folks,
>>
>> It is time for the Mesos 1.5.0 release. I am the release manager.
>>
>> We plan to cut rc1 in the next couple of weeks. Please start to wrap up
>> patches if you are contributing to or shepherding any issue. If you expect
>> any particular JIRA issue to land in this release, please set its
>> *Target Version* to "*1.5.0*" and mark it as "*Blocker*" priority.
>>
>> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>>
>> Cheers,
>> Gilbert
>>
>
>


Re: Mesos 1.5.0 Release

2017-12-21 Thread Jie Yu
Hi,

We're about to cut 1.5.0-rc1 tomorrow. If you have anything that needs to
go into 1.5.0 that hasn't landed, please let me or Gilbert know asap.
Thanks!

- Jie

On Fri, Dec 1, 2017 at 3:58 PM, Gilbert Song wrote:

> Folks,
>
> It is time for the Mesos 1.5.0 release. I am the release manager.
>
> We plan to cut rc1 in the next couple of weeks. Please start to wrap up
> patches if you are contributing to or shepherding any issue. If you expect
> any particular JIRA issue to land in this release, please set its
> *Target Version* to "*1.5.0*" and mark it as "*Blocker*" priority.
>
> The dashboard for Mesos 1.5.0 will be posted in this thread soon.
>
> Cheers,
> Gilbert
>