Re: [VOTE] Release Apache Mesos 1.4.0 (rc5)

2017-09-15 Thread Kapil Arya
+1 (binding)

Tested on internal CI with CentOS 6/7, Fedora 23, Debian 8, and Ubuntu 12/14/16.

On Fri, Sep 15, 2017 at 5:08 PM, Vinod Kone  wrote:

> Ok. Looks like a test issue per https://reviews.apache.org/r/60467/
>
> +1 (binding)
>
> On Fri, Sep 15, 2017 at 12:16 PM, Michael Park  wrote:
>
>> Vinod, regarding MESOS-7729:
>>
>> I found MESOS-6345, which is related to the persistent volume framework,
>> and that leads me to believe that this is not new.
>>
>> Thanks,
>>
>> MPark
>>
>> On Tue, Sep 12, 2017 at 12:01 PM Vinod Kone  wrote:
>>
>>> Tested this on ASF CI.
>>>
>>> Saw 3 flaky tests.
>>>
>>> https://issues.apache.org/jira/browse/MESOS-7729
>>> https://issues.apache.org/jira/browse/MESOS-7971
>>> https://issues.apache.org/jira/browse/MESOS-7972
>>>
>>> The first one was a known (since 1.4.0) flaky test with a double free
>>> corruption. @Kapil and @MPark can you verify that this is an issue with
>>> the test and not the source code? Once verified, I'll give a +1.
>>>
>>> *Revision*: b3fd2e7ab26e118222fe18af4b92c53a3c01e6cc
>>>
>>>    - refs/tags/1.4.0-rc5
>>>
>>> Configuration Matrix
>>>
>>> OS            Configuration                             Build tool  gcc      clang
>>> centos:7      --verbose --enable-libevent --enable-ssl  autotools   Success  Not run
>>> centos:7      --verbose --enable-libevent --enable-ssl  cmake       Success  Not run
>>> centos:7      --verbose                                 autotools   Failed   Not run
>>> centos:7      --verbose                                 cmake       Success  Not run
>>> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools   Success  Success
>>> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake       Success  Success
>>> ubuntu:14.04  --verbose                                 autotools   Success  Success
>>> ubuntu:14.04  --verbose                                 cmake       Failed   Failed

Re: [VOTE] Release Apache Mesos 1.4.0 (rc5)

2017-09-15 Thread Vinod Kone
Ok. Looks like a test issue per https://reviews.apache.org/r/60467/

+1 (binding)

On Fri, Sep 15, 2017 at 12:16 PM, Michael Park  wrote:

> Vinod, regarding MESOS-7729:
>
> I found MESOS-6345, which is related to the persistent volume framework,
> and that leads me to believe that this is not new.
>
> Thanks,
>
> MPark
>
> On Tue, Sep 12, 2017 at 12:01 PM Vinod Kone  wrote:
>
>> Tested this on ASF CI.
>>
>> Saw 3 flaky tests.
>>
>> https://issues.apache.org/jira/browse/MESOS-7729
>> https://issues.apache.org/jira/browse/MESOS-7971
>> https://issues.apache.org/jira/browse/MESOS-7972
>>
>> The first one was a known (since 1.4.0) flaky test with a double free
>> corruption. @Kapil and @MPark can you verify that this is an issue with
>> the test and not the source code? Once verified, I'll give a +1.
>>
>> *Revision*: b3fd2e7ab26e118222fe18af4b92c53a3c01e6cc
>>
>>    - refs/tags/1.4.0-rc5
>>
>> Configuration Matrix
>>
>> OS            Configuration                             Build tool  gcc      clang
>> centos:7      --verbose --enable-libevent --enable-ssl  autotools   Success  Not run
>> centos:7      --verbose --enable-libevent --enable-ssl  cmake       Success  Not run
>> centos:7      --verbose                                 autotools   Failed   Not run
>> centos:7      --verbose                                 cmake       Success  Not run
>> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools   Success  Success
>> ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake       Success  Success
>> ubuntu:14.04  --verbose                                 autotools   Success  Success
>> ubuntu:14.04  --verbose                                 cmake       Failed   Failed
>>
>>
>>
>>
>>
>> On Sat, Sep 9, 2017 at 6:49 AM, Kapil Arya  wr

Re: [VOTE] Release Apache Mesos 1.4.0 (rc5)

2017-09-15 Thread Michael Park
Vinod, regarding MESOS-7729:

I found MESOS-6345, which is related to the persistent volume framework,
and that leads me to believe that this is not new.

Thanks,

MPark

On Tue, Sep 12, 2017 at 12:01 PM Vinod Kone  wrote:

> Tested this on ASF CI.
>
> Saw 3 flaky tests.
>
> https://issues.apache.org/jira/browse/MESOS-7729
> https://issues.apache.org/jira/browse/MESOS-7971
> https://issues.apache.org/jira/browse/MESOS-7972
>
> The first one was a known (since 1.4.0) flaky test with a double free
> corruption. @Kapil and @MPark can you verify that this is an issue with the
> test and not the source code? Once verified, I'll give a +1.
>
> *Revision*: b3fd2e7ab26e118222fe18af4b92c53a3c01e6cc
>
>    - refs/tags/1.4.0-rc5
>
> Configuration Matrix
>
> #  OS            Configuration                             Build tool  gcc      clang
> 1  centos:7      --verbose --enable-libevent --enable-ssl  autotools   Success  Not run
> 2  centos:7      --verbose --enable-libevent --enable-ssl  cmake       Success  Not run
> 3  centos:7      --verbose                                 autotools   Failed   Not run
> 4  centos:7      --verbose                                 cmake       Success  Not run
> 5  ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools   Success  Success
> 6  ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake       Success  Success
> 7  ubuntu:14.04  --verbose                                 autotools   Success  Success
> 8  ubuntu:14.04  --verbose                                 cmake       Failed   Failed
>
> Build links (row, compiler):
>
> [1, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [2, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [3, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [4, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [5, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [5, clang]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [6, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [6, clang]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=cmake,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [7, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [7, clang]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [8, gcc]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
> [8, clang]
> https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/42/BUILDTOOL=cmake,COMPILER=clang,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/
>
>
>
>
> On Sat, Sep 9, 2017 at 6:49 AM, Kapil Arya  wrote:
>
> > Hi all,
> >
> > [NOTE: Starting with this RC candidate, we will not be "releasing" RC jar
> > files in the Maven release channel. This prevents polluting of the Maven
> > repositories with numerous RC tags. As before, you can continue to test
> the
> > release candidate using the Maven staging repository prov

Re: [ISSUE] Check failed: slave.maintenance.isSome()

2017-09-15 Thread Qi Feng
My Mesos version is 1.2.0. Sorry.


From: Qi Feng 
Sent: Friday, September 15, 2017 10:14 AM
To: user@mesos.apache.org; Bayou
Subject: Re: [ISSUE] Check failed: slave.maintenance.isSome()


This case can be reproduced by running `for i in {1..8}; do python call.py;
done` (call.py gist:
https://gist.github.com/athlum/e2cd04bfb9f81a790d31643606252a49 ).

It looks like something goes wrong when /maintenance/schedule is called
concurrently.

We hit this case because we wrote a service based on Ansible that manages the
Mesos cluster. It creates a task that updates slave configs with a certain
number of workers, like this:

  1.  Call schedule for 3 machines: a, b, c.
  2.  When machine a is done, the maintenance window is updated to: b, c.
  3.  When another machine, "d", is assigned immediately after a, the window
is updated to: b, c, d.

These updates sometimes happen within a very short interval. That is when we
see the fatal log from Bayou's mail.

What is the right way to update the maintenance window? Thanks for any reply.


From: Bayou 
Sent: Thursday, September 14, 2017 12:06 PM
To: user@mesos.apache.org
Subject: [ISSUE] Check failed: slave.maintenance.isSome()

Hi all,
   I’m repeatedly running mesos-maintenance-schedule, machine-down,
machine-up, and mesos-maintenance-schedule-cancel in a three-slave cluster,
with no other operations, using the Mesos API to schedule the three slaves
asynchronously. At first it worked well, but after many tries (hundreds of
times) there was always a "Check failed: slave.maintenance.isSome()" and the
Mesos master crashed. The check is in the source at:
https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L983
Some log output from the Mesos master is below:
2017-09-12 16:39:07.394 err mesos-master[254491]: F0912 16:39:07.393944 254527 hierarchical.cpp:903] Check failed: slave.maintenance.isSome()
2017-09-12 16:39:07.394 err mesos-master[254491]: *** Check failure stack trace: ***
2017-09-12 16:39:07.402 err mesos-master[254491]: @ 0x7f4cf356fba6  google::LogMessage::Fail()
2017-09-12 16:39:07.413 err mesos-master[254491]: @ 0x7f4cf356fb05  google::LogMessage::SendToLog()
2017-09-12 16:39:07.420 err mesos-master[254491]: @ 0x7f4cf356f516  google::LogMessage::Flush()
2017-09-12 16:39:07.424 err mesos-master[254491]: @ 0x7f4cf357224a  google::LogMessageFatal::~LogMessageFatal()
2017-09-12 16:39:07.429 err mesos-master[254491]: @ 0x7f4cf2344a32  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::updateInverseOffer()
2017-09-12 16:39:07.435 err mesos-master[254491]: @ 0x7f4cf1f8d9f9  _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_7SlaveIDERKNS1_11FrameworkIDERK6OptionINS1_20UnavailableResourcesEERKSC_INS1_9allocator18InverseOfferStatusEERKSC_INS1_7FiltersEES6_S9_SE_SJ_SN_EEvRKNS_3PIDIT_EEMSR_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES18_
2017-09-12 16:39:07.445 err mesos-master[254491]: @ 0x7f4cf1f938bb  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_7SlaveIDERKNS5_11FrameworkIDERK6OptionINS5_20UnavailableResourcesEERKSG_INS5_9allocator18InverseOfferStatusEERKSG_INS5_7FiltersEESA_SD_SI_SN_SR_EEvRKNS0_3PIDIT_EEMSV_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
2017-09-12 16:39:07.455 err mesos-master[254491]: @ 0x7f4cf34dd049  std::function<>::operator()()
2017-09-12 16:39:07.460 err mesos-master[254491]: @ 0x7f4cf34c1285  process::ProcessBase::visit()
2017-09-12 16:39:07.464 err mesos-master[254491]: @ 0x7f4cf34cc58a  process::DispatchEvent::visit()
2017-09-12 16:39:07.465 err mesos-master[254491]: @ 0x7f4cf4e4ad4e  process::ProcessBase::serve()
2017-09-12 16:39:07.469 err mesos-master[254491]: @ 0x7f4cf34bd281  process::ProcessManager::resume()
2017-09-12 16:39:07.471 err mesos-master[254491]: @ 0x7f4cf34b9a2c  _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
2017-09-12 16:39:07.473 err mesos-master[254491]: @ 0x7f4cf34cbbf2  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
2017-09-12 16:39:07.475 err mesos-master[254491]: @ 0x7f4cf34cbb36  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
2017-09-12 16:39:07.477 err mesos-master[254491]: @ 0x7f4cf34cbac0  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced3ba1e0  (unknown)
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced613dc5  start_thread
2017-09-12 16:39:07.479 err mesos-master[254491]: @ 0x7f4cecb21ced  __clone
2

Re: [ISSUE] Check failed: slave.maintenance.isSome()

2017-09-15 Thread Qi Feng
This case can be reproduced by running `for i in {1..8}; do python call.py;
done` (call.py gist:
https://gist.github.com/athlum/e2cd04bfb9f81a790d31643606252a49 ).

It looks like something goes wrong when /maintenance/schedule is called
concurrently.

We hit this case because we wrote a service based on Ansible that manages the
Mesos cluster. It creates a task that updates slave configs with a certain
number of workers, like this:

  1.  Call schedule for 3 machines: a, b, c.
  2.  When machine a is done, the maintenance window is updated to: b, c.
  3.  When another machine, "d", is assigned immediately after a, the window
is updated to: b, c, d.

These updates sometimes happen within a very short interval. That is when we
see the fatal log from Bayou's mail.

What is the right way to update the maintenance window? Thanks for any reply.
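
For anyone following the thread, here is a minimal sketch of one way to keep
schedule updates from overlapping on the client side: every worker goes
through a single updater that holds a lock while it posts the full window.
The request body uses the windows / machine_ids / unavailability layout from
the Mesos maintenance documentation. `ScheduleUpdater`, `build_schedule`, and
the injected `post(path, body)` callable are hypothetical names used for
illustration; they are not part of Mesos or of call.py. Serializing the calls
is an assumption about how to avoid the concurrent case, not a confirmed fix
for the crash.

```python
import json
import threading
from typing import Callable, Dict, Iterable


def build_schedule(machines: Iterable[Dict[str, str]],
                   start_ns: int,
                   duration_ns: int) -> dict:
    """Build a schedule with one window covering all given machines."""
    machine_ids = list(machines)
    if not machine_ids:
        # Nothing left to maintain: send a schedule with no windows.
        return {"windows": []}
    return {
        "windows": [
            {
                "machine_ids": machine_ids,
                "unavailability": {
                    "start": {"nanoseconds": start_ns},
                    "duration": {"nanoseconds": duration_ns},
                },
            }
        ]
    }


class ScheduleUpdater:
    """Lets only one schedule update be in flight at a time."""

    def __init__(self, post: Callable[[str, str], None]):
        # Hypothetical transport: post(path, json_body) sends one HTTP POST
        # to the master and returns once the master has answered.
        self._post = post
        self._lock = threading.Lock()

    def update(self, machines, start_ns: int, duration_ns: int) -> None:
        body = json.dumps(build_schedule(machines, start_ns, duration_ns))
        # Hold the lock for the whole request so updates cannot overlap.
        with self._lock:
            self._post("/maintenance/schedule", body)
```

In the a, b, c, d example above, each worker would call `updater.update(...)`
with the machines that are still pending, so the "b, c" update and the
"b, c, d" update reach the master one after the other.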


From: Bayou 
Sent: Thursday, September 14, 2017 12:06 PM
To: user@mesos.apache.org
Subject: [ISSUE] Check failed: slave.maintenance.isSome()

Hi all,
   I’m repeatedly running mesos-maintenance-schedule, machine-down,
machine-up, and mesos-maintenance-schedule-cancel in a three-slave cluster,
with no other operations, using the Mesos API to schedule the three slaves
asynchronously. At first it worked well, but after many tries (hundreds of
times) there was always a "Check failed: slave.maintenance.isSome()" and the
Mesos master crashed. The check is in the source at:
https://github.com/apache/mesos/blob/2fe2bb26a425da9aaf1d7cf34019dd347d0cf9a4/src/master/allocator/mesos/hierarchical.cpp#L983
Some log output from the Mesos master is below:
2017-09-12 16:39:07.394 err mesos-master[254491]: F0912 16:39:07.393944 254527 hierarchical.cpp:903] Check failed: slave.maintenance.isSome()
2017-09-12 16:39:07.394 err mesos-master[254491]: *** Check failure stack trace: ***
2017-09-12 16:39:07.402 err mesos-master[254491]: @ 0x7f4cf356fba6  google::LogMessage::Fail()
2017-09-12 16:39:07.413 err mesos-master[254491]: @ 0x7f4cf356fb05  google::LogMessage::SendToLog()
2017-09-12 16:39:07.420 err mesos-master[254491]: @ 0x7f4cf356f516  google::LogMessage::Flush()
2017-09-12 16:39:07.424 err mesos-master[254491]: @ 0x7f4cf357224a  google::LogMessageFatal::~LogMessageFatal()
2017-09-12 16:39:07.429 err mesos-master[254491]: @ 0x7f4cf2344a32  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::updateInverseOffer()
2017-09-12 16:39:07.435 err mesos-master[254491]: @ 0x7f4cf1f8d9f9  _ZZN7process8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS1_7SlaveIDERKNS1_11FrameworkIDERK6OptionINS1_20UnavailableResourcesEERKSC_INS1_9allocator18InverseOfferStatusEERKSC_INS1_7FiltersEES6_S9_SE_SJ_SN_EEvRKNS_3PIDIT_EEMSR_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES18_
2017-09-12 16:39:07.445 err mesos-master[254491]: @ 0x7f4cf1f938bb  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master9allocator21MesosAllocatorProcessERKNS5_7SlaveIDERKNS5_11FrameworkIDERK6OptionINS5_20UnavailableResourcesEERKSG_INS5_9allocator18InverseOfferStatusEERKSG_INS5_7FiltersEESA_SD_SI_SN_SR_EEvRKNS0_3PIDIT_EEMSV_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
2017-09-12 16:39:07.455 err mesos-master[254491]: @ 0x7f4cf34dd049  std::function<>::operator()()
2017-09-12 16:39:07.460 err mesos-master[254491]: @ 0x7f4cf34c1285  process::ProcessBase::visit()
2017-09-12 16:39:07.464 err mesos-master[254491]: @ 0x7f4cf34cc58a  process::DispatchEvent::visit()
2017-09-12 16:39:07.465 err mesos-master[254491]: @ 0x7f4cf4e4ad4e  process::ProcessBase::serve()
2017-09-12 16:39:07.469 err mesos-master[254491]: @ 0x7f4cf34bd281  process::ProcessManager::resume()
2017-09-12 16:39:07.471 err mesos-master[254491]: @ 0x7f4cf34b9a2c  _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
2017-09-12 16:39:07.473 err mesos-master[254491]: @ 0x7f4cf34cbbf2  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
2017-09-12 16:39:07.475 err mesos-master[254491]: @ 0x7f4cf34cbb36  _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
2017-09-12 16:39:07.477 err mesos-master[254491]: @ 0x7f4cf34cbac0  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced3ba1e0  (unknown)
2017-09-12 16:39:07.478 err mesos-master[254491]: @ 0x7f4ced613dc5  start_thread
2017-09-12 16:39:07.479 err mesos-master[254491]: @ 0x7f4cecb21ced  __clone
2017-09-12 16:39:07.486 notice systemd[1]: mesos-master.service: main process exited, code=killed, status=6/ABRT

Is this an issue, or did I do something wrong? I hope someone can help me
work this out. Thank you.
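
The cycle Bayou describes (schedule, machine-down, machine-up,
schedule-cancel) can be written down as four requests in order. This is only
a sketch: the endpoint paths are the ones from the Mesos maintenance
documentation, while `maintenance_cycle`, the `post(path, body)` callable, and
the use of an empty windows list for the cancel step are assumptions made for
illustration, not the actual script that was run.

```python
import json
from typing import Callable, Dict, List

Machine = Dict[str, str]  # e.g. {"hostname": "a", "ip": "10.0.0.1"}


def maintenance_cycle(post: Callable[[str, str], None],
                      machines: List[Machine],
                      start_ns: int,
                      duration_ns: int) -> None:
    """Run one schedule -> down -> up -> cancel cycle for the machines."""
    schedule = {
        "windows": [
            {
                "machine_ids": machines,
                "unavailability": {
                    "start": {"nanoseconds": start_ns},
                    "duration": {"nanoseconds": duration_ns},
                },
            }
        ]
    }
    # 1. Announce the maintenance window.
    post("/maintenance/schedule", json.dumps(schedule))
    # 2. Take the machines down for maintenance.
    post("/machine/down", json.dumps(machines))
    # 3. Bring the machines back up.
    post("/machine/up", json.dumps(machines))
    # 4. Cancel: assumed here to be a schedule with no windows.
    post("/maintenance/schedule", json.dumps({"windows": []}))
```

Each call waits for the previous one to return, so one cycle never overlaps
with itself; running several such cycles from different threads or processes
is what produces the concurrent requests discussed in this thread.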