Re: [VOTE] Release Apache Mesos 0.24.2 (rc4)

2016-03-31 Thread Michael Park
Hey Ben,

I was able to observe the error you're pointing out for 0.26.1-rc3.

With 0.24.2-rc4 and 0.25.1-rc3, however, I observed the deprecated-SASL
warning being promoted to an error by -Werror.

For example:

./../../src/authentication/cram_md5/authenticatee.cpp:75:7: error:
'sasl_dispose' is deprecated: first deprecated in OS X 10.11
[-Werror,-Wdeprecated-declarations]
  sasl_dispose();
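A possible local workaround (untested here, and not a substitute for patching the RC) is to stop clang from promoting the deprecation warnings to errors:

```shell
# Sketch only: demote clang's deprecation warnings so -Werror no longer
# turns the deprecated-SASL notices into hard errors on El Capitan.
./configure CC=clang CXX=clang++ CXXFLAGS="-Wno-deprecated-declarations" \
    --disable-python --disable-java
```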

How did you encounter the issues with 0.24.2-rc4 and 0.25.1-rc3? Are you
still running OS X Yosemite?

Also, I was under the impression that we don't officially support OS X,
aside from convenience for developers.
I ask because the rc2s for 0.24.2 and 0.25.1 had the same SASL issues,
which prevented them from compiling on OS X El Capitan.

If OS X compilation is a blocker, I'll cut new RCs with the SASL warning
patches as well as the glog patch. Just wanted to confirm.

Thanks,

MPark

On 31 March 2016 at 21:30, Benjamin Mahler  wrote:

> I'm seeing the following on OS X for the three RCs that were sent out:
>
> $ ./configure CC=clang CXX=clang++ --disable-python --disable-java
> ...
> $ make check -j7
> ...
> ./mesos-tests
> dyld: Symbol not found: __ZN3fLB21FLAGS_drop_log_memoryE
>   Referenced from:
> /Users/bmahler/tmp/testing/mesos-0.24.2/src/.libs/libmesos-0.24.2.dylib
>   Expected in: flat namespace
>  in /Users/bmahler/tmp/testing/mesos-0.24.2/src/.libs/libmesos-0.24.2.dylib
>
> I think we need the following patch as well from the glog change:
>
> commit 363b0b059bdc7742b2258a33ebfe430fd03f4311
> Author: Kapil Arya 
> Date:   Mon Jan 25 00:41:17 2016 -0500
>
> Fixed non-linux build involving glog drop_log_meory flag.
>
> The variable "FLAGS_drop_log_memory" is available only on Linux.
>
> Review: https://reviews.apache.org/r/42704
>
> On Thu, Mar 31, 2016 at 3:28 PM, Michael Park  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 0.24.2.
>>
>> NOTE: I made the mistake of not updating the CHANGELOG for rc3, which is
>> why this is an rc4.
>>
>> 0.24.2 includes the following:
>>
>> 
>> No changes from rc2:
>>
>> * Improvements
>>   - Allocator filter performance
>>   - Port Ranges performance
>>   - UUID performance
>>   - `/state` endpoint performance
>>   - GLOG performance
>>   - Configurable task/framework history
>>   - Offer filter timeout fix for backlogged allocator
>>
>> * Bugs
>>   - SSL
>>   - Libevent
>>   - Fixed point resources math
>>   - HDFS
>>   - Agent upgrade compatibility
>>   - Health checks
>>
>> New fixes in rc4:
>>
>>   - JSON-based credential files. (MESOS-3560)
>>   - Mesos health check within docker container. (MESOS-3738)
>>   - Deletion of special files. (MESOS-4979)
>>   - Memory leak in subprocess. (MESOS-5021)
>>
>> Thank you to Evan Krall from Yelp for requesting MESOS-3560 and
>> MESOS-3738 to be included,
>> and Ben Mahler for requesting MESOS-4979 and MESOS-5021.
>>
>> The CHANGELOG for the release is available at:
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.2-rc4
>>
>> 
>>
>> The candidate for Mesos 0.24.2 release is available at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz
>>
>> The tag to be voted on is 0.24.2-rc4:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.24.2-rc4
>>
>> The MD5 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1124
>>
>> Please vote on releasing this package as Apache Mesos 0.24.2!
>>
>> The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 0.24.2
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>>
>> MPark
>>
>
>


Re: MESOS-5055~MESOS-5057: update strings to use agent instead of slave

2016-03-31 Thread tommy xiao
Cool


MESOS-5055~MESOS-5057: update strings to use agent instead of slave

2016-03-31 Thread Zhou Z Xing

Developers and users,

   We are now making changes in the code to update all strings, including
logs, error messages, and standard output messages, to replace the term
'slave' with the term 'agent'. The related tickets are MESOS-5055,
MESOS-5056 and MESOS-5057.

   If your system or log-processing programs depend on strings containing
the term 'slave', please let us know so that we can plan the merging of
the related patches. Thanks!
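For consumers who grep Mesos logs, the change is effectively a word-level substitution. A made-up example (the log line below is invented for illustration) of how an existing line would read afterwards:

```shell
# Sample log line is invented for illustration; the rewrite replaces the
# word 'slave' with 'agent' (GNU sed word-boundary syntax).
echo 'Registered slave 20160331-000000-1-S1 at slave(1)@10.0.0.1:5051' \
  | sed 's/\bslave\b/agent/g'
# -> Registered agent 20160331-000000-1-S1 at agent(1)@10.0.0.1:5051
```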

Thanks & Best Wishes,

Tom Xing(邢舟)
Emerging Technology Institute, IBM China Software Development Lab
--
IBM China Software Development Laboratory (CSDL)
Notes ID:Zhou Z Xing/China/IBM
Phone   :86-10-82450442
e-Mail  :xingz...@cn.ibm.com
Address :Building No.28, ZhongGuanCun Software Park, No.8 Dong Bei Wang
West Road, Haidian District, Beijing, P.R.China 100193
地址:中国北京市海淀区东北旺西路8号 中关村软件园28号楼 100193


Re: Question on slave recovery

2016-03-31 Thread Benjamin Mahler
I'd recommend not using /tmp to store the meta-information, because if
tmpwatch is running it will remove files that are needed for agent
recovery. We should probably change the default --work_dir, or require
that the user specify one.
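As a sketch of that suggestion (the flag names are real agent flags; the master address and path are placeholders):

```shell
# Keep agent recovery state somewhere durable instead of a /tmp-based
# work directory, so tmpwatch (or a reboot) cannot delete it.
mesos-slave --master=zk://zk1:2181/mesos --work_dir=/var/lib/mesos
```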

It's expected that wiping the work directory will cause the newly started
agent to destroy any orphaned tasks, if cgroup isolation is enabled. Are
you using cgroup isolation? Can you include logs?

On Fri, Mar 25, 2016 at 6:17 AM, Pradeep Chhetri <
pradeep.chhetr...@gmail.com> wrote:

>
> Hello,
>
> I remember that when I was running an older mesos version (maybe 0.23.0),
> whenever a slave restart failed, either due to adding a new attribute or
> announcing different resources than the default, I used to clean up
> /tmp/mesos (the mesos working dir), and this would bring down the existing
> executors/tasks.
>
> Yesterday, I noticed that even after cleaning up /tmp/mesos, starting the
> slaves (which registered with a different slave id) didn't bring down the
> existing executors/tasks. I am running 0.28.0.
>
> I would like to know what has improved in the slave recovery process,
> because I was assuming that I had deleted all the checkpoint information
> by cleaning up /tmp/mesos.
>
> --
> Regards,
> Pradeep Chhetri
>


Re: Mesos agents across a WAN?

2016-03-31 Thread Brian Devins
I would recommend looking at what Yelp did with their PaaSTA project. They
have an interesting approach to multi-region orchestration.


Re: [VOTE] Release Apache Mesos 0.26.1 (rc3)

2016-03-31 Thread Benjamin Mahler
make check fails on OS X. Looks like we're missing the following:

commit 363b0b059bdc7742b2258a33ebfe430fd03f4311
Author: Kapil Arya 
Date:   Mon Jan 25 00:41:17 2016 -0500

Fixed non-linux build involving glog drop_log_meory flag.

The variable "FLAGS_drop_log_memory" is available only on Linux.

Review: https://reviews.apache.org/r/42704

On Thu, Mar 31, 2016 at 4:56 PM, Michael Park  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.26.1.
>
>
> 0.26.1 includes the following:
>
> 
> No changes from rc2:
>
> * Improvements
>   - `/state` endpoint performance
>   - `systemd` integration
>   - GLOG performance
>   - Configurable task/framework history
>   - Offer filter timeout fix for backlogged allocator
>
> * Bugs
>   - SSL
>   - Libevent
>   - Fixed point resources math
>   - HDFS
>   - Agent upgrade compatibility
>
> New fixes in rc3:
>
>   - Deletion of special files. (MESOS-4979)
>   - Memory leak in subprocess. (MESOS-5021)
>
> Thank you to Ben Mahler for requesting MESOS-4979 and MESOS-5021 to be
> included.
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.1-rc3
>
> 
>
> The candidate for Mesos 0.26.1 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc3/mesos-0.26.1.tar.gz
>
> The tag to be voted on is 0.26.1-rc3:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.1-rc3
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc3/mesos-0.26.1.tar.gz.md5
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc3/mesos-0.26.1.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1128
>
> Please vote on releasing this package as Apache Mesos 0.26.1!
>
> The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.26.1
> [ ] -1 Do not release this package because ...
>
> Thanks,
>
> MPark
>


Re: [VOTE] Release Apache Mesos 0.25.1 (rc3)

2016-03-31 Thread Benjamin Mahler
make check fails on OS X. Looks like we're missing the following:

commit 363b0b059bdc7742b2258a33ebfe430fd03f4311
Author: Kapil Arya 
Date:   Mon Jan 25 00:41:17 2016 -0500

Fixed non-linux build involving glog drop_log_meory flag.

The variable "FLAGS_drop_log_memory" is available only on Linux.

Review: https://reviews.apache.org/r/42704

On Thu, Mar 31, 2016 at 4:25 PM, Michael Park  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.25.1.
>
>
> 0.25.1 includes the following:
>
> 
> No changes from rc2:
>
> * Improvements
>   - `/state` endpoint performance
>   - `systemd` integration
>   - GLOG performance
>   - Configurable task/framework history
>   - Offer filter timeout fix for backlogged allocator
>
> * Bugs
>   - SSL
>   - Libevent
>   - Fixed point resources math
>   - HDFS
>   - Agent upgrade compatibility
>   - Health checks
>
> New fixes in rc3:
>
>   - JSON-based credential files. (MESOS-3560)
>   - Mesos health check within docker container. (MESOS-3738)
>   - Deletion of special files. (MESOS-4979)
>   - Memory leak in subprocess. (MESOS-5021)
>
> Thank you to Evan Krall from Yelp for requesting MESOS-3560 and
> MESOS-3738 to be included,
> and Ben Mahler for requesting MESOS-4979 and MESOS-5021.
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.25.1-rc3
>
> 
>
> The candidate for Mesos 0.25.1 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc3/mesos-0.25.1.tar.gz
>
> The tag to be voted on is 0.25.1-rc3:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.25.1-rc3
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc3/mesos-0.25.1.tar.gz.md5
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc3/mesos-0.25.1.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1126
>
> Please vote on releasing this package as Apache Mesos 0.25.1!
>
> The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.25.1
> [ ] -1 Do not release this package because ...
>
> Thanks,
>
> MPark
>


Re: [VOTE] Release Apache Mesos 0.24.2 (rc4)

2016-03-31 Thread Benjamin Mahler
I'm seeing the following on OS X for the three RCs that were sent out:

$ ./configure CC=clang CXX=clang++ --disable-python --disable-java
...
$ make check -j7
...
./mesos-tests
dyld: Symbol not found: __ZN3fLB21FLAGS_drop_log_memoryE
  Referenced from:
/Users/bmahler/tmp/testing/mesos-0.24.2/src/.libs/libmesos-0.24.2.dylib
  Expected in: flat namespace
 in /Users/bmahler/tmp/testing/mesos-0.24.2/src/.libs/libmesos-0.24.2.dylib
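As a sanity check, the missing symbol can be demangled (using c++filt from binutils; the leading Mach-O underscore must be stripped first) to confirm it is glog's Linux-only flag:

```shell
# Demangle the symbol from the dyld error. Mach-O symbols carry a leading
# underscore, so drop it before handing the name to c++filt.
echo _ZN3fLB21FLAGS_drop_log_memoryE | c++filt
# -> fLB::FLAGS_drop_log_memory  (glog defines this flag only on Linux)
```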

I think we need the following patch as well from the glog change:

commit 363b0b059bdc7742b2258a33ebfe430fd03f4311
Author: Kapil Arya 
Date:   Mon Jan 25 00:41:17 2016 -0500

Fixed non-linux build involving glog drop_log_meory flag.

The variable "FLAGS_drop_log_memory" is available only on Linux.

Review: https://reviews.apache.org/r/42704

On Thu, Mar 31, 2016 at 3:28 PM, Michael Park  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.24.2.
>
> NOTE: I made the mistake of not updating the CHANGELOG for rc3, which is why
> this is an rc4.
>
> 0.24.2 includes the following:
>
> 
> No changes from rc2:
>
> * Improvements
>   - Allocator filter performance
>   - Port Ranges performance
>   - UUID performance
>   - `/state` endpoint performance
>   - GLOG performance
>   - Configurable task/framework history
>   - Offer filter timeout fix for backlogged allocator
>
> * Bugs
>   - SSL
>   - Libevent
>   - Fixed point resources math
>   - HDFS
>   - Agent upgrade compatibility
>   - Health checks
>
> New fixes in rc4:
>
>   - JSON-based credential files. (MESOS-3560)
>   - Mesos health check within docker container. (MESOS-3738)
>   - Deletion of special files. (MESOS-4979)
>   - Memory leak in subprocess. (MESOS-5021)
>
> Thank you to Evan Krall from Yelp for requesting MESOS-3560 and
> MESOS-3738 to be included,
> and Ben Mahler for requesting MESOS-4979 and MESOS-5021.
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.2-rc4
>
> 
>
> The candidate for Mesos 0.24.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz
>
> The tag to be voted on is 0.24.2-rc4:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.24.2-rc4
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz.md5
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1124
>
> Please vote on releasing this package as Apache Mesos 0.24.2!
>
> The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.24.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
>
> MPark
>


[VOTE] Release Apache Mesos 0.26.1 (rc3)

2016-03-31 Thread Michael Park
Hi all,

Please vote on releasing the following candidate as Apache Mesos 0.26.1.


0.26.1 includes the following:

No changes from rc2:

* Improvements
  - `/state` endpoint performance
  - `systemd` integration
  - GLOG performance
  - Configurable task/framework history
  - Offer filter timeout fix for backlogged allocator

* Bugs
  - SSL
  - Libevent
  - Fixed point resources math
  - HDFS
  - Agent upgrade compatibility

New fixes in rc3:

  - Deletion of special files. (MESOS-4979)
  - Memory leak in subprocess. (MESOS-5021)

Thank you to Ben Mahler for requesting MESOS-4979 and MESOS-5021 to be
included.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.1-rc3


The candidate for Mesos 0.26.1 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc3/mesos-0.26.1.tar.gz

The tag to be voted on is 0.26.1-rc3:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.1-rc3

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc3/mesos-0.26.1.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.1-rc3/mesos-0.26.1.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1128

Please vote on releasing this package as Apache Mesos 0.26.1!

The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a majority
of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.26.1
[ ] -1 Do not release this package because ...

Thanks,

MPark


[VOTE] Release Apache Mesos 0.25.1 (rc3)

2016-03-31 Thread Michael Park
Hi all,

Please vote on releasing the following candidate as Apache Mesos 0.25.1.


0.25.1 includes the following:

No changes from rc2:

* Improvements
  - `/state` endpoint performance
  - `systemd` integration
  - GLOG performance
  - Configurable task/framework history
  - Offer filter timeout fix for backlogged allocator

* Bugs
  - SSL
  - Libevent
  - Fixed point resources math
  - HDFS
  - Agent upgrade compatibility
  - Health checks

New fixes in rc3:

  - JSON-based credential files. (MESOS-3560)
  - Mesos health check within docker container. (MESOS-3738)
  - Deletion of special files. (MESOS-4979)
  - Memory leak in subprocess. (MESOS-5021)

Thank you to Evan Krall from Yelp for requesting MESOS-3560 and MESOS-3738
to be included,
and Ben Mahler for requesting MESOS-4979 and MESOS-5021.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.25.1-rc3


The candidate for Mesos 0.25.1 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc3/mesos-0.25.1.tar.gz

The tag to be voted on is 0.25.1-rc3:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.25.1-rc3

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc3/mesos-0.25.1.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.25.1-rc3/mesos-0.25.1.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1126

Please vote on releasing this package as Apache Mesos 0.25.1!

The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a majority
of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.25.1
[ ] -1 Do not release this package because ...

Thanks,

MPark


Re: [RFC] Mesos Releases and Support

2016-03-31 Thread Vinod Kone
Thanks to all those who commented on the doc so far. The feedback was
great.

I'm planning to finalize the doc by the end of this week, so please provide
feedback if you haven't yet and want to.

Regarding the proposals themselves, it looks like most people are in favor
of proposal 1. We can probably punt on LTS until we get some experience with
the new release and patch policies.

On Fri, Mar 25, 2016 at 12:21 PM, Vinod Kone  wrote:

> Hi folks,
>
> There has been some recent interest in Mesos releases and support
> policy. As promised, I spent some time thinking about this and have
> written my thoughts down in a doc.
>
>
> https://docs.google.com/document/d/1A8MglUWST6pWan3cVw98v8uxTPew8RMKxxrRqiSENM0/edit?usp=sharing
>
> Please take a look and provide feedback. I'm especially interested in your
> opinion on the proposals.
>
> Thanks,
> Vinod
>
>
>


[VOTE] Release Apache Mesos 0.24.2 (rc4)

2016-03-31 Thread Michael Park
Hi all,

Please vote on releasing the following candidate as Apache Mesos 0.24.2.

NOTE: I made the mistake of not updating the CHANGELOG for rc3, which is why
this is an rc4.

0.24.2 includes the following:

No changes from rc2:

* Improvements
  - Allocator filter performance
  - Port Ranges performance
  - UUID performance
  - `/state` endpoint performance
  - GLOG performance
  - Configurable task/framework history
  - Offer filter timeout fix for backlogged allocator

* Bugs
  - SSL
  - Libevent
  - Fixed point resources math
  - HDFS
  - Agent upgrade compatibility
  - Health checks

New fixes in rc4:

  - JSON-based credential files. (MESOS-3560)
  - Mesos health check within docker container. (MESOS-3738)
  - Deletion of special files. (MESOS-4979)
  - Memory leak in subprocess. (MESOS-5021)

Thank you to Evan Krall from Yelp for requesting MESOS-3560 and MESOS-3738
to be included,
and Ben Mahler for requesting MESOS-4979 and MESOS-5021.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.2-rc4


The candidate for Mesos 0.24.2 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz

The tag to be voted on is 0.24.2-rc4:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.24.2-rc4

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.24.2-rc4/mesos-0.24.2.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS
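For reviewers, the checksum step can be scripted; the helper below is illustrative (it is not part of the release tooling) and assumes the tarball and the published .md5 file have been downloaded into the current directory:

```shell
# Illustrative helper for the checksum check a voter might run before voting.
verify_md5() {
  # $1 = tarball, $2 = .md5 file. The exact .md5 layout varies, so pull
  # out the first 32-hex-digit run rather than relying on `md5sum -c`.
  expected=$(tr -d ' \t\n' < "$2" | grep -ioE '[0-9a-f]{32}' | head -n 1 \
    | tr '[:upper:]' '[:lower:]')
  actual=$(md5sum "$1" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage after downloading, e.g.:
#   verify_md5 mesos-0.24.2.tar.gz mesos-0.24.2.tar.gz.md5 && echo "MD5 OK"
# and for the signature (after importing the KEYS file):
#   gpg --verify mesos-0.24.2.tar.gz.asc mesos-0.24.2.tar.gz
```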

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1124

Please vote on releasing this package as Apache Mesos 0.24.2!

The vote is open until Mon Apr 4 23:59:59 EDT 2016 and passes if a majority
of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.24.2
[ ] -1 Do not release this package because ...

Thanks,

MPark


Re: Mesos agents across a WAN?

2016-03-31 Thread Vinod Kone
This is great info Evan, especially coming from a production experience.
Thanks for sharing it !

On Thu, Mar 31, 2016 at 1:49 PM, Evan Krall  wrote:

> On Wed, Mar 30, 2016 at 6:56 PM, Jeff Schroeder <
> jeffschroe...@computer.org> wrote:
>
>> Given regional bare metal Mesos clusters on multiple continents, are
>> there any known issues running some of the agents over the WAN? Is anyone
>> else doing it, or is this a terrible idea that I should tell management no
>> on?
>>
>> A few specifics:
>>
>> 1. Are there any known limitations or configuration gotchas I might
>> encounter?
>>
>
> One thing to keep in mind is that the masters maintain a distributed log
> through a consensus protocol, so there needs to be a quorum of masters that
> can talk to each other in order to operate. Consensus protocols tend to be
> very latency-sensitive, so you probably want to keep masters near each
> other.
>
> Some of our clusters span semi-wide geographical regions (in production,
> up to about 5 milliseconds RTT between master and some slaves). So far, we
> haven't seen any issues caused by that amount of latency, and I believe we
> have clusters in non-production environments which have even higher round
> trip between slaves and masters, and work fine. I haven't benchmarked task
> launch time or anything like that, so I can't say how much it affects the
> speed of operations.
>
> Mesos generally does the right thing around network partitions (changes
> won't propagate, but it won't kill your tasks), but if you're running
> things in Marathon and using TCP or HTTP healthchecks, be aware that
> Marathon does not rate limit itself on issuing task kills for healthcheck
> failures. This means during a network partition, your applications will be
> fine, but once the network partition heals (or if you're experiencing
> packet loss but not total failure), Marathon will suddenly kill all of the
> tasks on the far side of the partition. A workaround for that is to use
> command health checks, which are run by the mesos slave.
>
>
>> 2. Does setting up ZK observers in each non-primary dc and pointing the
>> agents at them exclusively make sense?
>>
>
> My understanding of ZK observers is that they proxy writes to the actual
> ZK quorum members, so this would probably be fine. mesos-slave uses ZK to
> discover masters, and mesos-master uses ZK to do leader election; only
> mesos-master is doing any writes to ZK.
>
> I'm not sure how often mesos-slave reads from ZK to get the list of
> masters; I assume it doesn't bother if it has a live connection to a master.
>
>
>> 4. Any suggestions on how best to do agent attributes / constraints for
>> something like this? I was planning on having the config management add a
>> "data_center" agent attribute to match on.
>>
>
> If you're running services on Marathon or similar, I'd definitely
> recommend exposing the location of the slaves as an attribute, and having
> constraints to keep different instances of your application spread across
> the different locations. The "correct" constraints to apply depends on your
> application and latency / failure sensitivity.
>
> Evan
>
>
>> Thanks!
>>
>> [1]
>> https://github.com/kubernetes/kubernetes/blob/8813c955182e3c9daae68a8257365e02cd871c65/release-0.19.0/docs/proposals/federation.md#kubernetes-cluster-federation
>>
>> --
>> Jeff Schroeder
>>
>> Don't drink and derive, alcohol and analysis don't mix.
>> http://www.digitalprognosis.com
>>
>
>


Re: Mesos agents across a WAN?

2016-03-31 Thread Evan Krall
On Wed, Mar 30, 2016 at 6:56 PM, Jeff Schroeder 
wrote:

> Given regional bare metal Mesos clusters on multiple continents, are there
> any known issues running some of the agents over the WAN? Is anyone else
> doing it, or is this a terrible idea that I should tell management no on?
>
> A few specifics:
>
> 1. Are there any known limitations or configuration gotchas I might
> encounter?
>

One thing to keep in mind is that the masters maintain a distributed log
through a consensus protocol, so there needs to be a quorum of masters that
can talk to each other in order to operate. Consensus protocols tend to be
very latency-sensitive, so you probably want to keep masters near each
other.

Some of our clusters span semi-wide geographical regions (in production, up
to about 5 milliseconds RTT between master and some slaves). So far, we
haven't seen any issues caused by that amount of latency, and I believe we
have clusters in non-production environments which have even higher round
trip between slaves and masters, and work fine. I haven't benchmarked task
launch time or anything like that, so I can't say how much it affects the
speed of operations.

Mesos generally does the right thing around network partitions (changes
won't propagate, but it won't kill your tasks), but if you're running
things in Marathon and using TCP or HTTP healthchecks, be aware that
Marathon does not rate limit itself on issuing task kills for healthcheck
failures. This means during a network partition, your applications will be
fine, but once the network partition heals (or if you're experiencing
packet loss but not total failure), Marathon will suddenly kill all of the
tasks on the far side of the partition. A workaround for that is to use
command health checks, which are run by the mesos slave.
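A hypothetical Marathon app fragment showing such a command health check (field names follow Marathon's health-check schema of that era; the command and timing values are placeholders):

```shell
# Write an illustrative app fragment: a COMMAND health check runs on the
# slave itself, so it keeps passing during a master<->slave partition
# instead of triggering the kill storm described above.
cat > healthcheck.json <<'EOF'
{
  "healthChecks": [{
    "protocol": "COMMAND",
    "command": { "value": "curl -f http://localhost:$PORT0/health" },
    "gracePeriodSeconds": 300,
    "intervalSeconds": 60,
    "maxConsecutiveFailures": 3
  }]
}
EOF
```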


> 2. Does setting up ZK observers in each non-primary dc and pointing the
> agents at them exclusively make sense?
>

My understanding of ZK observers is that they proxy writes to the actual ZK
quorum members, so this would probably be fine. mesos-slave uses ZK to
discover masters, and mesos-master uses ZK to do leader election; only
mesos-master is doing any writes to ZK.

I'm not sure how often mesos-slave reads from ZK to get the list of
masters; I assume it doesn't bother if it has a live connection to a master.


> 4. Any suggestions on how best to do agent attributes / constraints for
> something like this? I was planning on having the config management add a
> "data_center" agent attribute to match on.
>

If you're running services on Marathon or similar, I'd definitely recommend
exposing the location of the slaves as an attribute, and having constraints
to keep different instances of your application spread across the different
locations. The "correct" constraints to apply depends on your application
and latency / failure sensitivity.

Evan


> Thanks!
>
> [1]
> https://github.com/kubernetes/kubernetes/blob/8813c955182e3c9daae68a8257365e02cd871c65/release-0.19.0/docs/proposals/federation.md#kubernetes-cluster-federation
>
> --
> Jeff Schroeder
>
> Don't drink and derive, alcohol and analysis don't mix.
> http://www.digitalprognosis.com
>


Re: Mesos agents across a WAN?

2016-03-31 Thread Jeff Schroeder
Interesting, thanks Alex. Is anyone at Mesosphere or in the community
actively working on this who would want to chat about it?

On Thursday, March 31, 2016, Alex Rukletsov  wrote:

> Jeff,
>
> regarding 3: we are investigating this:
> https://issues.apache.org/jira/browse/MESOS-3548
>
> On Thu, Mar 31, 2016 at 3:56 AM, Jeff Schroeder <
> jeffschroe...@computer.org
> > wrote:
>
>> Given regional bare metal Mesos clusters on multiple continents, are
>> there any known issues running some of the agents over the WAN? Is anyone
>> else doing it, or is this a terrible idea that I should tell management no
>> on?
>>
>> A few specifics:
>>
>> 1. Are there any known limitations or configuration gotchas I might
>> encounter?
>> 2. Does setting up ZK observers in each non-primary dc and pointing the
>> agents at them exclusively make sense?
>> 3. Are there plans on a mesos equivalent of something like ubernetes[1],
>> or would that be up to each framework?
>> 4. Any suggestions on how best to do agent attributes / constraints for
>> something like this? I was planning on having the config management add a
>> "data_center" agent attribute to match on.
>>
>> Thanks!
>>
>> [1]
>> https://github.com/kubernetes/kubernetes/blob/8813c955182e3c9daae68a8257365e02cd871c65/release-0.19.0/docs/proposals/federation.md#kubernetes-cluster-federation
>>
>> --
>> Jeff Schroeder
>>
>> Don't drink and derive, alcohol and analysis don't mix.
>> http://www.digitalprognosis.com
>>
>
>

-- 
Text by Jeff, typos by iPhone


Re: 0.28.1

2016-03-31 Thread Jie Yu
Hi folks,

Looks like all the 0.28.1 patches have landed.

I plan to cut an RC by the end of this week. If anyone has other patches
for 0.28.1, please contact me asap. Thanks!

- Jie

On Wed, Mar 23, 2016 at 8:35 PM, Benjamin Mahler  wrote:

> Thanks Jie, I've added a fix version of 0.28.1 to:
> https://issues.apache.org/jira/browse/MESOS-5021
>
> On Fri, Mar 18, 2016 at 5:52 PM, Jie Yu  wrote:
>
>> Hi,
>>
>> We recently noticed two bugs in 0.28.0 related to the unified
>> containerizer.
>>
>> Because of that, I propose we cut a point release (0.28.1) once these two
>> bugs are fixed. I volunteer to be the release manager for this point
>> release.
>>
>> In the meantime, if you have any issue that you want to merge into
>> 0.28.1, please mark the relevant ticket's fix version to be 0.28.1 so that
>> I am aware of that.
>>
>> Thanks!
>> - Jie
>>
>
>


Re: Mesos agents across a WAN?

2016-03-31 Thread Alex Rukletsov
Jeff,

regarding 3: we are investigating this:
https://issues.apache.org/jira/browse/MESOS-3548

On Thu, Mar 31, 2016 at 3:56 AM, Jeff Schroeder 
wrote:

> Given regional bare metal Mesos clusters on multiple continents, are there
> any known issues running some of the agents over the WAN? Is anyone else
> doing it, or is this a terrible idea that I should tell management no on?
>
> A few specifics:
>
> 1. Are there any known limitations or configuration gotchas I might
> encounter?
> 2. Does setting up ZK observers in each non-primary dc and pointing the
> agents at them exclusively make sense?
> 3. Are there plans on a mesos equivalent of something like ubernetes[1],
> or would that be up to each framework?
> 4. Any suggestions on how best to do agent attributes / constraints for
> something like this? I was planning on having the config management add a
> "data_center" agent attribute to match on.
>
> Thanks!
>
> [1]
> https://github.com/kubernetes/kubernetes/blob/8813c955182e3c9daae68a8257365e02cd871c65/release-0.19.0/docs/proposals/federation.md#kubernetes-cluster-federation
>
> --
> Jeff Schroeder
>
> Don't drink and derive, alcohol and analysis don't mix.
> http://www.digitalprognosis.com
>


Re: Cleaning up failed tasks in the UI

2016-03-31 Thread Alberto del Barrio

Hi Adam,

that's exactly what happened. Thanks a lot for the explanation and 
suggestion. Now mesos is clean again :)


On 03/31/16 03:51, Adam Bordelon wrote:
I suspect that after your maintenance operation, Marathon may have 
registered with a new frameworkId and launched its own copies of your 
tasks (which is why you see double). However, the old Marathon frameworkId 
probably has a failover_timeout of a week, so it will continue to be 
considered "registered", but "disconnected".
Check the /frameworks endpoint to see if Mesos thinks you have two 
Marathons registered.
If so, you can use the /teardown endpoint to unregister the old one, 
which will cause all of its tasks to be killed.
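The two steps above, sketched with placeholders (illustrative only; MASTER is your master's host:port and FRAMEWORK_ID is the stale Marathon framework's ID as reported by the /frameworks endpoint):

```shell
# 1. Inspect registered frameworks to find the stale Marathon entry.
curl -s "http://MASTER/frameworks"
# 2. Unregister it; Mesos will kill all of its tasks.
curl -X POST "http://MASTER/teardown" -d "frameworkId=FRAMEWORK_ID"
```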


On Wed, Mar 30, 2016 at 4:56 AM, Alberto del Barrio wrote:


Hi haosdent,

thanks for your reply. It is actually very weird; this is the first time
I have seen this situation in around one year of using mesos.
I am pasting here the truncated output you asked for. It shows one
of the tasks with "Failed" state under "Active tasks":

{
  "executor_id": "",
  "framework_id": "c857c625-25dc-4650-89b8-de4b597026ed-",
  "id": "pixie.33f85e8f-f03b-11e5-af6c-fa6389efeef1",
  "labels": [
    ..
  ],
  "name": "myTask",
  "resources": {
    "cpus": 4.0,
    "disk": 0,
    "mem": 2560.0,
    "ports": "[31679-31679]"
  },
  "slave_id": "c857c625-25dc-4650-89b8-de4b597026ed-S878",
  "state": "TASK_FAILED",
  "statuses": [
    {
      "container_status": {
        "network_infos": [
          {
            "ip_address": "10.XX.XX.XX"
          }
        ]
      },
      "state": "TASK_RUNNING",
      "timestamp": 1458657321.16671
    },
    {
      "container_status": {
        "network_infos": [
          {
            "ip_address": "10.XX.XX.XX"
          }
        ]
      },
      "state": "TASK_FAILED",
      "timestamp": 1459329310.13663
    }
  ]
},




On 03/30/16 13:30, haosdent wrote:

>"Active tasks" with status "Failed"
A bit weird here. According to my test, it should exist in
"Completed Tasks". If possible, could you show your /master/state
endpoint result? I think the frameworks node in the state response
would help analyze the problem.

On Wed, Mar 30, 2016 at 6:26 PM, Alberto del Barrio
> wrote:

Hi all,

after maintenance carried out on a mesos cluster (0.25)
using marathon (0.10) as the only scheduler, I ended up
with double the tasks for each application. But marathon
recognized only half of them.
To get rid of these orphaned tasks, I did a "kill PID"
on them, so they freed up their resources.

My problem now is that the tasks I killed are still
appearing in the mesos UI under "Active tasks" with status
"Failed". This is not affecting my system, but I would like
to clean them up.
Googling, I can't find anything.
Can someone point me to a solution for cleaning up those tasks?

Cheers,
Alberto.




-- 
Best Regards,

Haosdent Huang