Re: New external dependency

2016-06-20 Thread Kevin Klues
The goal is to let users leverage the nvidia Docker images
(https://hub.docker.com/r/nvidia/) without any added effort on their
behalf. Using docker they are able to launch containers from these
images by simply running `nvidia-docker run ...` (i.e. they are
unaware that a magic volume is being injected on their behalf). On
Mesos we want the experience to be similar.

In terms of providing an external component to do the library
consolidation instead of building it into Mesos itself -- we
considered this.  We originally planned on building this functionality
as an isolator module (giving us the benefit of external linkage
without having to run a separate Linux process), but there are some
limitations in the current isolator interface that prevent us from
doing this properly. Moreover, building it as an isolator module would
mean that it couldn't be shared by the docker containerizer (which we
plan to add support for in the future).

On Mon, Jun 20, 2016 at 7:30 PM, Jean Christophe “JC” Martin
 wrote:
> Kevin,
>
> I agree about the need to create the volume, and gather the information. My 
> point was not really clear, sorry.
> My point was that it should not be different from any other use case needing 
> special mounts. It could either be solved by passing this information at the 
> time of container creation (it doesn’t seem that there are that many 
> libraries, and it would not be harder than, say, running the mesos slave in a 
> container, purely in terms of the number of volume statements), or it could be 
> solved externally, as the docker volume container does, with a more generic solution.
>
> Thanks,
>
> JC
>
>> On Jun 20, 2016, at 6:59 PM, Kevin Klues  wrote:
>>
>> For now we've decided to actually remove the hard dependence on libelf
>> for the 1.0 release and spend a bit more time thinking about the right
>> way to pull it in.
>>
>> Jean, to answer your question though -- someone would still need to
>> consolidate these libraries, even if it wasn't left to Mesos to do so.
>> These libraries are spread across the file system, and need to be
>> pulled into a single place for easy injection. The full list of
>> binaries / libraries are here:
>>
>> https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109
>>
>> We could put this burden on the operator and trust he gets it right,
>> or we could have Mesos programmatically do it itself. We considered
>> just leveraging the nvidia-docker-plugin itself (instead of
>> duplicating its functionality into mesos), but ultimately decided it
>> was better not to introduce an external dependency on it (since it is
>> a separate running executable, rather than a simple library, like
>> libelf).
>>
>> On Mon, Jun 20, 2016 at 5:12 PM, Jean Christophe “JC” Martin
>>  wrote:
>>> As an operator not using GPUs, I feel that the burden seems misplaced, and 
>>> disproportionate.
>>> I assume that the operator of a GPU cluster knows the location of the 
>>> libraries based on their OS, and could potentially provide this information 
>>> at the time of creating the containers. I am not sure I see why this is 
>>> something that Mesos is required to do (consolidating the libraries in the 
>>> volume, versus treating it as configuration/external information).
>>>
>>> Thanks,
>>>
>>> JC
>>>
 On Jun 20, 2016, at 2:30 PM, Kevin Klues  wrote:

 Sorry, the ticket just links to the nvidia-docker project without much
 further explanation. The information at the link below should make it
 a bit more clear:

 https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.

 The crux of the issue is that we need to be able to consolidate all of
 the Nvidia binaries/libraries into a single volume that we inject into
 a docker container.  We use libelf to get the canonical names
 of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
 well as to look up what external dependencies they have (i.e. NEEDED in
 their dynamic sections) in order to build this volume.

 NOTE: None of this volume support is actually in Mesos yet -- we just
 added the libelf dependence in anticipation of it.




 On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu  wrote:
> It's not immediately clear from the ticket why the change from optional
> dependency to required dependency though? Could you summarize?
>
>
> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues  wrote:
>>
>> Thanks Zhitao,
>>
>> I just pushed out a review for upgrades.md and added you as a reviewer.
>>
>> The new dependence was added in the JIRA that haosdent linked, but the
>> actual reason for adding the dependence is more related to:
>> https://issues.apache.org/jira/browse/MESOS-5401
>>
>> On Sun, Jun 19, 2016 at 9:34 AM, haosdent  wrote:
>>> The 

Re: New external dependency

2016-06-20 Thread haosdent
> just type "y" and GitHub will redirect to the latest commit in master
Cool!

On Tue, Jun 21, 2016 at 10:12 AM, Erik Weathers 
wrote:

> @Kevin:
>
> FYI, it's best practice to use a commit SHA in GitHub links so that future
> readers are seeing the content you intended.
>
> i.e., instead of:
>
>-
>
> https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109
>
> It's best to do:
>
>-
>
> https://github.com/NVIDIA/nvidia-docker/blob/101b436c89c3a74e9a3025a104587b6612d903d8/tools/src/nvidia/volumes.go#L109
>
>
> And (awesomely!) GitHub makes it trivial to do this!  [1]
>
>- when you're looking at a file (such as the original link you
>pasted), just type "y" and GitHub will redirect to the latest commit in
>master:
>
> - Erik
>
> [1] https://help.github.com/articles/getting-permanent-links-to-files/
>
> On Mon, Jun 20, 2016 at 6:59 PM, Kevin Klues  wrote:
>
>> For now we've decided to actually remove the hard dependence on libelf
>> for the 1.0 release and spend a bit more time thinking about the right
>> way to pull it in.
>>
>> Jean, to answer your question though -- someone would still need to
>> consolidate these libraries, even if it wasn't left to Mesos to do so.
>> These libraries are spread across the file system, and need to be
>> pulled into a single place for easy injection. The full list of
>> binaries / libraries are here:
>>
>>
>> https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109
>>
>> We could put this burden on the operator and trust he gets it right,
>> or we could have Mesos programmatically do it itself. We considered
>> just leveraging the nvidia-docker-plugin itself (instead of
>> duplicating its functionality into mesos), but ultimately decided it
>> was better not to introduce an external dependency on it (since it is
>> a separate running executable, rather than a simple library, like
>> libelf).
>>
>> On Mon, Jun 20, 2016 at 5:12 PM, Jean Christophe “JC” Martin
>>  wrote:
>> > As an operator not using GPUs, I feel that the burden seems misplaced,
>> and disproportionate.
>> > I assume that the operator of a GPU cluster knows the location of the
>> libraries based on their OS, and could potentially provide this information
>> at the time of creating the containers. I am not sure I see why this is
>> something that Mesos is required to do (consolidating the libraries in the
>> volume, versus treating it as configuration/external information).
>> >
>> > Thanks,
>> >
>> > JC
>> >
>> >> On Jun 20, 2016, at 2:30 PM, Kevin Klues  wrote:
>> >>
>> >> Sorry, the ticket just links to the nvidia-docker project without much
>> >> further explanation. The information at the link below should make it
>> >> a bit more clear:
>> >>
>> >> https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.
>> >>
>> >> The crux of the issue is that we need to be able to consolidate all of
>> >> the Nvidia binaries/libraries into a single volume that we inject into
>> >> a docker container.  We use libelf to get the canonical names
>> >> of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
>> >> well as to look up what external dependencies they have (i.e. NEEDED in
>> >> their dynamic sections) in order to build this volume.
>> >>
>> >> NOTE: None of this volume support is actually in Mesos yet -- we just
>> >> added the libelf dependence in anticipation of it.
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu  wrote:
>> >>> It's not immediately clear from the ticket why the change from
>> optional
>> >>> dependency to required dependency though? Could you summarize?
>> >>>
>> >>>
>> >>> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues 
>> wrote:
>> 
>>  Thanks Zhitao,
>> 
>>  I just pushed out a review for upgrades.md and added you as a
>> reviewer.
>> 
>>  The new dependence was added in the JIRA that haosdent linked, but
>> the
>>  actual reason for adding the dependence is more related to:
>>  https://issues.apache.org/jira/browse/MESOS-5401
>> 
>>  On Sun, Jun 19, 2016 at 9:34 AM, haosdent 
>> wrote:
>> > The related issue is "Change build to always enable Nvidia GPU
>> > support for Linux".
>> > Last time my local build broke before Kevin sent out the email, and
>> > I then found this change.
>> >
>> > On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li 
>> > wrote:
>> >>
>> >> Hi Kevin,
>> >>
>> >> Thanks for letting us know. It seems like this is not called out in
>> >> upgrades.md, so can you please document this additional dependency
>> >> there?
>> >>
>> >> Also, can you include the link to the JIRA or patch requiring this
>> >> dependency so we can have some context?
>> >>
>> >> Thanks!
>> >>
>> >> On Sat, 

Re: New external dependency

2016-06-20 Thread Erik Weathers
@Kevin:

FYI, it's best practice to use a commit SHA in GitHub links so that future
readers are seeing the content you intended.

i.e., instead of:

   -
   
https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109

It's best to do:

   -
   
https://github.com/NVIDIA/nvidia-docker/blob/101b436c89c3a74e9a3025a104587b6612d903d8/tools/src/nvidia/volumes.go#L109


And (awesomely!) GitHub makes it trivial to do this!  [1]

   - when you're looking at a file (such as the original link you pasted),
   just type "y" and GitHub will redirect to the latest commit in master:

- Erik

[1] https://help.github.com/articles/getting-permanent-links-to-files/

On Mon, Jun 20, 2016 at 6:59 PM, Kevin Klues  wrote:

> For now we've decided to actually remove the hard dependence on libelf
> for the 1.0 release and spend a bit more time thinking about the right
> way to pull it in.
>
> Jean, to answer your question though -- someone would still need to
> consolidate these libraries, even if it wasn't left to Mesos to do so.
> These libraries are spread across the file system, and need to be
> pulled into a single place for easy injection. The full list of
> binaries / libraries are here:
>
>
> https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109
>
> We could put this burden on the operator and trust he gets it right,
> or we could have Mesos programmatically do it itself. We considered
> just leveraging the nvidia-docker-plugin itself (instead of
> duplicating its functionality into mesos), but ultimately decided it
> was better not to introduce an external dependency on it (since it is
> a separate running executable, rather than a simple library, like
> libelf).
>
> On Mon, Jun 20, 2016 at 5:12 PM, Jean Christophe “JC” Martin
>  wrote:
> > As an operator not using GPUs, I feel that the burden seems misplaced,
> and disproportionate.
> > I assume that the operator of a GPU cluster knows the location of the
> libraries based on their OS, and could potentially provide this information
> at the time of creating the containers. I am not sure I see why this is
> something that Mesos is required to do (consolidating the libraries in the
> volume, versus treating it as configuration/external information).
> >
> > Thanks,
> >
> > JC
> >
> >> On Jun 20, 2016, at 2:30 PM, Kevin Klues  wrote:
> >>
> >> Sorry, the ticket just links to the nvidia-docker project without much
> >> further explanation. The information at the link below should make it
> >> a bit more clear:
> >>
> >> https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.
> >>
> >> The crux of the issue is that we need to be able to consolidate all of
> >> the Nvidia binaries/libraries into a single volume that we inject into
> >> a docker container.  We use libelf to get the canonical names
> >> of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
> >> well as to look up what external dependencies they have (i.e. NEEDED in
> >> their dynamic sections) in order to build this volume.
> >>
> >> NOTE: None of this volume support is actually in Mesos yet -- we just
> >> added the libelf dependence in anticipation of it.
> >>
> >>
> >>
> >>
> >> On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu  wrote:
> >>> It's not immediately clear from the ticket why the change from optional
> >>> dependency to required dependency though? Could you summarize?
> >>>
> >>>
> >>> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues 
> wrote:
> 
>  Thanks Zhitao,
> 
>  I just pushed out a review for upgrades.md and added you as a
> reviewer.
> 
>  The new dependence was added in the JIRA that haosdent linked, but the
>  actual reason for adding the dependence is more related to:
>  https://issues.apache.org/jira/browse/MESOS-5401
> 
>  On Sun, Jun 19, 2016 at 9:34 AM, haosdent  wrote:
> > The related issue is "Change build to always enable Nvidia GPU support
> > for Linux".
> > Last time my local build broke before Kevin sent out the email, and
> > I then found this change.
> >
> > On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li 
> > wrote:
> >>
> >> Hi Kevin,
> >>
> >> Thanks for letting us know. It seems like this is not called out in
> >> upgrades.md, so can you please document this additional dependency
> >> there?
> >>
> >> Also, can you include the link to the JIRA or patch requiring this
> >> dependency so we can have some context?
> >>
> >> Thanks!
> >>
> >> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues 
> >> wrote:
> >>
> >>> Hello all,
> >>>
> >>> Just an FYI that the newest libmesos now has an external dependence
> >>> on
> >>> libelf on Linux. This dependence can be installed via the following
> >>> packages:
> >>>
> 

Re: New external dependency

2016-06-20 Thread Kevin Klues
For now we've decided to actually remove the hard dependence on libelf
for the 1.0 release and spend a bit more time thinking about the right
way to pull it in.

Jean, to answer your question though -- someone would still need to
consolidate these libraries, even if it wasn't left to Mesos to do so.
These libraries are spread across the file system, and need to be
pulled into a single place for easy injection. The full list of
binaries / libraries are here:

https://github.com/NVIDIA/nvidia-docker/blob/master/tools/src/nvidia/volumes.go#L109

We could put this burden on the operator and trust he gets it right,
or we could have Mesos programmatically do it itself. We considered
just leveraging the nvidia-docker-plugin itself (instead of
duplicating its functionality into mesos), but ultimately decided it
was better not to introduce an external dependency on it (since it is
a separate running executable, rather than a simple library, like
libelf).

On Mon, Jun 20, 2016 at 5:12 PM, Jean Christophe “JC” Martin
 wrote:
> As an operator not using GPUs, I feel that the burden seems misplaced, and 
> disproportionate.
> I assume that the operator of a GPU cluster knows the location of the 
> libraries based on their OS, and could potentially provide this information 
> at the time of creating the containers. I am not sure I see why this is 
> something that Mesos is required to do (consolidating the libraries in the 
> volume, versus treating it as configuration/external information).
>
> Thanks,
>
> JC
>
>> On Jun 20, 2016, at 2:30 PM, Kevin Klues  wrote:
>>
>> Sorry, the ticket just links to the nvidia-docker project without much
>> further explanation. The information at the link below should make it
>> a bit more clear:
>>
>> https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.
>>
>> The crux of the issue is that we need to be able to consolidate all of
>> the Nvidia binaries/libraries into a single volume that we inject into
>> a docker container.  We use libelf to get the canonical names
>> of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
>> well as to look up what external dependencies they have (i.e. NEEDED in
>> their dynamic sections) in order to build this volume.
>>
>> NOTE: None of this volume support is actually in Mesos yet -- we just
>> added the libelf dependence in anticipation of it.
>>
>>
>>
>>
>> On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu  wrote:
>>> It's not immediately clear from the ticket why the change from optional
>>> dependency to required dependency though? Could you summarize?
>>>
>>>
>>> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues  wrote:

 Thanks Zhitao,

 I just pushed out a review for upgrades.md and added you as a reviewer.

 The new dependence was added in the JIRA that haosdent linked, but the
 actual reason for adding the dependence is more related to:
 https://issues.apache.org/jira/browse/MESOS-5401

 On Sun, Jun 19, 2016 at 9:34 AM, haosdent  wrote:
> The related issue is "Change build to always enable Nvidia GPU support
> for Linux".
> Last time my local build broke before Kevin sent out the email, and then
> I found this change.
>
> On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li 
> wrote:
>>
>> Hi Kevin,
>>
>> Thanks for letting us know. It seems like this is not called out in
>> upgrades.md, so can you please document this additional dependency
>> there?
>>
>> Also, can you include the link to the JIRA or patch requiring this
>> dependency so we can have some context?
>>
>> Thanks!
>>
>> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues 
>> wrote:
>>
>>> Hello all,
>>>
>>> Just an FYI that the newest libmesos now has an external dependence
>>> on
>>> libelf on Linux. This dependence can be installed via the following
>>> packages:
>>>
>>> CentOS 6/7: yum install elfutils-libelf.x86_64
>>> Ubuntu 14.04:  apt-get install libelf1
>>>
>>> Alternatively you can install from source:
>>> https://directory.fsf.org/wiki/Libelf
>>>
>>> For developers, you will also need to install the libelf headers in
>>> order to build master. This dependency can be installed via:
>>>
>>> CentOS: elfutils-libelf-devel.x86_64
>>> Ubuntu: libelf-dev
>>>
>>> Alternatively, you can install from source:
>>> https://directory.fsf.org/wiki/Libelf
>>>
>>> The getting started guide and the support/docker_build.sh scripts
>>> have
>>> been updated appropriately, but you may need to update your local
>>> environment if you don't yet have these packages installed.
>>>
>>> --
>>> ~Kevin
>>>
>>
>>
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>
>
>
>
> --
> Best 

Re: [VOTE] Release Apache Mesos 0.26.2 (rc1)

2016-06-20 Thread Kapil Arya
+1 (binding) Internal CI build.

Here is a link to the deb/rpm packages:

http://open.mesosphere.com/downloads/mesos-rc/#apache-mesos-0.26.2-rc1



On Mon, Jun 20, 2016 at 6:07 PM, Vinod Kone  wrote:

> +1 (binding)
>
> Tested on ASF CI w/ ubuntu. Note that the centos:7 build failure is due to a
> known configuration issue with the build script (unable to find JAVA_HOME),
> fixed in 0.27.0.
>
> *Revision*: 5edc7bab79649341c44df21c8842e05fb6d6c2bb
>
>- refs/tags/0.26.2-rc1
>
> Configuration Matrix (gcc / clang):
> centos:7     --verbose --enable-libevent --enable-ssl : Failed / Not run
> centos:7     --verbose                                : Failed / Not run
> ubuntu:14.04 --verbose --enable-libevent --enable-ssl : Success / Success
> ubuntu:14.04 --verbose                                : Success / Success
> 
>
> On Mon, Jun 20, 2016 at 11:25 AM, Jie Yu  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 0.26.2.
>>
>>
>> 0.26.2 is a bug fix release. It includes the following:
>>
>> 
>> [MESOS-4705] - Linux 'perf' parsing logic may fail when OS distribution
>> has perf backports.
>> [MESOS-5449] - Memory leak in SchedulerProcess.declineOffer.
>>
>> The CHANGELOG for the release is available at:
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.2-rc1
>>
>> 
>>
>> The candidate for Mesos 0.26.2 release is available at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz
>>
>> The tag to be voted on is 0.26.2-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.2-rc1
>>
>> The MD5 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1147
>>
>> Please vote on releasing this package as Apache Mesos 0.26.2!
>>
>> The vote is open until Thu Jun 23 11:23:33 PDT 2016 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 0.26.2
>> [ ] -1 Do not release this package because ...
>>
>> Thanks
>> - Jie
>>
>
>


Re: [VOTE] Release Apache Mesos 0.26.2 (rc1)

2016-06-20 Thread Vinod Kone
+1 (binding)

Tested on ASF CI w/ ubuntu. Note that the centos:7 build failure is due to a
known configuration issue with the build script (unable to find JAVA_HOME),
fixed in 0.27.0.

*Revision*: 5edc7bab79649341c44df21c8842e05fb6d6c2bb

   - refs/tags/0.26.2-rc1

Configuration Matrix (gcc / clang):
centos:7     --verbose --enable-libevent --enable-ssl : Failed / Not run
centos:7     --verbose                                : Failed / Not run
ubuntu:14.04 --verbose --enable-libevent --enable-ssl : Success / Success
ubuntu:14.04 --verbose                                : Success / Success


On Mon, Jun 20, 2016 at 11:25 AM, Jie Yu  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.26.2.
>
>
> 0.26.2 is a bug fix release. It includes the following:
>
> 
> [MESOS-4705] - Linux 'perf' parsing logic may fail when OS distribution
> has perf backports.
> [MESOS-5449] - Memory leak in SchedulerProcess.declineOffer.
>
> The CHANGELOG for the release is available at:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.2-rc1
>
> 
>
> The candidate for Mesos 0.26.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz
>
> The tag to be voted on is 0.26.2-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.2-rc1
>
> The MD5 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz.md5
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1147
>
> Please vote on releasing this package as Apache Mesos 0.26.2!
>
> The vote is open until Thu Jun 23 11:23:33 PDT 2016 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.26.2
> [ ] -1 Do not release this package because ...
>
> Thanks
> - Jie
>


Re: New external dependency

2016-06-20 Thread Kevin Klues
Sorry, the ticket just links to the nvidia-docker project without much
further explanation. The information at the link below should make it
a bit more clear:

https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver.

The crux of the issue is that we need to be able to consolidate all of
the Nvidia binaries/libraries into a single volume that we inject into
a docker container.  We use libelf to get the canonical names
of all the Nvidia libraries (i.e. SONAME in their dynamic sections) as
well as to look up what external dependencies they have (i.e. NEEDED in
their dynamic sections) in order to build this volume.

NOTE: None of this volume support is actually in Mesos yet -- we just
added the libelf dependence in anticipation of it.




On Mon, Jun 20, 2016 at 12:59 PM, Yan Xu  wrote:
> It's not immediately clear from the ticket why the change from optional
> dependency to required dependency though? Could you summarize?
>
>
> On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues  wrote:
>>
>> Thanks Zhitao,
>>
>> I just pushed out a review for upgrades.md and added you as a reviewer.
>>
>> The new dependence was added in the JIRA that haosdent linked, but the
>> actual reason for adding the dependence is more related to:
>> https://issues.apache.org/jira/browse/MESOS-5401
>>
>> On Sun, Jun 19, 2016 at 9:34 AM, haosdent  wrote:
>> > The related issue is "Change build to always enable Nvidia GPU support
>> > for Linux".
>> > Last time my local build broke before Kevin sent out the email, and then
>> > I found this change.
>> >
>> > On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li 
>> > wrote:
>> >>
>> >> Hi Kevin,
>> >>
>> >> Thanks for letting us know. It seems like this is not called out in
>> >> upgrades.md, so can you please document this additional dependency
>> >> there?
>> >>
>> >> Also, can you include the link to the JIRA or patch requiring this
>> >> dependency so we can have some context?
>> >>
>> >> Thanks!
>> >>
>> >> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues 
>> >> wrote:
>> >>
>> >> > Hello all,
>> >> >
>> >> > Just an FYI that the newest libmesos now has an external dependence
>> >> > on
>> >> > libelf on Linux. This dependence can be installed via the following
>> >> > packages:
>> >> >
>> >> > CentOS 6/7: yum install elfutils-libelf.x86_64
>> >> > Ubuntu 14.04:  apt-get install libelf1
>> >> >
>> >> > Alternatively you can install from source:
>> >> > https://directory.fsf.org/wiki/Libelf
>> >> >
>> >> > For developers, you will also need to install the libelf headers in
>> >> > order to build master. This dependency can be installed via:
>> >> >
>> >> > CentOS: elfutils-libelf-devel.x86_64
>> >> > Ubuntu: libelf-dev
>> >> >
>> >> > Alternatively, you can install from source:
>> >> > https://directory.fsf.org/wiki/Libelf
>> >> >
>> >> > The getting started guide and the support/docker_build.sh scripts
>> >> > have
>> >> > been updated appropriately, but you may need to update your local
>> >> > environment if you don't yet have these packages installed.
>> >> >
>> >> > --
>> >> > ~Kevin
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Cheers,
>> >>
>> >> Zhitao Li
>> >
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Haosdent Huang
>>
>>
>>
>> --
>> ~Kevin
>
>



-- 
~Kevin


Re: New external dependency

2016-06-20 Thread Yan Xu
It's not immediately clear from the ticket why the change from optional
dependency to required dependency though? Could you summarize?

On Sun, Jun 19, 2016 at 12:33 PM, Kevin Klues  wrote:

> Thanks Zhitao,
>
> I just pushed out a review for upgrades.md and added you as a reviewer.
>
> The new dependence was added in the JIRA that haosdent linked, but the
> actual reason for adding the dependence is more related to:
> https://issues.apache.org/jira/browse/MESOS-5401
>
> On Sun, Jun 19, 2016 at 9:34 AM, haosdent  wrote:
> > The related issue is "Change build to always enable Nvidia GPU support for
> > Linux".
> > Last time my local build broke before Kevin sent out the email, and then
> > I found this change.
> >
> > On Mon, Jun 20, 2016 at 12:11 AM, Zhitao Li 
> wrote:
> >>
> >> Hi Kevin,
> >>
> >> Thanks for letting us know. It seems like this is not called out in
> >> upgrades.md, so can you please document this additional dependency
> there?
> >>
> >> Also, can you include the link to the JIRA or patch requiring this
> >> dependency so we can have some context?
> >>
> >> Thanks!
> >>
> >> On Sat, Jun 18, 2016 at 10:25 AM, Kevin Klues 
> wrote:
> >>
> >> > Hello all,
> >> >
> >> > Just an FYI that the newest libmesos now has an external dependence on
> >> > libelf on Linux. This dependence can be installed via the following
> >> > packages:
> >> >
> >> > CentOS 6/7: yum install elfutils-libelf.x86_64
> >> > Ubuntu 14.04:  apt-get install libelf1
> >> >
> >> > Alternatively you can install from source:
> >> > https://directory.fsf.org/wiki/Libelf
> >> >
> >> > For developers, you will also need to install the libelf headers in
> >> > order to build master. This dependency can be installed via:
> >> >
> >> > CentOS: elfutils-libelf-devel.x86_64
> >> > Ubuntu: libelf-dev
> >> >
> >> > Alternatively, you can install from source:
> >> > https://directory.fsf.org/wiki/Libelf
> >> >
> >> > The getting started guide and the support/docker_build.sh scripts have
> >> > been updated appropriately, but you may need to update your local
> >> > environment if you don't yet have these packages installed.
> >> >
> >> > --
> >> > ~Kevin
> >> >
> >>
> >>
> >>
> >> --
> >> Cheers,
> >>
> >> Zhitao Li
> >
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
>
>
>
> --
> ~Kevin
>


Re: Order of URIs in CommandInfo protobuf

2016-06-20 Thread Zhitao Li
Hi Robert,

I also think parallelization of fetching is important for many use cases to
reduce the time it takes to launch a task. Can we make sure it's still
possible to parallelize downloads if you file a feature request?

Also, when a task is launched, all URIs should already be fetched into the
sandbox, so I'm very interested in how out-of-order fetching could break your use case.
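The ordering concern can be illustrated with a small standalone sketch (plain Python, not the Mesos fetcher; the URI names and delays are made up). With parallel fetches, completion order tracks per-item latency, not the order the URIs were listed in:

```python
# Illustration: parallel fetching does not preserve listing order.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

uris = ["a.tar.gz", "b.tar.gz", "c.tar.gz"]          # listing order
delays = {"a.tar.gz": 0.3, "b.tar.gz": 0.1, "c.tar.gz": 0.2}

def fetch(uri):
    time.sleep(delays[uri])  # stand-in for network latency
    return uri

completion_order = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fetch, u) for u in uris]
    for f in as_completed(futures):
        completion_order.append(f.result())

print(completion_order)  # likely ['b.tar.gz', 'c.tar.gz', 'a.tar.gz']
```

Anything that relies on fetch order (e.g. later archives overwriting earlier ones on extraction) can therefore behave nondeterministically.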



On Mon, Jun 20, 2016 at 12:36 PM, Jie Yu  wrote:

> Robert, I just checked the code and the ordering is not guaranteed since
> we parallelize the download currently.
>
> This sounds like a feature request. Robert, do you want to create a
> ticket? For now, I think a startup script should be able to work around that.
>
> On Mon, Jun 20, 2016 at 11:02 AM, Robert Lacroix 
> wrote:
>
>> Jie, would it hurt if we would guarantee ordering of URIs? I could see
>> use cases where the order in which files are extracted matters. Protobuf
>> preserves ordering of repeated fields, so it shouldn't be a huge effort (it
>> probably already works).
>>
>>  Robert
>>
>> On Jun 17, 2016, at 7:37 PM, Jie Yu  wrote:
>>
>> There is no ordering assumption in the API.
>>
>> - Jie
>>
>> On Fri, Jun 17, 2016 at 10:33 AM, Wil Yegelwel 
>> wrote:
>>
>>> I'm curious whether there is an ordering assumption on the CommandInfo
>>> protobuf or if the order does not matter. The comment in mesos.proto, "Any
>>> URIs specified are fetched before executing the command" seems to imply
>>> that ordering does not matter. I just wanted to confirm that was the case.
>>>
>>> Thanks,
>>> Wil
>>>
>>
>>
>>
>


-- 
Cheers,

Zhitao Li


Re: Order of URIs in CommandInfo protobuf

2016-06-20 Thread Jie Yu
Robert, I just checked the code and the ordering is not guaranteed since we
parallelize the download currently.

This sounds like a feature request. Robert, do you want to create a ticket?
For now, I think a startup script should be able to work around that.
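One way to read the startup-script suggestion: have the task's own command impose the order, so it no longer matters in which order the fetcher delivered the files. A minimal sketch with made-up archive and file names (the stand-in archives are created inline here so the example is self-contained; in Mesos they would arrive via CommandInfo URIs):

```shell
set -e
# Stand-ins for two fetched archives where extraction order matters:
mkdir -p src && echo "v1" > src/conf.txt
tar -czf base.tar.gz -C src conf.txt
echo "v2" > src/conf.txt
tar -czf overlay.tar.gz -C src conf.txt

# Deterministic, explicit ordering: overlay always wins, regardless of
# the order in which the fetcher happened to download the archives.
mkdir -p app
tar -xzf base.tar.gz -C app
tar -xzf overlay.tar.gz -C app
cat app/conf.txt   # prints "v2"
```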

On Mon, Jun 20, 2016 at 11:02 AM, Robert Lacroix  wrote:

> Jie, would it hurt if we would guarantee ordering of URIs? I could see use
> cases where the order in which files are extracted matters. Protobuf
> preserves ordering of repeated fields, so it shouldn't be a huge effort (it
> probably already works).
>
>  Robert
>
> On Jun 17, 2016, at 7:37 PM, Jie Yu  wrote:
>
> There is no ordering assumption in the API.
>
> - Jie
>
> On Fri, Jun 17, 2016 at 10:33 AM, Wil Yegelwel 
> wrote:
>
>> I'm curious whether there is an ordering assumption on the CommandInfo
>> protobuf or if the order does not matter. The comment in mesos.proto, "Any
>> URIs specified are fetched before executing the command" seems to imply
>> that ordering does not matter. I just wanted to confirm that was the case.
>>
>> Thanks,
>> Wil
>>
>
>
>


Re: Executors no longer inherit environment variables from the agent

2016-06-20 Thread Jie Yu
Zhitao,

Any environment variables generated by Mesos (i.e., MESOS_, LIBPROCESS_)
> will not be affected


Yes.

 explicitly call this out in UPGRADES.md


Working on it.

- Jie

On Mon, Jun 20, 2016 at 11:43 AM, Zhitao Li  wrote:

> Hi Jie,
>
> Can you confirm that your previous response of `Any environment variables
> generated by Mesos (i.e., MESOS_, LIBPROCESS_) will not be affected.`
> will still be honored, or explicitly call this out in UPGRADES.md?
>
> Thanks.
>
> On Mon, Jun 20, 2016 at 11:39 AM, Jie Yu  wrote:
>
>> FYI, from Mesos 1.0, the executors will no longer inherit environment
>> variables from the agent by default. If you have environment environment
>> variables that you want to pass in to executors, please use `--
>> executor_environment_variables` flag on the agent.
>>
>> commit ce4b3056164a804bea52810173dbd7a418d12641
>> Author: Gilbert Song 
>> Date:   Sun Jun 19 16:01:10 2016 -0700
>>
>> Forbid the executor to inherit from slave environment.
>>
>> Review: https://reviews.apache.org/r/44498/
>>
>> - Jie
>>
>> On Tue, Mar 8, 2016 at 11:33 AM, Gilbert Song 
>> wrote:
>>
>> > Hi,
>> >
>> > TL;DR Executors will no longer inherit environment variables from the
>> agent
>> > by default in 0.30.
>> >
>> > Currently, executors are inheriting environment variables form the
>> agent in
>> > mesos containerizer by default. This is an unfortunate legacy behavior
>> and
>> > is insecure. If you do have environment variables that you want to pass
>> to
>> > the executors, you can set it explicitly by using the
>> > `--executor_environment_variables` agent flag.
>> >
>> > Starting from 0.30, we will no longer allow executors to inherit
>> > environment variables from the agent. In other words,
>> > `--executor_environment_variables` will be set to “{}” by default. If
>> you
>> > do depend on the original behavior, please set
>> > `--executor_environment_variables` flag explicitly.
>> >
>> > Let us know if you have any comments or concerns.
>> >
>> > Thanks,
>> > Gilbert
>> >
>>
>
>
>
> --
> Cheers,
>
> Zhitao Li
>


Re: Executors no longer inherit environment variables from the agent

2016-06-20 Thread Zhitao Li
Hi Jie,

Can you confirm that your previous response of `Any environment variables
generated by Mesos (i.e., MESOS_, LIBPROCESS_) will not be affected.` will
still be honored, or explicitly call this out in UPGRADES.md?

Thanks.

On Mon, Jun 20, 2016 at 11:39 AM, Jie Yu  wrote:

> FYI, from Mesos 1.0, the executors will no longer inherit environment
> variables from the agent by default. If you have environment environment
> variables that you want to pass in to executors, please use `--
> executor_environment_variables` flag on the agent.
>
> commit ce4b3056164a804bea52810173dbd7a418d12641
> Author: Gilbert Song 
> Date:   Sun Jun 19 16:01:10 2016 -0700
>
> Forbid the executor to inherit from slave environment.
>
> Review: https://reviews.apache.org/r/44498/
>
> - Jie
>
> On Tue, Mar 8, 2016 at 11:33 AM, Gilbert Song 
> wrote:
>
> > Hi,
> >
> > TL;DR Executors will no longer inherit environment variables from the
> agent
> > by default in 0.30.
> >
> > Currently, executors are inheriting environment variables form the agent
> in
> > mesos containerizer by default. This is an unfortunate legacy behavior
> and
> > is insecure. If you do have environment variables that you want to pass
> to
> > the executors, you can set it explicitly by using the
> > `--executor_environment_variables` agent flag.
> >
> > Starting from 0.30, we will no longer allow executors to inherit
> > environment variables from the agent. In other words,
> > `--executor_environment_variables` will be set to “{}” by default. If you
> > do depend on the original behavior, please set
> > `--executor_environment_variables` flag explicitly.
> >
> > Let us know if you have any comments or concerns.
> >
> > Thanks,
> > Gilbert
> >
>



-- 
Cheers,

Zhitao Li


Re: Executors no longer inherit environment variables from the agent

2016-06-20 Thread Jie Yu
FYI, from Mesos 1.0, the executors will no longer inherit environment
variables from the agent by default. If you have environment environment
variables that you want to pass in to executors, please use `--
executor_environment_variables` flag on the agent.

commit ce4b3056164a804bea52810173dbd7a418d12641
Author: Gilbert Song 
Date:   Sun Jun 19 16:01:10 2016 -0700

Forbid the executor to inherit from slave environment.

Review: https://reviews.apache.org/r/44498/

- Jie

On Tue, Mar 8, 2016 at 11:33 AM, Gilbert Song  wrote:

> Hi,
>
> TL;DR Executors will no longer inherit environment variables from the agent
> by default in 0.30.
>
> Currently, executors are inheriting environment variables form the agent in
> mesos containerizer by default. This is an unfortunate legacy behavior and
> is insecure. If you do have environment variables that you want to pass to
> the executors, you can set it explicitly by using the
> `--executor_environment_variables` agent flag.
>
> Starting from 0.30, we will no longer allow executors to inherit
> environment variables from the agent. In other words,
> `--executor_environment_variables` will be set to “{}” by default. If you
> do depend on the original behavior, please set
> `--executor_environment_variables` flag explicitly.
>
> Let us know if you have any comments or concerns.
>
> Thanks,
> Gilbert
>


Re: Master slow to process status updates after massive killing of tasks?

2016-06-20 Thread Joseph Wu
Looks like the master's event queue is filling up, although it's difficult
to tell what exactly is doing this.  From the numbers in the gist, it's
evident that the master has seconds to minutes of backlog.

In general, there is very little processing cost associated per "accept".
The master does, however, break an "accept" into two chunk which are placed
into the master's event queue (FIFO).  The first chunk logs "Processing
ACCEPT call for offers..." and queues the second chunk.  The second chunk
logs "Launching task..." (assuming this is what the offer was accepted
for).  The greater the time gap between the two logs, the more backlogged
the master is.

I don't think there's enough info to pinpoint the bottleneck.  If you ran
this test again, here are my recommendations:

   - Set up a monitor (i.e. script that polls) for
   /master/metrics/snapshot.  Look through this doc (
   http://mesos.apache.org/documentation/latest/monitoring/) to see what
   each value means.  The most interesting metrics would match the patterns "
   master/event_queue_*" and "
   master/messages_*".
   - Try to hit /__processes__ during your test, particularly when the
   master is backlogged.  This should show the state of the various event
   queues inside Mesos.  (Keep in mind that polling this endpoint *may*
   slow down Mesos.)
   - Check if Singularity is DOS-ing the master :)

>Singularity calls reconcileTasks() every 10 minutes. How often would
>you expect to see that log line? At the worst point, we saw it printed 637
>times in one minute in the master logs.
>
   ^ This is a framework-initiated action.  Unfortunately, there are a lot
   of framework calls in the old scheduler driver that *could* be batched
   but are not due to backwards compatibility.  If Singularity tries to
   reconcile 500 tasks in a single reconcileTasks() call, using the old
   scheduler driver, it will make 500 calls to Mesos :(
   We suspect the HTTP API will have much better scaling in situations like
   this.  And it will be worthwhile to start migrating over to the new API.


On Sun, Jun 19, 2016 at 6:57 PM, Thomas Petr  wrote:

> Thanks for the quick response, Joseph! Here are some answers:
>
> The test:
> - The agents were gracefully terminated (kill -term) and were offline for
> about 10 minutes. We had plans to test other scenarios (i.e. network
> partition, kill -9, etc.) but didn't get to them yet.
> - The 1000 accidentally killed tasks were not including the tasks from the
> killed-off agents, but included replacement tasks that were started in
> response to the agent killings. I'd estimate about 400 tasks were lost from
> the killed-off agents.
> - We stopped the 5 agents at about 3:43pm, killed off the ~1000 tasks at
> 3:49pm, and then failed over the master at 4:25pm. Singularity caught wind
> of the failover at 4:27pm, reconnected, and then everything started to
> clear up after that.
> - Singularity currently does not log the Offer ID, so it's not easy for me
> to get the exact timing between Singularity accepting an offer and that
> master line you mentioned. However, I am able to get the time between
> accepting an offer and the "Launching task XXX" master line
>  --
> you can check out this info here:
> https://gist.github.com/tpetr/fe0fecbcfa0a2c8e5889b9e70c0296e7. I have a
> PR  to log the Offer ID
> in Singularity, so I'll be able to give you the exact timing the next time
> we run the test.
>
> The setup:
> - We unfortunately weren't monitoring those metrics, but will keep a close
> eye on that when we run this test again.
> - CPU usage was nominal -- CloudWatch reports less than 5% CPU utilization
> throughout the day, jumping to 10% temporarily when we failed over the
> Mesos master.
> - We run Singularity and the Mesos master together on 3 c3.2xlarges in AWS
> so there shouldn't be any bottleneck there.
> - One interesting thing I just noticed in the master logs is that the last
> "Processing ACCEPT call for offers" occurred at 3:50pm, though that could
> just be because after that time, things were so lagged that all of our
> offers timed out.
>
> Singularity:
> - Singularity still uses the default implicit acknowledgement feature of
> the scheduler driver. I filed a TODO for looking into explicit acks, but we
> do very little in the thread calling statusUpdate(). The only thing that
> could really slow things down is communication with ZooKeeper, which is a
> possibility.
> - Singularity calls reconcileTasks() every 10 minutes. How often would you
> expect to see that log line? At the worst point, we saw it printed 637
> times in one minute in the master logs.
>
> Thanks,
> Tom
>
> On Fri, Jun 17, 2016 at 6:26 PM, Joseph Wu  wrote:
>
>> A couple questions about your test:
>>
>> - By "killed off", were your agents killed permanently (i.e. 

[VOTE] Release Apache Mesos 0.26.2 (rc1)

2016-06-20 Thread Jie Yu
Hi all,

Please vote on releasing the following candidate as Apache Mesos 0.26.2.


0.26.2 is a bug fix release. It includes the following:

[MESOS-4705] - Linux 'perf' parsing logic may fail when OS distribution has
perf backports.
[MESOS-5449] - Memory leak in SchedulerProcess.declineOffer.

The CHANGELOG for the release is available at:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.26.2-rc1


The candidate for Mesos 0.26.2 release is available at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz

The tag to be voted on is 0.26.2-rc1:
https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.26.2-rc1

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz.md5

The signature of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/mesos/0.26.2-rc1/mesos-0.26.2.tar.gz.asc

The PGP key used to sign the release is here:
https://dist.apache.org/repos/dist/release/mesos/KEYS

The JAR is up in Maven in a staging repository here:
https://repository.apache.org/content/repositories/orgapachemesos-1147

Please vote on releasing this package as Apache Mesos 0.26.2!

The vote is open until Thu Jun 23 11:23:33 PDT 2016 and passes if a
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Mesos 0.26.2
[ ] -1 Do not release this package because ...

Thanks
- Jie


Re: Order of URIs in CommandInfo protobuf

2016-06-20 Thread Robert Lacroix
Jie, would it hurt if we would guarantee ordering of URIs? I could see use 
cases where the order in which files are extracted matters. Protobuf preserves 
ordering of repeated fields, so it shouldn't be a huge effort (it probably 
already works).

 Robert

> On Jun 17, 2016, at 7:37 PM, Jie Yu  wrote:
> 
> There is no ordering assumption in the API.
> 
> - Jie
> 
> On Fri, Jun 17, 2016 at 10:33 AM, Wil Yegelwel  > wrote:
> I'm curious whether there is an ordering assumption on the CommandInfo 
> protobuf or if the order does not matter. The comment in mesos.proto, "Any 
> URIs specified are fetched before executing the command" seems to imply that 
> ordering does not matter. I just wanted to confirm that was the case. 
> 
> Thanks,
> Wil
> 



smime.p7s
Description: S/MIME cryptographic signature


Re: Failed to shutdown socket with fd xxx

2016-06-20 Thread Joris Van Remoortere
>
> For "That indicates a transition from the old systemd lack of support to
> the new support. "
> >> lack of what support ? would explain more details, and how to fix this?
> or may have other cause ?


There were a few versions of Mesos where we were not yet aware of some of
the issues with running under systemd. There was a fix for the
LinuxLauncher in 0.25 (https://issues.apache.org/jira/browse/MESOS-3425)
and further fixes for the posix launcher and docker containerizer in 0.28
and some backports. See the systemd documentation at the bottom of this
page: http://mesos.apache.org/documentation/latest/agent-recovery/

It's possible that you have tasks left over from before we had this
support, which means they are not running under the executor slice. These
technically could lose their isolation (as mentioned in the warning). If
you care about the isolation (you likely do in production), then the only
remedy is to restart them.

—
*Joris Van Remoortere*
Mesosphere

On Mon, Jun 20, 2016 at 4:45 AM, Qiang Chen  wrote:

> Thanks @Haosdent for the link to explain the shutdown errors. so I can
> ignore this...
>
> @Joris,
>
> 1. I upgraded form 0.25.0 to 0.28.2 in centos 7 which  has systemd support.
> 2. I didn't make any OS / init system changes
>
> For "That indicates a transition from the old systemd lack of support to
> the new support. "
> >> lack of what support ? would explain more details, and how to fix this?
> or may have other cause ?
>
> Thanks great again!
>
>
> On 2016年06月17日 21:31, Joris Van Remoortere wrote:
>
> [image: Boxbe]  This message is eligible
> for Automatic Cleanup! (jo...@mesosphere.io) Add cleanup rule
> 
> | More info
> 
>
>
> The shutdown errors are not the issue.
> The concerning part is this warning:
>
>> W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
>> '42322' in 'mesos_executors.slice'. This can lead to lack of proper
>> resource isolation
>
> That indicates a transition from the old systemd lack of support to the
> new support.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Jun 17, 2016 at 2:35 PM, haosdent  wrote:
>
>> Hi, @Qiang.
>>
>> @Joseph have a nice explain about at Shutdown failed on fd
>>
>> http://search-hadoop.com/m/0Vlr6pe7qb2MJX8B1=Re+Benign+Shutdown+failed+on+fd+error+messages
>> Those errors could be ignored.
>>
>> For
>> ```
>> I0615 15:01:43.324935  4172 mem.cpp:602] Started listening for OOM events
>> for container f50b4c7a-d1d2-4fc8-abb9-5ab549f168dc
>> ```
>>
>> These are normal info log, it happen when Mesos CgroupMemIsolator register
>> oom hooks for your containers.
>>
>> On Fri, Jun 17, 2016 at 8:22 PM, Joris Van Remoortere <
>> jo...@mesosphere.io>
>> wrote:
>>
>> > Can you provide:
>> > 1. The version that you are upgrading from.
>> > 2. Whether you made any OS / init system changes alongside this upgrade
>> > (just to narrow the scope).
>> >
>> > It is possible that you are upgrading from a version that did not have
>> > systemd support to one that does. If so, the upgrade may require
>> restarting
>> > the tasks (either by themselves, or just starting a fresh agent). Please
>> > check out some of the work in MESOS-3007 to get a better understanding
>> of
>> > what the issue I am referring to is.
>> >
>> > If you can verify that you are making one of these transitions from a
>> bad
>> > world to a good world, then you can devise a plan for your upgrade.
>> >
>> > Joris
>> >
>> > —
>> > *Joris Van Remoortere*
>> > Mesosphere
>> >
>> > On Fri, Jun 17, 2016 at 8:28 AM, Qiang Chen < 
>> qzsc...@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I met an issue when upgrading mesos-slave to 0.28.2.
>> > >
>> > > At the process of recovering mesos-slave / framework container stage,
>> it
>> > > produced the following errors.
>> > >
>> > >
>> > > ```
>> > > Log file created at: 2016/06/15 15:01:43
>> > > Running on machine: mesos-slave-online005-xxx.cloud.xxx.domain
>> > > Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
>> > > W0615 15:01:43.285518  4182 linux_launcher.cpp:197] Couldn't find pid
>> > > '42322' in 'mesos_executors.slice'. This can lead to lack of proper
>> > > resource isolation
>> > > W0615 15:01:43.286182  4182 linux_launcher.cpp:197] Couldn't find pid
>> > > '42312' in 'mesos_executors.slice'. This can lead to lack of proper
>> > >