Re: Fwd: [Breaking Change 0.24 & Upgrade path] ZooKeeper MasterInfo change.

2015-09-26 Thread CCAAT

On 09/25/2015 04:34 PM, Raúl Gutiérrez Segalés wrote:


On Sep 25, 2015 9:08 AM, "Marco Massenzio" wrote:
 >
 > +1 to what Alex says.
 >
 > As far as we know, the functionality we use (ephemeral sequential
nodes and writing simple data to a znode) is part of the "base API"
offered by ZooKeeper and every version would support it.
 > (then again, not a ZK expert here - if anyone knows better, please
feel free to correct me).
 >
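
For illustration, the two "base API" operations in question, creating
an ephemeral sequential node and writing simple data to a znode, look
roughly like this with the kazoo Python client (kazoo is an assumption
here; any ZooKeeper client library exposes the same primitives)::

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Ephemeral sequential: ZooKeeper appends a monotonically increasing
# suffix, and the node disappears when the client session dies.
path = zk.create("/mesos/json.info_", b'{"hostname": "master1"}',
                 ephemeral=True, sequence=True, makepath=True)

# Writing simple data to a znode.
zk.set(path, b'{"hostname": "master1", "port": 5050}')

zk.stop()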

Yup, none of that is changing in the upcoming ZooKeeper releases (3.4.7 &
3.5.2), and probably never will.

-rgs




That simplifies the related dependencies for zookeeper.


Thanks,
James


Re: Official RPMs

2015-09-26 Thread CCAAT

On 09/25/2015 07:36 PM, Marco Massenzio wrote:

Yes, the plan is definitely to make the tooling available to the
project: there is nothing "secret" about it - at the moment,
unfortunately, it relies on a bit of internal infrastructure and, well,
yesss, it's a bit too crafty to be ready for "external consumption"
but we're working on it!


Hello Marco,

I'm packaging up for gentoo linux. Just the itemized list of
what/where/when you set up config files and such would be a keen help to
my efforts. I can substitute in gentoo-centric tools, if you can provide
a brief description of what your infra/tools are doing.

A skeletal spec, if you like.


James






Marco Massenzio
Distributed Systems Engineer
http://codetrips.com

On Fri, Sep 25, 2015 at 11:33 AM, Zameer Manji <zma...@apache.org> wrote:

Could mesosphere donate their tooling for packaging mesos to the
project? This way any project member or contributor can build
packages and it can be a part of the release process.

On Fri, Sep 25, 2015 at 10:53 AM, Artem Harutyunyan
<ar...@mesosphere.io> wrote:

The repositories have been updated yesterday, and the downloads page
was updated today. Mesos 0.24 packages are now available at
https://mesosphere.com/downloads/. Thank you very much for your
patience!

Cheers,
Artem.

On Tue, Sep 22, 2015 at 11:02 AM, Marco Massenzio
<ma...@mesosphere.io> wrote:
 > Hi guys,
 >
 > just wanted to let you all know that we (Mesosphere) fully
intend to
 > continue supporting distributing binary packages for the
current set of
 > supported OSes (namely, Ubuntu / Debian / RedHat / CentOS as
listed in [0]).
 >
 > Sorry that 0.24 slipped through the cracks, the person who
actually takes
 > care of that and knows the magic incantations has been
unwell, and a number
 > of other competing priorities got in the way - we will
eventually be
 > automating the process, so that downloadable binary packages
are created out
 > of each release/RC build (and, possibly, even more often)
without pesky
 > humans getting in the way :) but this may take some time.
 > We're building the 0.24 ones as we speak, so please bear with
us while this
 > gets done.
 >
 > Any questions / suggestions, we'd love to hear those too!
 >
 > [0] https://mesosphere.com/downloads/
 >
 > Marco Massenzio
 > Distributed Systems Engineer
 > http://codetrips.com
 >
 > On Tue, Sep 22, 2015 at 10:54 AM, CCAAT <cc...@tampabay.rr.com> wrote:
 >>
 >> On 09/21/2015 03:01 PM, Vinod Kone wrote:
 >>>
 >>> +Jake Farrell
 >>>
 >>> The mesos project doesn't publish platform-dependent
artifacts.  We
 >>> currently only publish platform-independent artifacts like
JAR (to
 >>> apache maven) and interface EGG (to PyPI).
 >>>
 >>> Recently we made the decision
 >>>
<http://www.mail-archive.com/dev%40mesos.apache.org/msg33148.html>
for
 >>> the project to not officially support different language
(java, python)
 >>> framework libraries going forward (likely after 1.0). The
project will
 >>> only support C++ libraries which will live in the repo and
link to other
 >>> language libraries from our website.
 >>>
 >>> The main reason was that the PMC lacks the expertise to
support various
 >>> language bindings and hence we wanted to remove the support
burden.
 >>>
 >>> Option #1) It looks like we could do a similar thing with
RPMs/DEBs,
 >>> i.e., link to 3rd party artifacts from the project website.
Similar to
 >>> the client library authors, we could hold package maintainers
 >>> accountable by providing guidelines.
 >>>
 >>> Option #2) Since the project officially supports certain
platforms
 >>> (Ubuntu, CentOS, OSX) and continuously tests this via CI,
we could have
 >>> the CI continuously build and upload the packages. Not sure
what the ASF's
 >>> stance on this is. I filed a ticket
 >>> 

Re: Fwd: [Breaking Change 0.24 & Upgrade path] ZooKeeper MasterInfo change.

2015-09-25 Thread CCAAT

On 09/25/2015 08:13 AM, Marco Massenzio wrote:

Folks:

as a reminder, please be aware that as of Mesos 0.24.0, as announced
back in June, Mesos Master will write its information (`MasterInfo`) to
ZooKeeper in JSON format (see below for details).



What versions of Zookeeper are supported by this change? That is, what
is the oldest version of Zookeeper known to work or not work with this
change in Mesos?


James






If your framework relied on parsing the info (either de-serializing the
Protocol Buffer or just looking for an "IP-like" string), this will be
a breaking change.

Just to confirm (see also Vinod's comments below) any rolling upgrades
(i.e., clusters with 0.22+0.23 and 0.23+0.24) of Mesos will just work.

This was done in conjunction with the HTTP API release, and removes the
need for non-C++ developers to link with libmesos and deal with
Protocol Buffers.

An example of how to access the new format in Python can be found in [0]
and we're happy to help with other languages too.
Any questions, please just ask.

[0] http://github.com/massenz/zk-mesos
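
For a rough idea of what [0] implements, here is a minimal
leader-detection sketch in Python with kazoo; the json.info_ znode
naming and the field layout are assumptions taken from the design doc,
so treat [0] as authoritative::

import json
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")  # placeholder host
zk.start()

# Each master registers a sequential znode under /mesos; the
# JSON-format entries are named json.info_<sequence>, and the lowest
# sequence number is the current leader.
nodes = [n for n in zk.get_children("/mesos")
         if n.startswith("json.info_")]
data, _ = zk.get("/mesos/" + sorted(nodes)[0])

info = json.loads(data.decode("utf-8"))
print(info["address"]["ip"], info["address"]["port"])

zk.stop()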

Marco Massenzio
Distributed Systems Engineer
http://codetrips.com

-- Forwarded message --
From: Vinod Kone
Date: Wed, Jun 24, 2015 at 4:17 PM
Subject: Re: [Breaking Change 0.24 & Upgrade path] ZooKeeper MasterInfo
change.
To: dev


Just to clarify, any frameworks that are using the Mesos provided bindings
(aka libmesos.so) should not worry, as long as the version of the bindings
and version of the mesos master are not separated by more than 1 version.
In other words, you should be able to live upgrade a cluster from 0.23.0 to
0.24.0.

For framework schedulers that don't use the bindings (pesos, jesos, etc.),
it is prudent to add support for the JSON-formatted znode to their master
detection code.

Thanks,

On Wed, Jun 24, 2015 at 4:10 PM, Marco Massenzio wrote:


Folks,

as a heads-up, we are planning to convert the format of the MasterInfo
information stored in ZooKeeper from the Protocol Buffer binary format to
JSON - this is in conjunction with the HTTP API development, to allow
frameworks *not* to depend on libmesos and other binary dependencies to
interact with Mesos Master nodes.


*NOTE* - there is no change in 0.23 (so any Master/Slave/Framework that is
currently working in 0.22 *will continue to work* in 0.23 too), but as of
Mesos 0.24, frameworks and other clients relying on the binary format will
break.

The details of the design are in this Google Doc:

https://docs.google.com/document/d/1i2pWJaIjnFYhuR-000NG-AC1rFKKrRh3Wn47Y2G6lRE/edit

the actual work is detailed in MESOS-2340:
https://issues.apache.org/jira/browse/MESOS-2340

and the patch (and associated test) are here:
https://reviews.apache.org/r/35571/
https://reviews.apache.org/r/35815/


 > Marco Massenzio
 > Distributed Systems Engineer





Re: Official RPMs

2015-09-22 Thread CCAAT

On 09/21/2015 03:01 PM, Vinod Kone wrote:

+Jake Farrell

The mesos project doesn't publish platform-dependent artifacts. We
currently only publish platform-independent artifacts like the JAR (to
apache maven) and the interface EGG (to PyPI).

Recently we made the decision
<http://www.mail-archive.com/dev%40mesos.apache.org/msg33148.html> for
the project to not officially support different language (java, python)
framework libraries going forward (likely after 1.0). The project will
only support the C++ libraries, which will live in the repo, and will
link to other-language libraries from our website.

The main reason was that the PMC lacks the expertise to support various
language bindings and hence we wanted to remove the support burden.

Option #1) It looks like we could do a similar thing with RPMs/DEBs,
i.e., link to 3rd party artifacts from the project website. Similar to
the client library authors, we could hold package maintainers
accountable by providing guidelines.

Option #2) Since the project officially supports certain platforms
(Ubuntu, CentOS, OSX) and continuously tests these via CI, we could have
the CI continuously build and upload the packages. Not sure what the ASF's
stance on this is. I filed a ticket a while ago with
INFRA regarding something similar, but never received any response.

Personally, with the direction the project is headed, I prefer #1.


+1 (Option #1)

This 'Option #1' approach will require the core dev team to clearly
convey what is needed for any supported OS, not just the chosen OSes.
Right now, I'm having to parse many documents to figure out how
to extend the gentoo ebuild for mesos, and where to cut off what I do in
the ebuilds versus what to put into the configuration documents for gentoo.
Naturally, only the minimum should be in the gentoo ebuild, with other
items, such as HDFS, as compile-time options. Once I get the
btrfs/ceph work stabilized, there will be a compile-time option for
btrfs/ceph in the gentoo ebuild. Other distros that are not going that
way should have other Distributed File System options 'baked into' their
installation on that OS.



'Option #1' sets the stage for many OSes to be supported, and the core
dev team only has to maintain a single document to clarify what any
distro needs to robustly support mesos for its user community. This
will facilitate a wider variety of experimentation, at the companion
repos too. This Option #1 approach will further accelerate adoption of
Mesos on a very wide variety of platforms and architectures, imho. It
sets the stage for valid benchmark performance comparisons between
distros; something that the gentoo community will no doubt win ;-)

James






On Sat, Sep 19, 2015 at 3:39 AM, Carlos Sanchez wrote:

I'm using the same repo with some changes to build SSL enabled packages


https://github.com/carlossg/mesos-deb-packaging/compare/master...carlossg:ssl


On Sat, Sep 19, 2015 at 4:22 AM, Rad Gruchalski wrote:
 > Should be rather easy to package it with this little tool from
Mesosphere:
 > https://github.com/mesosphere/mesos-deb-packaging. I’ve done it
myself for
 > ubuntu 12.04 and 14.04.
 > The only thing that needs to be changed are the dependencies, for
ubuntu
 > this was:
 >
 > diff --git a/build_mesos b/build_mesos
 > index 81561bc..f756ef0 100755
 > --- a/build_mesos
 > +++ b/build_mesos
 > @@ -313,9 +313,10 @@ function deb_ {
 > --deb-recommends zookeeperd
 > --deb-recommends zookeeper-bin
 > -d 'java-runtime-headless'
 > -   -d libcurl3
 > -   -d libsvn1
 > -   -d libsasl2-modules
 > +   -d libcurl4-nss-dev
 > +   -d libsasl2-dev
 > +   -d libapr1-dev
 > +   -d libsvn-dev
 >
 > It does look like the tool can build RPMs.
 >
 > Kind regards,
 > Radek Gruchalski
 > ra...@gruchalski.com 
 > de.linkedin.com/in/radgruchalski/

 >
 >
 > On Saturday, 19 September 2015 at 04:09, craig w wrote:
 >
 > Mesosphere provides packages, you can find more information here:
 > https://mesosphere.com/downloads/
 >
 > As of right now, they don't seem to have a 0.24.0 package.
 >
 > On Fri, Sep 18, 2015 at 8:51 PM, Brian Hicks
  

mesos-0.24

2015-09-21 Thread CCAAT

Hello,


So I'm working on putting together the mesos-0.24 ebuild for gentoo,
from sources. The tarball pulled down for mesos is
/usr/portage/distfiles/mesos-0.24.1.tar.gz, so I guess it is actually
mesos-0.24.1. I have it compiling, and it now installs these in /usr/bin/::


mesos  mesos-execute  mesos-log  mesos-resolve  mesos-tail
mesos-cat  mesos-localmesos-ps   mesos-scp


So following this guide:: http://mesos.apache.org/gettingstarted/

I have to adjust the names slightly::


# mesos start-masters.sh
Failed to find /etc/mesos/masters

# mesos start-cluster.sh
Failed to find /etc/mesos/masters

So obviously, I have to generate and write out default configuration files.

Hopefully I won't have to do this, because somewhere there is a list
of what you (generically) have to do to get a default install of
mesos-0.24.1 running. A minimal list or an example to follow with mesos
+ zookeeper for now? I can then extend the mesos-0.24.1.ebuild to generate
some of the baseline configuration files and such, as needed.
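
(For what it is worth, and as an assumption based on the upstream
deploy helper scripts rather than anything authoritative:
/etc/mesos/masters and /etc/mesos/slaves appear to be plain-text host
lists, one hostname per line, which the start-*.sh scripts iterate
over via ssh.)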


Any suggestions are most welcome, so I can expedite the install of
mesos on gentoo.


James




Re: Building portable binaries

2015-09-20 Thread CCAAT
 team of arm64 devs, a veritable who's who,
working to bring gentoo to arm64. Others like myself are more focused on
the bare metal approach that works well for clusters, distributed
embedded systems, rock-solid security and superior performance per watt
of power consumed.



You are most welcome to join us. Have you ever installed gentoo on an
actual system? Right now, the pedantic approach dominates at Gentoo
via the arduous 'gentoo handbook'. I'd strongly suggest you endure
that pain, to become functionally literate with Gentoo. Several folks
are working on rapid install semantics for Gentoo on a myriad of
hardware architectures.



wwr,
James




Cheers!

On 20/09/2015 4:35 AM, CCAAT wrote:

Hey F21,

The ebuild is attached. Hopefully I'll be setting up github for this
and other ebuilds I have been hacking on. I'm not a dev, but debugging
these ebuilds is not difficult [1].


Please let me know how this ebuild works for you. I never tested it
extensively.


James




On 09/17/2015 11:09 PM, F21 wrote:

That sounds really interesting! I am just in the process of spinning up
a gentoo vm.

Would you mind sharing your ebuild for mesos-0.22.0 via a gist on
Github?

On 18/09/2015 12:58 PM, CCAAT wrote:

On 09/17/2015 06:33 PM, F21 wrote:

Is there any way to build portable binaries for mesos?



You should try out gentoo linux; everything is built from sources.

Ebuilds guide the process. My (hack) ebuild for mesos-0.22.0 was
61 lines. That's it. I will roll out a 0.24 ebuild, in a few weeks
or less.

Gentoo is designed from the ground up to build from sources. We have
a rich 'cross-compile' environment for things like aarch64; so building
mesos for arm64 is mostly trivial, once the 0.24 ebuild is rolled out.


There's a bit of reading, but the gentoo 'devmanual' pretty much guides
you through the process [1]. Gentoo also has a great package manager.
Here is a (very profane) rant/comparison of some common package
managers
and their inherent weaknesses [2]. If you want to see how simple the
gentoo ebuild for mesos-0.22 is just ask. It fetches, unpacks, compiles
and installs the package, very neatly. And there is lots of help and
encouragement from a long list of talented devs.

Gentoo is not for the weak-minded or folks that do not wish to master
the deep details of linux. Caveat emptor. CoreOS uses much of gentoo
in its build/management, if that option helps.


hth,
James





[1] https://devmanual.gentoo.org/


[2]
http://michael.orlitzky.com/articles/motherfuckers_need_package_management.php














Re: Building portable binaries

2015-09-19 Thread CCAAT

Hey F21,

The ebuild is attached. Hopefully I'll be setting up github for this and
other ebuilds I have been hacking on. I'm not a dev, but debugging these
ebuilds is not difficult [1].



Please let me know how this ebuild works for you. I never tested it
extensively.



James




On 09/17/2015 11:09 PM, F21 wrote:

That sounds really interesting! I am just in the process of spinning up
a gentoo vm.

Would you mind sharing your ebuild for mesos-0.22.0 via a gist on Github?

On 18/09/2015 12:58 PM, CCAAT wrote:

On 09/17/2015 06:33 PM, F21 wrote:

Is there any way to build portable binaries for mesos?



You should try out gentoo linux; everything is built from sources.

Ebuilds guide the process. My (hack) ebuild for mesos-0.22.0 was
61 lines. That's it. I will roll out a 0.24 ebuild, in a few weeks
or less.

Gentoo is designed from the ground up to build from sources. We have
a rich 'cross-compile' environment for things like aarch64; so building
mesos for arm64 is mostly trivial, once the 0.24 ebuild is rolled out.


There's a bit of reading, but the gentoo 'devmanual' pretty much guides
you through the process [1]. Gentoo also has a great package manager.
Here is a (very profane) rant/comparison of some common package managers
and their inherent weaknesses [2]. If you want to see how simple the
gentoo ebuild for mesos-0.22 is just ask. It fetches, unpacks, compiles
and installs the package, very neatly. And there is lots of help and
encouragement from a long list of talented devs.

Gentoo is not for the weak-minded or folks that do not wish to master
the deep details of linux. Caveat emptor. CoreOS uses much of gentoo
in its build/management, if that option helps.


hth,
James





[1] https://devmanual.gentoo.org/


[2]
http://michael.orlitzky.com/articles/motherfuckers_need_package_management.php






# Copyright 1999-2014 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $
## hack by James aug 2014

EAPI=5

inherit autotools eutils flag-o-matic multilib

MY_PV=${PV/_/}

DESCRIPTION="fast cluster manager for distributed applications"
HOMEPAGE="http://mesos.apache.org/"

SRC_URI="http://apache.org/dist/${PN}/${PV}/${P}.tar.gz"

LICENSE="Apache-2.0"
KEYWORDS="~amd64"
IUSE="java python"
SLOT="0"

DEPEND="net-misc/curl
dev-libs/cyrus-sasl
python? ( dev-lang/python dev-python/boto )
java? ( virtual/jdk )
dev-java/maven-bin 
dev-libs/hyperleveldb
dev-python/pip
dev-python/wheel
dev-vcs/subversion"

RDEPEND="python? ( dev-lang/python )
 >=virtual/jdk-1.6
 dev-java/maven-bin 
 ${DEPEND}"


S="${WORKDIR}/${P}"

ECONF_SOURCE="${S}"

src_prepare() {
	# configure and build out-of-tree, in a separate build directory
	mkdir "${S}/build" || die
}

src_configure() {
	cd "${S}/build"
	# ECONF_SOURCE (set above) points econf back at the unpacked sources
	econf \
		$(use_enable python) \
		$(use_enable java)
}

src_compile() {
	cd "${S}/build"
	# verbose, single-job build
	emake -j1 V=1
}

src_install() {
	cd "${S}/build"
	emake DESTDIR="${D}" install || die "emake install failed"
}
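
(To try the ebuild, drop it into a local overlay, e.g. as
sys-cluster/mesos/mesos-0.24.1.ebuild (the category is a guess),
regenerate the manifest with 'ebuild mesos-0.24.1.ebuild manifest',
and then 'emerge mesos'.)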


Re: mesos 0.24 released?

2015-09-18 Thread CCAAT

On 09/18/2015 01:33 PM, Vinod Kone wrote:


On Fri, Sep 18, 2015 at 11:31 AM, craig w wrote:

Gotcha will there be a blog post / release announcement on the
website soon?


yea i'll get to it. sorry for the delay.



I'm confused. Here at gentoo, we usually (almost always) follow the
release numbers, including the full minor release number designations
of the tarball. So if I download a tarball, is it not referenced by minor
release number?

Here is a simple, quick example::

www-client/seamonkey    Available versions:  2.33.1-r1 2.35

This accuracy is quintessentially necessary for the gentoo community.
Do I have to fix the names of the tarballs manually? In gentoo ebuilds,
for the files that are downloaded (the tarballs pulled directly from a
source repo), the exact and complete version number is necessary. Should
I use github?

What are the recommendations for this?

Here is what I'm using for mesos' current ebuild::

SRC_URI="http://apache.org/dist/${PN}/${PV}/${P}.tar.gz"
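
(As a worked example of the expansion: with PN=mesos and PV=0.24.1, P
becomes mesos-0.24.1, so the SRC_URI above resolves to
http://apache.org/dist/mesos/0.24.1/mesos-0.24.1.tar.gz; the tarball
name follows the full ${PN}-${PV} version automatically.)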


curiously,
James




Re: mesos 0.24 released?

2015-09-18 Thread CCAAT


Oh,

Here is a link that explains the variable meanings for the packages
downloaded by gentoo's package manager, portage::

https://devmanual.gentoo.org/ebuild-writing/variables/

I really am at the stage where I want/need to test many tarball releases
and also to start testing on other architectures, like aarch64.
Please advise on the location(s) of these tarballs, even if they
are not formal releases or release candidates.

Is anyone testing tarball compilations on other arches?
Do I need to start reading the dev list for this sort of granularity?
All suggestions are most welcome.


curiously,
James





I'm confused. Here at gentoo, we usually (almost always) follow the
release numbers, including the full minor release number designations
of the tarball. So if I download a tarball, is it not referenced by minor
release number?

Here is a simple, quick example::

www-client/seamonkey    Available versions:  2.33.1-r1 2.35

This accuracy is quintessentially necessary for the gentoo community.
Do I have to fix the names of the tarballs manually? In gentoo ebuilds,
for the files that are downloaded (the tarballs pulled directly from a
source repo), the exact and complete version number is necessary. Should
I use github?

What are the recommendations for this?

Here is what I'm using for mesos' current ebuild::

SRC_URI="http://apache.org/dist/${PN}/${PV}/${P}.tar.gz"


curiously,
James







Re: Building portable binaries

2015-09-17 Thread CCAAT

First, here is a very cool gentoo vm with btrfs in a raid-one config.

https://docs.google.com/document/d/1VJlJyYLTZScta9a81xgKOIBjYsG3_VfxxmUSxG23Uxg/edit?pli=1

I'll clean up the ebuild and post tomorrow. I've got an old spark and
zookeeper setup floating around. I got sidetracked into working on
a rapid install semantic for gentoo.


Tomorrow.

James


On 09/17/2015 11:09 PM, F21 wrote:

That sounds really interesting! I am just in the process of spinning up
a gentoo vm.

Would you mind sharing your ebuild for mesos-0.22.0 via a gist on Github?

On 18/09/2015 12:58 PM, CCAAT wrote:

On 09/17/2015 06:33 PM, F21 wrote:

Is there any way to build portable binaries for mesos?



You should try out gentoo linux; everything is built from sources.

Ebuilds guide the process. My (hack) ebuild for mesos-0.22.0 was
61 lines. That's it. I will roll out a 0.24 ebuild, in a few weeks
or less.

Gentoo is designed from the ground up to build from sources. We have
a rich 'cross-compile' environment for things like aarch64; so building
mesos for arm64 is mostly trivial, once the 0.24 ebuild is rolled out.


There's a bit of reading, but the gentoo 'devmanual' pretty much guides
you through the process [1]. Gentoo also has a great package manager.
Here is a (very profane) rant/comparison of some common package managers
and their inherent weaknesses [2]. If you want to see how simple the
gentoo ebuild for mesos-0.22 is just ask. It fetches, unpacks, compiles
and installs the package, very neatly. And there is lots of help and
encouragement from a long list of talented devs.

Gentoo is not for the weak-minded or folks that do not wish to master
the deep details of linux. Caveat emptor. CoreOS uses much of gentoo
in its build/management, if that option helps.


hth,
James





[1] https://devmanual.gentoo.org/


[2]
http://michael.orlitzky.com/articles/motherfuckers_need_package_management.php








Re: API client libraries

2015-09-02 Thread CCAAT
@ Vinod:: An excellent idea as the code bases mature. It will force
clear delineation of functionality and allow those 'other language'
experts to define their code for Mesos more clearly.


@ Artem:: Another excellent point. The mesos "core team" will still
have to work with the other language/module teams to define things and
debug some code that uses core interfaces, the API and common
interoperability constructs.



Furthermore this sort of code maturity will set the stage for other 
languages to bring enhanced functionality to Mesos.



Lastly, separating out the C/C++ will facilitate those efforts to run
mesos as close as possible to 'bare metal' on a variety of processors,
gpus and memory types (RDMA), which are all available now with GCC-5.x.
This effort will most likely result in tremendous performance boosting
of Mesos and all the companion codes.


A smashingly outstanding idea!


James



On 09/02/2015 02:01 PM, Artem Harutyunyan wrote:

Thanks for bringing this up, Vinod!

We have to make sure that there are reference library implementations
for at least Python, Java, and Go. They may end up being owned and
maintained by the community, but I feel that Mesos developers should at
least kickstart the process and incubate those libraries. Once the
initial implementations of those libraries are in place we should also
make sure to have reference usage examples for them (like we do right
now with Rendler).

In any case, this is a very important topic so I will go ahead and add
it to tomorrow's community sync agenda.

Cheers,
Artem.

On Wed, Sep 2, 2015 at 11:49 AM, Vinod Kone wrote:

Hi folks,

Now that the v1 scheduler HTTP API (beta) is on the verge of being
released, I wanted to open up the discussion about client libraries
for the
API. Mainly around support and home for the libs.
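
For context on what such a library wraps: the v1 scheduler API is a
single HTTP endpoint speaking JSON (or protobuf). A minimal,
illustrative SUBSCRIBE call in Python, with the third-party requests
library assumed and the RecordIO stream parsing elided::

import json
import requests

subscribe = {
    "type": "SUBSCRIBE",
    "subscribe": {
        "framework_info": {"user": "root", "name": "example"}
    },
}

# The master replies with a chunked RecordIO stream of events.
# "master" below is a placeholder hostname.
resp = requests.post("http://master:5050/api/v1/scheduler",
                     data=json.dumps(subscribe),
                     headers={"Content-Type": "application/json",
                              "Accept": "application/json"},
                     stream=True)

for line in resp.iter_lines():
    print(line)  # naive; a real client must parse the RecordIO framing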

One idea is that, going forward, the only supported client library
would be the C++ library, which will live in the mesos repo. All other
client libraries (java, python, go etc) will not be officially
supported, but will be linked to from our webpage/docs.

Pros:
--> The PMC/committers won't have the burden to maintain client
libraries
in languages we don't have expertise in.
--> Gives more control (reviews, releases) and attribution (could
live in
the author's org's or personal repo) to 3rd party client library authors

Cons:
--> Might be a step backward because we would be officially dropping
support for Java and Python. This is probably a good thing?
--> No quality control of the libraries by the PMC. Need
co-ordination with
library authors to incorporate API changes. Could lead to bad user
experience.

I've taken a quick look at what other major projects do and it looks
like
most of them officially support a few api libs and then link to 3rdparty
libs.

Docker: No official library? Links to 3rd party libs.

GitHub: Official support for Ruby, .Net, Obj-C. Links to 3rd party libs.

Google: All official libraries? No links to 3rd party libs?

K8S: Official support for Go. Links to 3rd party libs.

Twitter: Official support for Java. Links to 3rd party libs.


Is this the way we want to go? This does mean we won't need a
mesos/commons repo, because the project would not be officially
supporting 3rd party libs. The supported C++ libs will live in the
mesos repo.

Thoughts?






Re: Not getting resource offers for 20 min

2015-08-25 Thread CCAAT

THANKS, as I have not kept up on the spark lists.

James


On 08/25/2015 04:28 AM, Iulian Dragoș wrote:



On Mon, Aug 24, 2015 at 7:16 PM, CCAAT cc...@tampabay.rr.com wrote:

On 08/24/2015 05:33 AM, Iulian Dragoș wrote:


Hello Iulian,

Ok, so I eventually built spark from 100% sources, after some
intermediate builds on gentoo. Gentoo is not the best platform for
Java development, but those issues related to spark builds are
slowly being fixed on gentoo. Where (how) do you download the
spark-1.5.x complete source tree, as it does not seem available on
this page::

http://spark.apache.org/downloads.html


It's not yet a final release, but there's a preview:

http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-ANNOUNCE-Spark-1-5-0-preview-package-td13683.html

Building Spark from sources isn't too hard; there's a
`make-distribution.sh` script in the root directory. There are a few
parameters (like the dependency Hadoop version), but it should be fairly
straightforward. More info here:

http://spark.apache.org/docs/latest/building-spark.html

iulian



Any other related information or tips on building out spark from sources
are keenly received.

James

Unfortunately I don't have access to the cluster anymore, but I
think
Chronos wasn't the culprit. After updating Spark to 1.5 and
setting a
framework role offers started to come (while still using Chronos).

iulian
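
For reference, the framework role is set on the FrameworkInfo at
registration time. With the Python bindings of that era it is roughly
as follows (module layout per the PyPI interface egg; treat this as a
sketch)::

from mesos.interface import mesos_pb2

framework = mesos_pb2.FrameworkInfo()
framework.user = ""        # empty lets Mesos fill in the current user
framework.name = "SparkOnMesos"
framework.role = "spark"   # offers are then made to this role
# ...then hand `framework` to the MesosSchedulerDriver.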


Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com





--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com





Mesos release schedules

2015-08-05 Thread CCAAT

Hello,

Looking here::  [1]  It seems we have a very aggressive (tentative)
release schedule for mesos-1.0?


Anyone care to approximate (WAG wild_ax_guess) a date for mesos-1.0?


Or will there be other versions after 0.25.0 of mesos?

mesos-0.24 just shows one bug (51/52) as unresolved.


Jira seems to suggest tons of work must be completed before mesos-1.0 is
released? [2]   Any comments are welcome.



James


[1] 
https://issues.apache.org/jira/browse/mesos/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel



[2] 
https://issues.apache.org/jira/browse/MESOS/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel


Re: Mesos release schedules

2015-08-05 Thread CCAAT

On 08/05/2015 01:38 PM, Vinod Kone wrote:

Hi James,

The next release of Mesos is going to be 0.24.0. The 1.0 release might
be 0.25.0 or 0.26.0 depending on the progress we make with the HTTP API.


Good to know. Either way, it looks like ~fall 2015 for a 1.0 release
of Mesos?



Because we use Fix Version and Target Version in a customized way,
the link you sent doesn't capture the blocking issues. I'll update the
release tracking ticket
https://issues.apache.org/jira/browse/MESOS-2562 with more accurate
info shortly.


The link I used now shows 52/52, so it at least auto-updated, even if the
detail is lacking. Still, your link is better.


Thanks,
James




On Wed, Aug 5, 2015 at 9:41 AM, CCAAT cc...@tampabay.rr.com wrote:

Hello,

Looking here::  [1]  It seems we have a very aggressive (tentative)
release schedule for mesos-1.0?

Anyone care to approximate (WAG wild_ax_guess) a date for mesos-1.0?


Or will there be other versions after 0.25.0 of mesos?

mesos-0.24 just shows one bug (51/52) as unresolved.


Jira seems to suggest tons of work must be completed before
mesos-1.0 is released? [2]   Any comments are welcome.


James


[1]

https://issues.apache.org/jira/browse/mesos/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel


[2]

https://issues.apache.org/jira/browse/MESOS/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel






Re: Reusing Task IDs

2015-07-11 Thread CCAAT

I'd be most curious to see a working example of this idea, prefixes
and all, for sleeping (long-term sleeping) nodes (slaves and masters).

Anybody, do post what you have done/are doing on this task ID reuse and
reservations experimentation. Probably many are interested, for a
variety of reasons including but not limited to security, auditing and
node diversification interests. My interests are in self-modifying
codes, which can be achieved whilst the nodes sleep, for some very
interesting applications.



James



On 07/11/2015 06:01 AM, Adam Bordelon wrote:

Reusing taskIds may work if you're guaranteed to never be running two
instances of the same taskId simultaneously, but I could imagine a
particularly dangerous scenario where a master and slave experience a
network partition, so the master declares the slave lost and therefore
its tasks lost, and then the framework scheduler launches a new task
with the same taskId. However, the task is still running on the original
slave. When the slave reregisters and claims it is running that taskId,
or that that taskId has completed, the Mesos master may have a difficult
time reconciling which instance of the task is on which node and in
which status, since it expects only one instance to exist at a time.
You may be better off using a fixed taskId prefix and appending an
incrementing instance/trial number so that each run gets a uniqueId.
Also note that taskIds only need to be unique within a single
frameworkId, so don't worry about conflicting with other frameworks.
TL;DR: I wouldn't recommend it.
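
A minimal sketch of the prefix-plus-counter scheme described above;
the names are illustrative::

import itertools

def task_ids(prefix):
    # Yields prefix-0, prefix-1, ... unique within one scheduler run.
    for n in itertools.count():
        yield "%s-%d" % (prefix, n)

ids = task_ids("monitoring-service")
first = next(ids)   # monitoring-service-0
retry = next(ids)   # monitoring-service-1, used after the first
                    # instance reaches a terminal state

Persisting the counter (in ZooKeeper, say) would keep the IDs unique
across scheduler failovers as well.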

On Fri, Jul 10, 2015 at 10:20 AM, Antonio Fernández
antonio.fernan...@bq.com wrote:

Sounds risky. Every task should have its own unique id; collisions
and unexpected issues could happen.

I think it will be as hard to monitor whether you can start a
task again as to get a mechanism to know its ID.




On 10 Jul 2015, at 19:14, Jie Yu yujie@gmail.com wrote:

Re-using Task IDs is definitely not encouraged. As far as I know,
much of the Mesos code assumes the Task ID is unique. So I probably
wouldn't risk that.


On Fri, Jul 10, 2015 at 10:06 AM, Sargun Dhillon sar...@sargun.me wrote:

Is reusing Task IDs good behaviour? Let's say that I have some
singleton task - I'll call it a monitoring service. It's
always going
to be the same process, doing the same thing, and there will
only ever
be one around (per instance of a framework). Reading the
protobuf doc,
I learned this:


/**
 * A framework generated ID to distinguish a task. The ID must
remain
 * unique while the task is active. However, a framework can
reuse an
 * ID _only_ if a previous task with the same ID has reached a
 * terminal state (e.g., TASK_FINISHED, TASK_LOST,
TASK_KILLED, etc.).
 */
message TaskID {
  required string value = 1;
}
---
Which makes me think that it's reasonable to just give this
task the
same taskID, and that every time I bring it from a terminal
status to
running once more, I can reuse the same ID. This also gives me the
benefit of being able to more easily locate the task for a given
framework, and I'm able to exploit Mesos for some weak guarantees
saying there wont be multiple of these running (don't worry,
they lock
in Zookeeper, and concurrent runs don't do anything, they just
fail).

Opinions?






Re: Multi-mastersD

2015-07-07 Thread CCAAT

I'm glad to know it is easy, that's what I was hoping for.


I want to keep the (3+) masters online 24/7/365, but have different
teams of slaves that do different (industrial) tasks. Each team would
be geographically close, if not on the same power bus. I would think
this is routine, but I have not tried it yet. Sure, the number of
masters will expand as needed, but one pool of masters. Many, many pools
of mesos slaves with various abilities, in diverse if not extremely
remote locations.



So it's been done? Experiences? Many of these 'slave processor teams'
will sleep for significant periods, if that matters. Think of it as a
very distributed cluster with very diversified hardware and a task
request queue. Rarely working on a single BIG problem,
but still with that one-team capability for the Big problem.


Any suggestions for long-term sleep issues of slaves? Upgrade scheduling?
Data consistency once a team is awakened?



James



On 07/07/2015 10:08 AM, Marco Massenzio wrote:

(I'm sure I'm missing something here, so please forgive if I'm stating
the obvious)

This is actually very well supported right now: you can use slave
attributes (if, eg, you want to name the various clusters differently
and launch tasks according to those criteria) that would be passed on to
the Frameworks along with the resource offers: the frameworks could then
decide whether to accept the offer and launch tasks based on whatever
logic you want to implement.

You could use something like --attributes=cluster:01z99;
os:ubuntu-14-04; jdk:8 or whatever makes sense.
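
On the framework side, a scheduler would then filter offers on those
attributes. A rough Python sketch against the old bindings, using the
attribute from Marco's example (key:value attributes arrive text-typed,
hence attr.text.value)::

def offer_matches(offer, wanted_cluster):
    # Attributes ride along with the resources in each offer.
    for attr in offer.attributes:
        if attr.name == "cluster" and attr.text.value == wanted_cluster:
            return True
    return False

# inside Scheduler.resourceOffers(self, driver, offers):
#     for offer in offers:
#         if not offer_matches(offer, "01z99"):
#             driver.declineOffer(offer.id)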

Marco Massenzio
Distributed Systems Engineer

On Tue, Jul 7, 2015 at 8:55 AM, CCAAT cc...@tampabay.rr.com wrote:

Hello team_mesos,

Is there any reason one set of (3) masters cannot talk to and manage
several (many) different slave clusters of (3)? These slave clusters
would be different arch, different mixes of resources and be running
different frameworks, but all share/use the same (3) masters.


Ideas on how to architect this experiment, would be keenly appreciated.


James






Multi-mastersD

2015-07-07 Thread CCAAT

Hello team_mesos,

Is there any reason one set of (3) masters cannot talk to and manage
several (many) different slave clusters of (3)? These slave clusters
would be different arch, different mixes of resources and be running
different frameworks, but all share/use the same (3) masters.


Ideas on how to architect this experiment, would be keenly appreciated.


James



Re: Running storm over mesos

2015-07-03 Thread CCAAT

On 07/03/2015 12:30 PM, Tim Chen wrote:

Hi Pradeep,

Without any more information it's quite impossible to know what's going on.
What's in the slave logs and storm framework logs?
Tim

On Fri, Jul 3, 2015 at 10:06 AM, Pradeep Chhetri
pradeep.chhetr...@gmail.com wrote:

Hello all,

I am trying to run Storm over Mesos using the tutorial
(http://open.mesosphere.com/tutorials/run-storm-on-mesos) over
vagrant. When I am trying to submit a sample topology, it is not
spawning any storm supervisors over the mesos-slaves. I didn't find
anything interesting in the logs either. Can someone help in
figuring out the problem?
Pradeep Chhetri


Sometimes it helps to just read about some of the various ways to use
storm. Here are some links for reading about what others have done::



http://tutorials.github.io/pages/creating-a-production-storm-cluster.html?ts=1340499018#.VM67mz5VHxg

https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html


And of course this reference, just to be complete.
https://storm.canonical.com/


Re: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and Gatling

2015-07-02 Thread CCAAT

On 07/02/2015 12:10 PM, Carlos Torres wrote:


From: CCAAT cc...@tampabay.rr.com
Sent: Thursday, July 2, 2015 12:00 PM
To: user@mesos.apache.org
Cc: cc...@tampabay.rr.com
Subject: COMMERCIAL:Re: [Question] Distributed Load Testing with Mesos and 
Gatling

On 07/01/2015 01:17 PM, Carlos Torres wrote:

Hi all,

In the past weeks, I've been thinking in leveraging Mesos to schedule 
distributed load tests.


An excellent idea.


One problem, at least for me, with this approach is that the load testing tool 
needs to coordinate
the distributed scenario, and combine the data, if it doesn't, then the load 
clients will trigger at
different times, and then later an aggregation step of the data would be 
handled by the user, or
some external batch job, or script. This is not a problem for load generators 
like Tsung, or Locust,
but could be a little more complicated for Gatling, since they already provide 
a distributed model,
and coordinate the distributed tasks, and Gatling does not. To me, the approach 
the Kubernetes team
suggests is really a hack using the 'Replication Controller' to spawn multiple 
replicas, which could
be easily achieved using the same approach with Marathon (or Kubernetes on 
Mesos).



I was thinking of building a Mesos framework, that would take the input, or 
load simulation file,
and would schedule jobs across the cluster (perhaps with dedicated resources 
to minimize variance)
using Gatling.  A Mesos framework will be able to provide a UI/API to take the 
input jobs, and
report status of multiple jobs. It can also provide a way to sync/orchestrate 
the simulation, and
finally provide a way to aggregate the simulation data in one place, and serve 
the generated HTML
report.



Boiled down to its primitive parts, it would spin multiple Gatling (java) 
processes across the
cluster, use something like a barrier (not sure what to use here) to wait for 
all processes to
be ready to execute, and finally copy, and rename the generated simulations 
logs from each
Gatling process to one node/place, that is finally aggregated and compiled to 
HTML report by a
single Gatling process.



First of all, is there anything in the Mesos community that does this already? 
If not, do you
think this is feasible to accomplish with a Mesos framework, and would you 
recommend to go with this
approach? Does Mesos offer barrier-like features to coordinate jobs, and can 
I somehow move
files to a single node to be processed?


This all sounds workable, but I do not have all the experience
necessary to qualify your ideas. What I would suggest is a solution that
lends itself to testing similarly configured cloud/cluster offerings, so
that we, the cloud/cluster community, have a way to test and evaluate new
releases, substitute component codes, forks and even competitive
offerings. A ubiquitous and robust testing semantic based on your ideas
does seem to be an overwhelmingly positive idea, imho. As such, some
organizational structure to allow results to be maintained and quickly
compared to other 'test-runs' would greatly encourage usage.
Hopefully 'Gatling' and such have many, if not most, of the features
needed to automate the evaluation of results.



Finally, I've never written a non-trivial Mesos framework, how should I go 
about, or find more
documentation, to get started? I'm looking for best practices, pitfalls, etc.


Thank you for your time,
Carlos


hth,
James


Thanks for your feedback.

I like your idea about having the ability to swap out the different components 
(e.g. load generators) and perhaps even providing an abstraction on the 
charting, and data reporting mechanism.

I'll probably start with the simplest way possible, though, having the 
framework deploy Gatling across the cluster, in a scale-out fashion, and 
retrieve each instance results. Once I got that working then I'll start 
experimenting with abstracting out certain functionality.

I know Twitter has a distributed load generator, called Iago, that
apparently works on Mesos. It'd be awesome if any of its contributors
chimed in and shared what things worked great, good, and not so good.


The few things I'm concerned about in terms of implementing such a
framework on Mesos are:

* Noisy neighbors, or resource isolation.
 - Rationale: It can introduce noise to the results if load generator 
competes for shared resources (e.g. network) with others tasks.

* Coordination of execution
 - Rationale: Need the ability to control execution of groups of related tasks. User 
A submits simulation that might create 5 load clients (tasks?), right after that, User B 
submits a different simulation that creates 10 load clients. Ideally, all of User A load 
clients should be on independent nodes, and should not share the same slaves with User B 
load clients, if not enough slaves are available on the cluster, then User B's simulation 
queues, until slaves are available. There might be enough resources

Re: [Question] Distributed Load Testing with Mesos and Gatling

2015-07-02 Thread CCAAT

On 07/01/2015 01:17 PM, Carlos Torres wrote:

Hi all,

In the past weeks, I've been thinking in leveraging Mesos to schedule 
distributed load tests.


An excellent idea.


One problem, at least for me, with this approach is that the load testing tool 
needs to coordinate
the distributed scenario, and combine the data, if it doesn't, then the load 
clients will trigger at
different times, and then later an aggregation step of the data would be 
handled by the user, or
some external batch job, or script. This is not a problem for load generators 
like Tsung, or Locust,
but could be a little more complicated for Gatling, since they already provide 
a distributed model,
and coordinate the distributed tasks, and Gatling does not. To me, the approach 
the Kubernetes team
suggests is really a hack using the 'Replication Controller' to spawn multiple 
replicas, which could
be easily achieved using the same approach with Marathon (or Kubernetes on 
Mesos).



I was thinking of building a Mesos framework, that would take the input, or 
load simulation file,
and would schedule jobs across the cluster (perhaps with dedicated resources 
to minimize variance)
using Gatling.  A Mesos framework will be able to provide a UI/API to take the 
input jobs, and
report status of multiple jobs. It can also provide a way to sync/orchestrate 
the simulation, and
finally provide a way to aggregate the simulation data in one place, and serve 
the generated HTML
report.



Boiled down to its primitive parts, it would spin multiple Gatling (java) 
processes across the
cluster, use something like a barrier (not sure what to use here) to wait for 
all processes to
be ready to execute, and finally copy, and rename the generated simulations 
logs from each
Gatling process to one node/place, that is finally aggregated and compiled to 
HTML report by a
single Gatling process.



First of all, is there anything in the Mesos community that does this already? 
If not, do you
think this is feasible to accomplish with a Mesos framework, and would you 
recommend to go with this
approach? Does Mesos offer barrier-like features to coordinate jobs, and can 
I somehow move
files to a single node to be processed?
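
Mesos itself has no barrier primitive, but ZooKeeper, which a Mesos
cluster already runs, does. A sketch of the start/finish gate with
kazoo's double-barrier recipe; the path, the participant count and the
run_gatling_simulation() helper are all illustrative::

from kazoo.client import KazooClient
from kazoo.recipe.barrier import DoubleBarrier

zk = KazooClient(hosts="zk1:2181")
zk.start()

# Each load-generator task enters; none proceeds until all 5 arrive.
barrier = DoubleBarrier(zk, "/loadtest/run-42", num_clients=5)
barrier.enter()
run_gatling_simulation()   # hypothetical: the actual load run
barrier.leave()            # blocks until every participant finishes

zk.stop()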


This all sounds workable, but I do not have all the experience
necessary to qualify your ideas. What I would suggest is a solution that
lends itself to testing similarly configured cloud/cluster offerings, so
that we, the cloud/cluster community, have a way to test and evaluate new
releases, substitute component codes, forks and even competitive
offerings. A ubiquitous and robust testing semantic based on your ideas
does seem to be an overwhelmingly positive idea, imho. As such, some
organizational structure to allow results to be maintained and quickly
compared to other 'test-runs' would greatly encourage usage.
Hopefully 'Gatling' and such have many, if not most, of the features
needed to automate the evaluation of results.




Finally, I've never written a non-trivial Mesos framework, how should I go 
about, or find more
documentation, to get started? I'm looking for best practices, pitfalls, etc.


Thank you for your time,
Carlos


hth,
James



Re: Introducing the ceph-mesos framework

2015-06-29 Thread CCAAT

On 06/28/2015 05:43 PM, Zhongyue Luo wrote:

About the Ceph binary: this framework pulls a Docker container to run
Ceph, so in that case you should custom build and put the binaries in a
Docker image. Our observation is that CephFS is very unstable, so we are
developing a file system called RGWFS, which is a HCFS based on RGW.



Interesting strategy. We currently have Ceph-0.94.2 across each
system's btrfs, and then install mesos, spark and the other frameworks.
The only problem I'm seeing right now is too many writes initiated by
both ceph and btrfs. I believe as both mature this will be resolved via
some more advanced configuration options and tools. I have not had the
time to drill down into this penalty (too many duplicative writes) atm.
It's a small cluster (3-5 slaves). Our goal is a single compute engine for
a singular 'Big Science' problem, where the cluster scales and memory is
managed at the lowest level possible (bare metal strategy).

Our problems are intensively memory-bound.


Do keep us informed of your progress. Gentoo affords fine-grained
control over software compilation, and tuning the various kernels we
are testing is also an integral part of robustly tuning Cephfs and btrfs.

James





I'll point out your comments and more details of our plan in our README.md

Thanks!

On Mon, Jun 29, 2015 at 2:31 AM, CCAAT cc...@tampabay.rr.com wrote:

Hello  Zhongyue Luo,


Well this is very interesting.

Are you now replacing, or do you intend to replace, HDFS with cephfs?
That is, is cephfs the distributed file system upon which
mesos and the frameworks run?

Please clarify exactly what your plans are and the architecture and
platforms you intend to support.

Regardless, this is great news.



Also, on gentoo I have these (flag) options for Ceph:

{babeltrace cryptopp debug fuse gtk +libaio libatomic lttng +nss
radosgw static-libs tcmalloc xfs zfs}

Which options do you currently use/support and what are your long range
plans for ceph(fs)?

Ceph supports RDMA. Do you have plans in your ceph projects to
support RDMA?


James






On 06/28/2015 10:31 AM, Zhongyue Luo wrote:

Hi list,

Me and my colleagues developed a framework called ceph-mesos. As you've
already predicted from its name, the framework is for scaling Ceph
clusters on Mesos.

Check out the code on Github
https://github.com/Intel-bigdata/ceph-mesos

We've just announced this at the 2nd Beijing MUG meetup. Here is the link
to the presentation

https://docs.google.com/presentation/d/1AzcOD9Aug6BrWevdpHXgyXlcHUzVq2_MMRNvutLelJY/edit?usp=sharing.

There is also a demo video
http://v.youku.com/v_show/id_XMTI3MjMxNTU5Ng==.html. Audio is in
Chinese but you won't have a problem following through even if
you mute.

Thanks.


--
*Intel SSG/STO/BDT*
880 Zixing Road, Zizhu Science Park, Minhang District, 200241,
Shanghai,
China
+862161166500 tel:%2B862161166500





--
*Intel SSG/STO/BDT*
880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
China
+862161166500




Re: Introducing the ceph-mesos framework

2015-06-28 Thread CCAAT

Hello  Zhongyue Luo,


Well this is very interesting.

Are you now replacing, or do you intend to replace, HDFS with cephfs?
That is, is cephfs the distributed file system upon which
mesos and the frameworks run?

Please clarify exactly what your plans are and the architecture and
platforms you intend to support.

Regardless, this is great news.



Also, on gentoo I have these (flag) options for Ceph:

{babeltrace cryptopp debug fuse gtk +libaio libatomic lttng +nss radosgw 
static-libs tcmalloc xfs zfs}


Which options do you currently use/support and what are your long range
plans for ceph(fs)?

Ceph supports RDMA. Do you have plans in your ceph projects to support RDMA?


James






On 06/28/2015 10:31 AM, Zhongyue Luo wrote:

Hi list,

Me and my colleagues developed a framework called ceph-mesos. As you've
already predicted from its name, the framework is for scaling Ceph
clusters on Mesos.

Check out the code on Github https://github.com/Intel-bigdata/ceph-mesos

We've just announced this at the 2nd Beijing MUG meetup. Here is the link
to the presentation
https://docs.google.com/presentation/d/1AzcOD9Aug6BrWevdpHXgyXlcHUzVq2_MMRNvutLelJY/edit?usp=sharing.

There is also a demo video
http://v.youku.com/v_show/id_XMTI3MjMxNTU5Ng==.html. Audio is in
Chinese but you won't have a problem following through even if you mute.

Thanks.


--
*Intel SSG/STO/BDT*
880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
China
+862161166500




Re: Thoughts and opinions in physically building a cluster

2015-06-19 Thread CCAAT

On 06/19/2015 01:45 PM, Dave Martens wrote:

Thanks for all of these comments - I had similar questions.

What is the minimum RAM for a master or a slave?  I have heard that the
Mesos slave software adds 1GB of RAM on top of what the slave's workload
processing will require.  I have read that 8GB is the min for a Mesos
machine but it wasn't clear that this was an official/hard requirement.


There are probably published/standard numbers for the various distros
that the slave node is built upon (actual or virtual). Actually, with a
robust (CI) infrastructure, these sorts of resource metrics and the
various benchmarks should be revealed to the user community, routinely.

I'm not certain what, if any, of this sort of data is being published.


If you tune (strip) the operating system, the virtual image or the
kernel, then these numbers are most likely lower. I'm not sure much has
been published on tuning the OS, kernels or installs for mesos. HPC
offerings will surely be pushing the envelope of these and many more
related metrics, tuning performance for specialized classes of
problems, hardware specifics and other performance goals.


Most will run on bloatware, but the smarter datacenters and HPC folks
will 'cut the pork' for those single-digit performance gains. YMMV.



hth,
James







Re: [Announce]: 0.3.0 release of microservices-infrastructure

2015-06-09 Thread CCAAT

Very interesting projects there, Steven Borrelli!


So, I've been working on structuring (3) classes of nodes for general
deployment among multiple different cluster/cloud offerings.

(1) The traditional 'slave-node' that is 100% controlled by the 
cluster/cloud master. Classical workload service typically deployed 
currently.



(2) The 'worker-node' that has some degree of autonomy (less than 100% 
control by the master) so that it may actually communicate and perform 
work for other masters or even migrate between master-nodes of different 
cluster/cloud systems; like from a mesos environment to an openstack 
environment. Another aspect of the worker-node is that its unique 
resources help the worker-node decide what types of problems (the 
masters are in charge of) to pursue. My initial intuition is hardware 
resources, like an Rf spectrum analyzer externally attached to the node, 
a DSP or a FPGA or any sensor; but really there'd be room for unique 
software resources that are part of the worker's inherent OS too.



(3) The 'entrepreneur-node' that not only can act as a worker-node but
also can actually decide to become a 'master-node' of a particular
system, or even set itself up as a new cluster and recruit from among
class (2) worker nodes. In essence, the Autonomy_Function would be
liberally experimented with in a variety of mechanisms; ultimately in
search of work that needs to be performed, assimilation of resources to
accomplish such work, and reporting of work status and accomplishment to
a 'higher authority'.



Continuous Integration (CI) comes to mind as an immediate area for
testing these concepts and codes. This work really becomes quite easy
IFF one can presume that project authorities are willing to precisely
define the (classes) of subservient nodes and their common features
found in different cluster/cloud offerings in a well-defined common data
structure. If these various projects are not keen on these ideas
of node liberation, then it becomes a question of how best to define
each node outside of project controls. The latter option can lead to
loss of control over these ideas for me.


My vision is to use self-modifying codes [1] for much of this work. As
such, would this sort of research be welcome at Cisco's
microservices-infrastructure project? At mesos?



[1] http://en.wikipedia.org/wiki/Self-modifying_code


James



On 06/09/2015 11:43 AM, Steven Borrelli wrote:

On behalf of the development team. I'm pleased to announce the 0.3.0
release of Microservices Infrastructure. In the weeks since 0.2, we've
added a number of features and improvements.

The software can be downloaded at:
https://github.com/CiscoCloud/microservices-infrastructure

Documentation is located at:
https://microservices-infrastructure.readthedocs.org/en/latest


I’ll be speaking next week at the NYC mesos meetup:
http://www.meetup.com/Apache-Mesos-NYC-Meetup/events/222932873


What is it?

Microservices Infrastructure is software that launches servers and then
configures them to support a wide range of applications - like
continuous delivery or realtime data processing.

This makes it easy to run application containers alongside data-centric
workloads like Kafka, HDFS, Cassandra and Elasticsearch. We take leading
open-source projects (Docker, Consul, Terraform, Mesos) and integrate
them to build a powerful platform.

Microservices Infrastructure deploys to multiple cloud providers in
minutes. High-availability, service discovery, metrics, security, and
logging are built in.

All the components are released under an Apache 2.0 license. Bug reports
and pull requests are welcome.


New Features


Deployment to OpenStack, AWS and Google Cloud via Terraform

With the addition of Openstack support
https://github.com/hashicorp/terraform/blob/master/CHANGELOG.md#040-april-2-2015
 to
Terraform http://terraform.io/, Ansible-based cloud provisioning has
been deprecated. With this release we've included configurations for
OpenStack, Amazon Web Services, and Google Cloud. Future releases will
include storage, VPN, and networking configurations and support for more
providers.

To make the cloud installation process smoother, we've included a
dynamic Ansible inventory script terraform.py
https://github.com/CiscoCloud/terraform.py that automatically
discovers your hosts across clouds from your Terraform tfstate file and
integrates them with Ansible roles.
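(For anyone wiring this up by hand: the dynamic inventory hook is just
Ansible's standard -i mechanism pointed at an executable script; a
sketch, where the playbook path is an assumption:

 $ terraform apply
 $ ansible-playbook -i terraform.py playbooks/site.yml)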


Logging with Logstash and collectd

This release includes support for collectd https://collectd.org/ and
Logstash https://www.elastic.co/products/logstash. Collectd is used to
monitor system statistics and Logstash can be used to forward system
logs to a central point of a logging service.

0.3.0 includes collectd plugins for Docker, Mesos, Marathon and Zookeeper.


Simplified Vagrant runs

We've simplified the Vagrant process, getting rid of the need to run
security setup or install python modules. |vagrant up| will bring up 

Re: Reply: [DISCUSS] Renaming Mesos Slave

2015-06-05 Thread CCAAT


+1 master/slave, no change needed; i.e., keep the nomenclature as it
currently is.

This means keep the name 'master' and keep the name 'slave'.


Are you applying fuzzy math or Kalman filters to your summations below?

It looks to me, tallying things up, that 'Master' is kept as it is
and 'Slave' is kept as it is. There did not seem to be any consensus
on the new names if the pair of names is updated. Or can you vote 
separately on each name? On a real ballot, you enter the choices,
vote according to your needs, tally the results and publish them.
Applying a 'fuzzy filter' to what has occurred in this debate so far
is ridiculous.

Why not repost the question like this, or something with a fairer
voting format:


Please vote for your favourite Name-pair in Mesos, for what is currently
Master-Slave. Note Master-Slave is the no change vote option.

[] Master-Slave
[] Mesos-Slave
[] Mesos-Minion
[] Master-Minion
[] Master-Follower
[] Mesos-Follower
[] Master-worker
[] Mesos-worker
[] etc etc

-


Tally the result and go from there.
James




On 06/05/2015 04:27 AM, Adam Bordelon wrote:

Wow, what a response! Allow me to attempt to summarize the sentiment so far.

Let's start with the implicit question,
_0. Should we rename Mesos Slave?_
+1 (Explicit approval) 12, including 7 from JIRA
+0.5 (Implicit approval, suggested alternate name) 18
-0.5 (Some disapproval, wouldn't block it) 5, including 1 from JIRA
-1 (Strong disapproval) 16

_1. What should we call the Mesos Slave node/host/machine?_
Worker: +10, -2
Agent: +6
Follower (+Leader): +4, -1
Minion: +2, -1
Drone (+Director/Queen): +2
Resource-Agent/Provider: +2

_2. What should we call the mesos-slave process (could be the same)?_
Pretty much everybody says that it should be the same as the node.

_3. Do we need to rename Mesos Master too?_
Most say No, except when slave's new name has a preferred pairing (e.g.
Follower/Leader)

_4. How will we phase in the new name and phase out the old name?_
To calm any fears, we would have to go through a full deprecation cycle,
introducing the new name in one release, while maintaining
symlinks/aliases/duplicate-endpoints for the old name. In a subsequent
release, we can remove the old name/endpoints. As we introduce the new
Mesos 1.0 HTTP API, we will already be introducing breaking API changes,
so this would be an ideal time to do a rename.

Whether or not we decide to officially change the name in the code/APIs,
some organizations are already using alternative terminologies in their
presentations/scripts. We could at least try to agree upon a recommended
alternative name for these purposes.

_5. How do we vote on this?_
First, FYI: https://www.apache.org/foundation/voting.html
It seems there are two potentially separate items to vote on:

Prop-A: Rename Mesos-Slave in the code/APIs
Qualifies as a code modification, so a negative (binding) vote
constitutes a veto. Note that there are no -1s from the Mesos PMC yet.
After this week of discussion where the community is invited to share
their thoughts/opinions, we will call for an official VOTE from the PMC
members. The proposal will pass if there are at least three positive
votes and no negative ones.

Prop-B: Recommended Alternative Name for Slave
This can follow the common format of majority rule. We can gather
recommendations during this one week discussion period, and then vote on
the top 2-3 finalists.

On Thu, Jun 4, 2015 at 8:23 PM, Emilien Kenler eken...@wizcorp.jp wrote:

+1 for keeping master/slave.

On Fri, Jun 5, 2015 at 12:00 PM, Panyungao (Wingoal)
panyun...@huawei.com wrote:

+1 master/slave.

These are only terminologies in software architecture. They
have different definitions from those in a social or political
context.

*From:* zhou weitao [mailto:zhouwtl...@gmail.com]
*Sent:* 2015-06-05 10:40
*To:* user@mesos.apache.org
*Subject:* Re: [DISCUSS] Renaming Mesos Slave

+1 master/slave, no change needed.

2015-06-05 0:10 GMT+08:00 Ankur Chauhan an...@malloc64.com:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

+1 master/slave

James made some very good points and there is no technical
reason for
wasting time on this.

On 04/06/2015 08:45, James Vanns wrote:
 +1 master/slave, no change needed.

 I couldn't agree more. This is a barmy request; master/slave is a
 well understood common convention (if it isn't well defined). This
 is making an issue out of something that isn't. Not at least as far
 as I see it - I don't have a habit of confusing software/systems
 nomenclature 

Re: Cluster autoscaling in Spark+Mesos ?

2015-06-05 Thread CCAAT

On 06/05/2015 10:09 AM, Alex Gaudio wrote:

Hi @Ankur,



Next, we built Relay https://github.com/sailthru/relay and, the Mesos
extension, Relay.Mesos https://github.com/sailthru/relay.mesos, to
convert our small scripts into long-running instances we could then put
on Marathon.

Thanks.



From the first reference page: "This type of problem is often quite
complex, and there is a field called Control Theory dedicated to
problems ..."


Oh WOW! Somebody else that understands controls.

I was going to suggest some code to build a hybrid Feedback (PID, based 
on resource utilization) + Feedforward (based on chronologically 
repetitive events) controller to solve this problem, because it's quite 
easy to also integrate relay boards to boot up/shut down resources, if 
systems are properly configured for such up/down cycling of hardware. 
There are also legacy boot/shutdown controls for resources via PXE and 
similar ether-based hardware tricks.
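To sketch what I mean by the feedback half (proportional term only; no
integral or derivative yet), something like this against Marathon's
scale API. The metrics endpoint, app id and sizing constant are all
assumptions for illustration, not anything that exists today:

 #!/bin/sh
 # proportional autoscaler sketch: one instance per ~50 queued items
 MARATHON=http://marathon.example:8080        # hypothetical
 APP=/v2/apps/queue-worker                    # hypothetical app id
 PER_INSTANCE=50
 while sleep 60; do
   depth=$(curl -s http://metrics.example/queue_depth)    # hypothetical
   want=$(( (depth + PER_INSTANCE - 1) / PER_INSTANCE ))  # ceil()
   [ "$want" -lt 1 ] && want=1
   curl -s -X PUT "$MARATHON$APP" -H 'Content-Type: application/json' \
        -d "{\"instances\": $want}"
 done

The feedforward half would pre-scale on known repetitive events (cron
windows, say) instead of waiting on the error signal.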


I think using concepts of process control applied to specific 
computational resources, such as RAM, CPU, etc etc, also needs to be 
addressed with autoscaling. A quick glance at your projects suggests you
are well underway.

Hint: You might want to explain some fundamentals of Control Theory,
as I'm not sure this collective is aware of such simple, beautiful
and robust mathematics.


PS: avoid Nyquist stability, for now, but a Bode plot would be keen!


hth,
James





Re: [DISCUSS] Renaming Mesos Slave

2015-06-02 Thread CCAAT

On 06/01/2015 04:18 PM, Adam Bordelon wrote:

There has been much discussion about finding a less offensive name than
Slave, and many of these thoughts have been captured in
https://issues.apache.org/jira/browse/MESOS-1478


I find political correctness rather nauseating. Folks should stop
trying to apologize for what others did, often hundreds of years ago.
I was not part of that. The advanced education system, the political 
system and the current wealth control systems around the globe are in 
fact and indeed Master-Slave relationships; so why cleanse this notion 
prematurely?



Master-slave has a rich history in the entire field of computational
resources and electronics. That usage has nothing to do with the social 
conventions and failings of the past. So, if you want to do something
effective for those of us less fortunate, why not use terms like







I would like to open up the discussion on this topic for one week, and
if we cannot arrive at a lazy consensus, I will draft a proposal from
the discussion and call for a VOTE.
Here are the questions I would like us to answer:
1. What should we call the Mesos Slave node/host/machine?
2. What should we call the mesos-slave process (could be the same)?
Elite-taxpayer, or Lawyer-citizen, or Billionaire-wager, or 
Professor-debtor?


Something more apropos?


3. Do we need to rename Mesos Master too?


The concept of a master has been around ever since (2) males or more 
graced this planet (your theories of social evolution may vary but they
are irrelevant).

Commander? General? Generalissimo? Master_Chief? Warlord?

Why not get rid of the entire Apache environment while you are at it?
This line of reasoning is nothing but *STUPID*. You cannot erase
the history of the warrior existence of mankind. Nor should anyone today
apologize for historical convention. WE did not commit any bad deeds
before we were born, so just get over it!




Another topic worth discussing is the deprecation process, but we don't
necessarily need to decide on that at the same time as deciding the new
name(s).
4. How will we phase in the new name and phase out the old name?

Please voice your thoughts and opinions below.

Thanks!
-Adam-

P.S. My personal thoughts:
1. Mesos Worker [Node]
2. Mesos Worker or Agent
3. No
4. Carefully



This is the saddest thread I have ever read.

James




Re: [DISCUSS] Renaming Mesos Slave

2015-06-02 Thread CCAAT

On 06/02/2015 11:30 AM, Alexander Gallego wrote:

1. mesos-worker
2. mesos-worker


Currently, my (limited) understanding of the codebase for mesos is that 
the slave does not have any autonomy and is 100% controlled by the Master, 
hence the clear nomenclature of Master-Slave. If we are to migrate to 
'mesos-worker', this implies that the worker has 'some standing', some 
rights? The worker can leave the mesos and move on (attach) to another 
supervisor? Actually I like this concept, since mesos is not likely to 
be the only master in a data center; maybe we need to begin thinking 
about node (migration) to other masters in a heterogeneous data center? 
Ah! Eureka, now I see what is really going on; mesos leadership is 
preparing for other masters to migrate node-worker hardware to other 
cluster codes, in the spirit of heterogeneous, politically_correct 
cluster computations? I see, we would not want to offend any other 
software development team; after all,

opensource is opensource...


Also, what happens to mesos clustering codes if folks decide to 
experiment with self-modifying codes, like the code found in stuxnet?

Are those still worker codes? Are they subservient to the mesos
after they are instantiated? We have the family of self-modifying codes
to contend with in the future; surely they are going to find a path
to these clusters, whether the developers like it or not.

How shall the naming classifications match reality in the future? I'd 
suggest some thought as to where mesos is heading, as changes in 
nomenclature and diversions from established jargon, no matter how 
well-intended, can cause loads of unforeseen problems if not lead to 
obscurity. For example, when somebody has worked in a multi-processor 
hardware development group, you either have master-slave relationships, 
voters, or some nonsensical, if not exotic, nomenclature that does not 
withstand the ravages of competing codes over time. A historical review 
might be in order for parallel efforts; look at how divergent naming 
schemes have not survived. I doubt seriously this is the first 
venture into finding a more accurate and alternative naming scheme. A 
less accurate scheme? Surely many exist, as we've seen a few in this thread.



Change the names as you like. But be mindful that your new 
nomenclature is sensible and exudes forethought about the futuristic 
feature sets to be found in parallel processing, both hardware and software.



hth,
James





Re: Problem using cgroups/mem isolator

2015-05-02 Thread CCAAT

On 05/02/2015 02:17 PM, Tim Chen wrote:

Hi Arunabha,

Which linux distro/version are you using?

A quick search on google finds some settings that might be required to
turn on memsw.limit_in_bytes options for cgroups:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1348688

Tim


Maybe the mesos code(s) needs to perform a kernel test to see what is
enabled and what is not available, regardless of distro? Or does it make
more sense to have functions based on cgroups, whether or not they are
properly activated in the kernel?
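A runtime probe of that sort is cheap from a shell; a rough sketch,
assuming the conventional cgroup mount point:

 $ grep memory /proc/cgroups      # is the memory controller compiled in?
 $ mount | grep cgroup            # what is mounted, and where?
 $ ls /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes
 # the memsw file only exists when swap accounting is enabled
 # (on some distros that needs swapaccount=1 on the kernel command line)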


James





Re: cgroup memory limits are always 'soft'. How do I ensure info->pid.isNone()?

2015-04-28 Thread CCAAT

On 04/28/2015 11:54 AM, Dick Davies wrote:

Thanks Ian.

Digging around the cgroup there are 3 processes in there;

* the mesos-executor
* the shell script marathon starts the app with
* the actual command to run the task ( a perl app in this case)


We've been having discussions about various aspects of memory 
management. It needs to be enhanced both at the mesos-cluster level and 
the framework (scheduler?) level, right above the myriad of processes 
that start|idle|stop.



In fact, if you look at hwloc [1], there is a movement that abstracts
resource classifications, particularly of memory/cache/registers, in 
such a way as to make sense both in a heterogeneous environment and 
within arch-processor families that have different resources mixed into 
the processor chipset. Furthermore, gcc-5.1 has full support for RDMA 
and generic access to GPU-based resources, so that is further reason to 
expand the use of cgroups and allow folks running these clusters to 
directly tune performance via cgroup settings while a cluster is up and 
running.



I really hate to be the 'old fashioned computer scientist' in this group,
but I think that the role and usage of 'cgroups' is going to have to
be expanded greatly as a solution to the dynamic memory management needs 
of both the cluster(s) and the frameworks. This problem is not going 
away, and I see no serious alternative to expanded cgroup use.



[1] http://www.open-mpi.org/projects/hwloc/


hth,
James





The line of code you mention is never run in our case, because it's
wrapped in the conditional
I'm talking about!

All I see is cpu.shares being set and then mem.soft_limit_in_bytes.


On 28 April 2015 at 17:47, Ian Downes idow...@twitter.com wrote:

The line of code you cite is so the hard limit is not decreased on a running
container because we can't (easily) reclaim anonymous memory from running
processes. See the comment above the code.

The info->pid.isNone() is for when the cgroup is being configured (see the
update() call at the end of MemIsolatorProcess::prepare()), i.e., before any
processes are added to the cgroup.

The limit > currentLimit.get() check ensures the limit is only increased.

The memory limit defaults to the maximum for the data type, I guess that's
the ridiculous 8 EB. It should be set to what the initial memory allocation
was for the container so this is not expected. Can you look in the slave
logs for when the container was created for the log line on:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393

Ian

On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies d...@hellooperator.net wrote:


Been banging my head against this for a while now.

mesos 0.21.0, marathon 0.7.5, centos 6 servers.

When I enable cgroups (flags are: --cgroups_limit_swap
--isolation=cgroups/cpu,cgroups/mem) the memory limits I'm setting
are reflected in memory.soft_limit_in_bytes but not in

memory.limit_in_bytes or memory.memsw.limit_in_bytes.


Upshot is our runaway task eats all RAM and swap on the server
until the OOM steps in and starts firing into the crowd.

This line of code seems to never lower a hard limit:


https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L382

which means both of those tests must be true, right?

the current limit is insanely high (8192 PB if I'm reading it right) - how
would
I make info->pid.isNone() be true?

Have tried restarting the slave, scaling the marathon apps to 0 tasks
then back. Bit stumped.
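For anyone else chasing this, the effective limits are easy to eyeball
straight out of the cgroup filesystem; a sketch (the container id below
is a placeholder for whatever the slave created):

 $ cd /sys/fs/cgroup/memory/mesos/<container-id>
 $ cat memory.soft_limit_in_bytes memory.limit_in_bytes \
       memory.memsw.limit_in_bytes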









Re: Changing Mesos Minimum Compiler Version

2015-04-21 Thread CCAAT

Hello one and all,

I'm not voting here, my reasons should be ridiculously clear.

I only want to point out that WE, the mesos community, should be
planning to move to gcc-5.x, asap. Why? Excellent question:

[1] https://gcc.gnu.org/wiki/OpenACC

[2] https://gcc.gnu.org/gcc-5/changes.html#offload
(look at the OpenMP 4.0 specification)


Gcc-5.x will allow the beginnings of testing codes on GPUs
and using RDMA; combined, these sorts of improvements will
get Mesos + spark + storm rocking in the numerical world.
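For instance, a hedged sketch of what the compile line looks like (flags
per the gcc-5 docs; it presumes a gcc built with the nvptx offload
backend, which most distro packages were not, yet):

 $ gcc-5 -fopenacc -foffload=nvptx-none -O2 saxpy.c -o saxpy
 $ gcc-5 -fopenmp  -foffload=nvptx-none -O2 stencil.c -o stencil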

ymmv.
hth,
James




On 04/21/2015 05:07 AM, Alex Rukletsov wrote:

Folks, let's summarize and move on here.

Proposal out on April 9, 2015. Current status (as of April 21, 2015):


+1 (Binding)
--
Vinod Kone
Timothy Chen
Yan Xu
Brenden Matthews

+1 (Non-binding)
--
Cody Maloney
Joris Van Remoortere
Jeff Schroeder
Jörg Schad
Elizabeth Lingg
Alexander Rojas
Alex Rukletsov
Michael Park
Haosdent Huang
Bernd Mathiske

0 (Non-binding)
--
Nikolaos Ballas

There were no -1 votes.

Cody, let's convert MESOS-2604 to an epic and bump the version in 0.23.

Thanks,
Alex


On Mon, Apr 13, 2015 at 12:46 PM, Bernd Mathiske be...@mesosphere.io wrote:

+1

 On Apr 10, 2015, at 6:02 PM, Michael Park mcyp...@gmail.com wrote:

  +1
 
  On 9 April 2015 at 17:33, Alexander Gallego agall...@concord.io wrote:
 
  This is amazing for native devs/frameworks.
 
  Sent from my iPhone
 
  On Apr 9, 2015, at 5:16 PM, Joris Van Remoortere jo...@mesosphere.io
  wrote:
 
  +1
 
  On Thu, Apr 9, 2015 at 2:14 PM, Cody Maloney c...@mesosphere.io
  wrote:
  As discussed in the last community meeting, we'd like to bump the
  minimum required compiler version from GCC 4.4 to GCC 4.8.
 
  The overall goals are to make Mesos development safer, faster, and
  reduce the maintenance burden. Currently a lot of stout has
different
  codepaths for Pre-C++11 and Post-C++11 compilers.
 
  Progress will be tracked in the JIRA: MESOS-2604
 
  The resulting supported compiler versions will be:
  GCC 4.8, GCC 4.9
  Clang 3.5, Clang 3.6
 
  For reference
  Compilers by Distribution Version: http://goo.gl/p1t1ls
 
  C++11 features supported by each compiler:
  https://gcc.gnu.org/projects/cxx0x.html
  http://clang.llvm.org/cxx_status.html
 
 






Re: Spark on Mesos / Executor Memory

2015-04-11 Thread CCAAT

On 04/11/2015 02:50 PM, Timothy Chen wrote:

Hi James,

You are right, multiple frameworks becomes a different discussion: how to
adjust and allow more dynamic resource negotiation to happen, and also
factor in fairness and other concerns.


Agreed, there will be many more parameters to define need, so that 
schedulers and other resource managers can do their job. Maybe a list 
where those parameters are categorized, in a central location, is a good 
start at characterizing that part of the problem. Different 
frameworks may need to pass different parameters around and to the 
resource manager(s).




There is more work happening in mesos to try to address multiple
frameworks, like optimistic offers and inverse offers, but I think in
terms of dynamic memory needs for a framework it's still largely up to
the scheduler to specify and scale accordingly when resources are needed
or not needed anymore.


The distributed applications (frameworks) will have to provide 
frequently updated info to the resource managers. Perhaps some typical 
examples of such details would be good info to start collecting, 
analyzing and organizing? Also, the memory/core ratio of a given cluster
will differ from one cluster to the next.



One way this is being addressed in spark is integrating dynamic allocation
into resource schedulers such as mesos and yarn, but there is still more
work needed, as dynamic allocation only looks at certain metrics that
might not address all kinds of needs.


Yes, both the frameworks and the resource manager will have to pass 
information back and forth, in a low-latency, dynamic fashion, along 
with other communications needs. I'm sure the number of metrics will 
grow as folks dig deeper into these issues.
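For reference, the spark-side knobs Tim mentions look roughly like this
(a sketch; mesos support for dynamic allocation was still maturing at
the time, and the external shuffle service must be running on each
slave; the zk hosts and job are made up):

 $ spark-submit --master mesos://zk://zk1:2181,zk2:2181/mesos \
     --conf spark.dynamicAllocation.enabled=true \
     --conf spark.shuffle.service.enabled=true \
     myjob.py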




If you have any specific use case or examples that you think existing
work doesn't fit and that you'd like to see addressed, that will be a
good way to start the conversation.


I doubt seriously there is a single silver bullet here, or even a few 
silver bullets that can get the job done. My suspicion is that as we run 
an expanding array of distributed applications on mesos, we'll 
eventually gather up enough profile information on these distributed 
applications on top of mesos to be able to sort them into quasi 
categories. Profiling of running codes has always been an edgy 
endeavor, but I see no other reliable way to actually figure out 
anything close to optimization of resources for a cluster. We can then 
figure out what mix of categorized distributed applications best fits 
the dynamic resource model, then tune the RM to that organization's 
priority semantics for resource allocation. Once new frameworks are 
profiled, the Resource Manager can then make better decisions.


I think the real trick is going to be coordination of those low-level 
kernel resources that we can profile, with such tools as kernelshark, 
with system-level resource monitoring, so as to reinforce the cluster 
resource management decisions. I.e., we're going to have to run a wide 
variety of these frameworks, in isolation, and use both kernel and 
system level resource monitoring tools to accurately 
characterize the resource loading curves. Then we can propose dynamic 
methods to manage the resource demands of a given category of frameworks.


Eventually, we could then go down the path of profiling the cluster as 
multiple simultaneous resource loads are mixed, due to a variety of 
frameworks, and profiled yet again. Knowledge gained on smaller-scale 
clusters might not be linearly applicable to larger clusters, but it is 
a start, imho.


I think it's going to be years of collaboration, with codes, patches and 
profiles being shared, to tame the beast. However, that said, I would 
certainly be happy if I'm wrong and look forward to those ideas to 
simplify this problem's solution.




James




Tim


On Apr 11, 2015, at 1:05 PM, CCAAT cc...@tampabay.rr.com wrote:

Hello Tim,

Your approach seems most reasonable, particularly from an overarching 
viewpoint. However, it occurs to me that as folks have several to many 
different frameworks (distributed applications) running on a given mesos 
cluster, the optimization of resource allocation (utilization) may 
ultimately need to be under some sort of tunable, dynamic scheme. Most 
distributed applications, say one that runs for a few hours, will usually 
not have a constant resource demand on memory, so how can any static 
configuration work well for a dynamic mix of frequently changing 
distributed applications? This is particularly amplified as a problem where
Apache-spark's in-memory resource demand is very different
from that of other frameworks that may be active on the same cluster.

I really think we are just experiencing the tip of the iceberg here
as these mesos clusters grow, expand and take on a variety of problems,
or did I miss some already existing robustness in the codes?


James




On 04/11/2015 12:29 PM, Tim Chen

Re: Current State of Service Discovery

2015-04-01 Thread CCAAT

On 04/01/2015 11:20 AM, Christos Kozyrakis wrote:

Service discovery is a topic where it's unlikely that a single solution
will satisfy every need and every constraint. It's also good for the
Mesos community to have multiple successful alternatives, even when they
overlap in some ways.

I will comment a little on Mesos-DNS since I designed it and currently
maintain it. The project is still in early stages and the functionality
is not where it should be. But it's a start and we are improving it
every day.

* Naming/ports info: the release of Mesos 0.22 allows frameworks to
provide rich service discovery info for tasks and executors, including
naming the ports (e.g., 80 is http, 90 is RPC, etc) and providing
interesting labels for environment (prod/test/...), location, version,
etc. Once frameworks start using this feature, we will use it in
Mesos-DNS to provide more intuitive names for tasks and services and
give you more meaningful info on service discovery requests.

* SRV records: you are correct that SRV records are not the most
convenient way to get port information. Very little software exists to
take advantage of these records and the fact that you need two requests
to get both a port and an IP address is annoying. We are adding
an HTTP interface to Mesos-DNS to allow you to get more compact and
useful port info. See the
https://github.com/mesosphere/mesos-dns/tree/http branch. It is not
ready for production use yet, but it will give you an idea.

* Namezone and coordination with other name servers: yes, your
suggestion makes a lot of sense. Looking into such a setup is on our
todo list. If you have time to investigate this and contribute the
changes/setup needed, that would be great.
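(For readers following along, the two-request SRV dance mentioned above
looks like this with plain dig; the task and framework names here are
hypothetical:

 $ dig +short _search._tcp.marathon.mesos SRV   # prio, weight, port, target
 $ dig +short search-3a1b2.marathon.mesos A     # second query for the address)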

Regards



Well, here is an article that is good as a partial summary, but
surely it needs to be updated?

http://jasonwilder.com/blog/2014/02/04/service-discovery-in-the-cloud/


What I'd like is to read a survey article on service/resource 
discovery where some knowledgeable person expounds on building a cluster
up for general-purpose uses (this is common) vs Big-Data:Big-Science.
The different options from a macro point of view would be keen.

Most technical organizations that I interact with on an ongoing basis
pretty much have the same goals that I do: heterogeneous hardware in the 
local cluster and the ability to *seamlessly* rent outside (vendor) 
resources for needs above the capability of the local cluster. 
*Everyone* wants their own local cluster, and the dynamic ability to 
supplement that in-house cluster with cloud services.



That's really the white paper I'm searching for, as are many others. The 
more details the better. If mesos addresses that need, it's gonna
be a very big hit. In fact, it greatly behooves the newer projects to
explain (as precisely as possible) the project goals and what each 
purports to achieve (what is fixed or enhanced) over the existing codes
(projects) in some very clear detail, imho.


James



Mesos-DNS

This project came to my attention this week, and I am looking to get
it installed today to have hands on time with it.  Basically, it's a
binary that queries the mesos-master and develops A records that are
hostnames, based on the framework names, and SRV records based on
the assigned ports.

This is where I get confused. I can see the A records being useful;
however, you would have to have your entire network be able to
use the mesos-dns (including non-mesos systems). Otherwise how
would a client know to connect to a .mesos domain name? Perhaps
there should be a way to integrate mesos-dns as the authoritative
zone for .mesos in your standard enterprise DNS servers. This also
saves the configuration issues of having to add DNS services to all
the nodes. I need to research DNS a bit more, but couldn't you
set up, say in bind, that any requests in .mesos are forwarded to the
mesos-dns service, and then sent through your standard dns back to
the client? Wouldn't this be preferable to setting the .mesos name
service as the first DNS server and then having THAT forward off to your
standard enterprise DNS servers?
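(For the archive: such a forward zone in bind's named.conf would look
roughly like this; the mesos-dns listener address is an assumption:

 zone "mesos" {
     type forward;
     forward only;
     forwarders { 10.10.10.10 port 53; };  # hypothetical mesos-dns host
 };)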

Another issue I see with DNS is that it works well for hostnames, but
what about ports? Yes, I see there are SRV records that will return
the ports, but how would that even be used? Consider the hive
thrift service example above. We could assume hive thrift would run
on port 1 on all nodes in the cluster, and use the port, but
then you run into the same issues as ha proxy. You can't really
specify a port via DNS in a jdbc connection URL, can you? How do you
get applications that want to connect to an integer port to do a DNS
lookup to resolve a port? Or are we back to: you have one cluster,
and you get 65536 ports for all the services you could want on that
cluster? Basically hard coding the ports? This then loses
flexibility from a ...

Re: [VOTE] Release Apache Mesos 0.22.0 (rc4)

2015-03-25 Thread CCAAT

(+1 :: irrelevant?)

It (mesos-0.22.0) compiles on gentoo with:
x86_64-pc-linux-gnu-4.8.3 *

I'll be putting up the ebuild on bugs.gentoo.org, tonight.

hth,
James



On 03/25/2015 10:23 AM, Till Toenshoff wrote:

+1 binding - make check tested on:
   - OSX 10.10.3 + gcc 4.9.2
   - OSX 10.10.3 + clang 3.5
   - Ubuntu 14.04 + gcc 4.4.7



On Mar 18, 2015, at 8:52 PM, Niklas Nielsen nik...@mesosphere.io wrote:

Hi all,

Please vote on releasing the following candidate as Apache Mesos 0.22.0.




Re: [VOTE] Release Apache Mesos 0.22.0 (rc4)

2015-03-23 Thread CCAAT

Excellent start! Nice links I was not aware of (thanks).
So folks can use distcc now to test new rollouts of mesos?
That was/is the quest of this thread: to establish some codes for
testing new rollouts of mesos.

It'll need to be 'extended' for cross compiling too, for my needs.
I'd like to follow up with anyone that gets distcc working with cross 
compiling for different arches. arm64 would be really cool. (LLVM?)


My small cluster needs some work, much of it not related to mesos, but
rather to my efforts to run mesos and spark without HDFS, using Cephfs, 
btrfs and supporting codes. So I'm not sure when I'll get this mesos-distcc 
installed and running in the near future; but I am most interested in 
following the issues others encounter on compiling and cross compiling on 
a mesos cluster.


What would be really cool is to run this distcc-mesos on top of spark
and cephfs for some real fast compile times of large codes.
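For anyone who wants to kick the tires outside of mesos first, stock
distcc usage is roughly this (host names made up; the /8 suffix is the
slot count per host):

 $ export DISTCC_HOSTS='node1/8 node2/8 node3/8'
 $ make -j24 CC='distcc gcc' CXX='distcc g++'

Cross compiling is the same idea, with the cross toolchain (e.g. an
aarch64-*-gcc) installed at the same path on every host.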


Thanks,
James


On 03/23/2015 10:22 PM, Adam Bordelon wrote:

I know it's over a year old and hasn't been updated, but bmahler already
created a distcc framework example for Mesos.
https://github.com/mesos/mesos-distcc

On Mon, Mar 23, 2015 at 7:56 PM, CCAAT cc...@tampabay.rr.com wrote:

On 03/23/2015 09:02 PM, Adam Bordelon wrote:

Integration tests are definitely desired/recommended. Some of us
devs
just do make [dist]check, but others test integrations with their
favourite frameworks, or push it to their internal testing clusters.
We're open to any additional testing you would like to propose
and/or
perform.

Thanks,
-Adam-


Distcc is a distributed compiling program that has been a long staple on
Gentoo linux and many other distros. I work on Gentoo and I think that
setting up distcc to run on a mesos cluster is a fabulous idea. Not only
can you compile native binaries for the inherent arch, but cross
compiling should work too. Everyone has to recompile (optimized) kernels
frequently with the release cycle of the linux kernel team. With the
current roll out of all sorts of 64 bit arm systems, there's going to be
a great opportunity to cross compile arm64 bit codes on CISC (X86_64)
bit clusters.

This also starts the process of heterogeneous mesos clusters, surely
inevitable.

https://code.google.com/p/distcc/

https://code.google.com/p/distcc/downloads/list

With LLVM, gnu and other projects, compiling and cross compiling on
a mesos cluster is sure to be a very, very popular idea. Any CI
endeavor
will necessitate lots of compiling too.

hope this helps,
James








Re: [VOTE] Release Apache Mesos 0.22.0 (rc4)

2015-03-23 Thread CCAAT

On 03/23/2015 09:02 PM, Adam Bordelon wrote:


Integration tests are definitely desired/recommended. Some of us devs
just do make [dist]check, but others test integrations with their
favourite frameworks, or push it to their internal testing clusters.
We're open to any additional testing you would like to propose and/or
perform.

Thanks,
-Adam-


Distcc is a distributed compiling program that has been a long staple on
Gentoo linux and many other distros. I work on Gentoo and I think that
setting up distcc to run on a mesos cluster is a fabulous idea. Not only
can you compile native binaries for the inherent arch, but cross 
compiling should work too. Everyone has to recompile (optimized) kernels

frequently with the release cycle of the linux kernel team. With the
current roll out of all sorts of 64 bit arm systems, there's going to be
a great opportunity to cross compile arm64 bit codes on CISC (X86_64)
bit clusters.

This also starts the process of heterogeneous mesos clusters, surely 
inevitable.


https://code.google.com/p/distcc/

https://code.google.com/p/distcc/downloads/list

With LLVM, gnu and other projects, compiling and cross compiling on a 
mesos cluster is sure to be a very, very popular idea. Any CI endeavor

will necessitate lots of compiling too.

hope this helps,
James





Estimated release date for 0.22 ?

2015-03-06 Thread CCAAT

Hello,

Best (gu)estimates on when Mesos-0.22.x will be released?


James



Re: Error when creating a new HA cluster on EC2

2015-02-22 Thread CCAAT

On 02/22/2015 06:35 AM, i...@roybos.nl wrote:


Last friday I put some Ansible scripts on github for provisioning a multi
AZ cluster on AWS.
You could have a look at it
https://github.com/roybos/aws-mesos-marathon-cluster and maybe it helps you.

It basically creates a VPC within an AWS region and sets up 1 node in 3 AZs.
All nodes are currently equal (mesos master and slave are on the same
machine) which is fine for smaller clusters of, let's say, 3 to 30 nodes.

Roy Bos


Very cool. I'm new to ansible, but I like what I've learned so far.
What would be cool is if something like this ansible example existed for
just 3 (generic linux) systems that are installed from sources. No
binaries, no distro-specific packages, just pure, raw sources. That
way it would provide a baseline for installation on all sorts of linux
(embedded, vm, container, uncommon_distro, different architectures like 
arm64, etc etc) based systems.



Any ansible guides for generic mesos installs from sources would be
of keen interest to many folks.

Once that in-house from-sources methodology is viable, I'm quite 
certain companies will want to augment their local (in-house) cluster 
resources with resources from vendors, particularly in a dynamic mode 
of utilization. Therefore, in-house resources mixed 
(supplemented) with vendor resources is the future of cluster computing. 
ymmv.


Sadly, I see little progress on this open systems approach, and that 
concerns me greatly for the viability of the mesos cluster. Is it indeed 
going to be limited to large corporations trying to sell something 
nobody wants, or are there going to be open source methodologies 
developed that streamline the installation of mesos
inside of a medium-size company with modest resources?

Ansible is the key technology here combined with building up a mesos
cluster FROM 100% SOURCES.
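For reference, the raw from-source sequence that any such automation
would wrap is short (per the upstream getting-started docs; the version
number is just illustrative):

 $ wget http://archive.apache.org/dist/mesos/0.22.0/mesos-0.22.0.tar.gz
 $ tar xzf mesos-0.22.0.tar.gz && cd mesos-0.22.0
 $ mkdir build && cd build
 $ ../configure
 $ make && make check && make install

Everything around that - users, config files, init scripts, zookeeper -
is exactly the distro-specific glue this thread is about.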


Put that in your presentation at mesos con(artists?)


James






Re: Optimal resource allocation

2015-02-05 Thread CCAAT

On 02/04/2015 06:00 PM, Pradeep Kiruvale wrote:


In a data center, if there are thousands of heterogeneous nodes
(x86, arm, gpu, fpgas), can mesos really allocate co-located
resources for any incoming application to finish the task faster?



Thanks & Regards,
Pradeep


Hello Pradeep,

We have similar interests, only I'm working on a small scale to prototype
(< 50 systems). I think we need a special work group to address
GPU, aarch64 (arm 64 bit) and FPGA codes that could work with mesos.
I think custom frameworks that are problem-domain specific are the 
current best approach for code development on specialized processors.


Drop me some private email or post to a new thread on the specific hardware
you have in mind and what you think is the path forward. I have an 
extensive background in embedded linux on a myriad of processors. I do
think an sVinit-centric solution is the best path forward. Look at a 
previous thread: cluster wide init. My mesos on gentoo seems to be 
working fine, using openrc, but I've not deeply tested the
features and other codes on top of mesos listed here: [1].

Therefore, I'd like to see mesos robustly support both systemd and 
sVinit (openrc is my choice, as I use gentoo linux) because the vast 
majority of embedded linux systems use sVinit or even more primitive

mechanisms to achieve a minimal footprint of the executable binaries.


hth,
James

[1] https://github.com/dharmeshkakadia/awesome-mesos





Re: cluster wide init

2015-01-22 Thread CCAAT

On 01/21/2015 11:10 PM, Shuai Lin wrote:

OK, I'll take a look at the debian package.

thanks,
James





You can always write the init wrapper scripts for marathon. There is an
official debian package, which you can find in mesos's apt repo.

On Thu, Jan 22, 2015 at 4:20 AM, CCAAT cc...@tampabay.rr.com wrote:

Hello all,

I was reading about Marathon: "Marathon scheduler processes were
started outside of Mesos using init, upstart, or a similar tool" [1]

This means

So my related questions are

Does Marathon work with mesos + Openrc as the init system?

Are there any other frameworks that work with Mesos + Openrc?


James



[1] http://mesosphere.github.io/marathon/






cluster wide init

2015-01-21 Thread CCAAT

Hello all,

I was reading about Marathon: "Marathon scheduler processes were started 
outside of Mesos using init, upstart, or a similar tool" [1]

So my related questions are

Does Marathon work with mesos + Openrc as the init system?

Are there any other frameworks that work with Mesos + Openrc?


James



[1] http://mesosphere.github.io/marathon/


Re: mesos and coreos?

2015-01-18 Thread CCAAT

On 01/18/2015 04:25 PM, Ranjib Dey wrote:

you are right, the OS is the same, which is the Linux kernel. But the
Ubuntu/CoreOS/Redhat etc distinctions are in userspace (i.e. tools other
than the kernel), and hence you can have coreos running ubuntu/redhat
containers.




CoreOS is a gentoo knock_off [1,2,3].

I have not explored CoreOS yet, but I have not found any reason why any
of the common linux distros cannot run on top of CoreOS, including 
gentoo, or even a mixture of different linux distros.




You can't have container-specific kernels, drivers, time
subsystems etc. But you can certainly have different distros (redhat,
ubuntu etc are different distros, not OSes).


So if a minimal kernel is used with coreOS, the distro inside of a 
particular container cannot have different loadable modules inside
of different containers?



CoreOS eases management of containers: an immutable, minimal rootfs,
backed by tools (etcd, systemd, fleet, flannel) etc. that facilitate
building large-scale systems. For example, etcd is almost a replacement
for zookeeper (you can use it for leader election, distributed locks
etc). Fleet is a distributed init system.



I thought CoreOS used systemd? [4]

I really wish CoreOS were using openrc, and systemd could be used inside
of selective containers with different linux distros.



CoreOS does not provide a scheduler, which mesos does. Also coreos is not
a resource scheduling system, which mesos is. You have to containerize
things to run on coreos (currently it's docker, I think it will be rocket
in the future). That's not a mandate for mesos.



Neither CoreOS nor Mesos gives you `distributed systems`; you can
distribute your workload using mesos or coreos (mesos will autoschedule
things for you). Generally the word `distributed systems` used to
describe things like zookeeper, etcd, cassandra, riak, serf etc, where
the members are aware of each other, without any external components.
Most of them also uses  sound theoretical foundations like paxos, raft
etc for attaining different types of consistency, partition tolerance etc.

Mesos and CoreOS address orthogonal issues, and they can definitely
complement each other. CoreOS eases updating the kernel and managing app
deployments due to host OS and app separation. While mesos eases scaling
and usage issues by autoscheduling. Mesos can use coreos for its
containment layer (docker/rocket), as well as use etcd (from coreos) to
do the leader election bit instead of zookeeper (which is a pain to run
over WAN, a pain to dynamically resize etc). But there is major work
involved.


It will be interesting to see how all of these and other possibilities
mature. What about mesos+spark on top of a coreOS infrastructure? Anyone
have any experience with Apache_spark running on coreOS?



regards
ranjib


James



[1] 
https://coreos.com/docs/sdk-distributors/sdk/building-development-images/#updating-portage-stable-ebuilds-from-gentoo


[2] https://github.com/coreos/coreos-overlay

[3] https://github.com/coreos/portage-stable

[4] https://coreos.com/using-coreos/systemd/


Re: Do i really need HDFS?

2014-10-22 Thread CCAAT

Ok so,

I'd be curious to know your final architecture (D. Davies)?

I was looking to put Ceph on top of the (3) btrfs nodes in case we need 
a DFS at some later point. We're not really sure what softwares will be
in our final mix. Certainly installing Ceph does not hurt anything (?),
and I'm not sure we want to use ceph from userspace only. We have had
excellent success using btrfs, so that is firm for us, short of some
gaping problem emerging. Growing the cluster size will happen, once
we establish the basic functionality of the cluster.

Right now, there is a focus on subsurface fluid simulations for carbon 
sequestration, but using the cluster for general (cron-chronos) 
batch jobs is a secondary appeal to us. So, I guess my question is, 
knowing that we want to avoid the hdfs/hadoop setup entirely, will 
localFS/DFS with btrfs/ceph be sufficiently robust to test not only 
mesos+spark but many other related softwares, such as but not limited to 
R, scala, sparkR, database (sql) and many other softwares? We're just 
trying to avoid some common mistakes as we move forward with mesos.
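For concreteness, the setup I have in mind is roughly this (monitor
host, secret file and paths are assumptions), with the cephfs mount
present on every node so spark can use plain file:// URLs:

 $ mount -t ceph mon1.example:6789:/ /mnt/cephfs \
     -o name=admin,secretfile=/etc/ceph/admin.secret
 # then, from spark: sc.textFile("file:///mnt/cephfs/data/input.csv")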


James



On 10/22/14 02:29, Dick Davies wrote:

Be interested to know what that is, if you don't mind sharing.

We're thinking of deploying a Ceph cluster for another project anyway,
it seems to remove some of the chokepoints/points of failure HDFS suffers from
but I've no idea how well it can interoperate with the usual HDFS clients
(Spark in my particular case but I'm trying to keep this general).

On 21 October 2014 13:16, David Greenberg dsg123456...@gmail.com wrote:

We use spark without HDFS--in our case, we just use ansible to copy the
spark executors onto all hosts at the same path. We also load and store our
spark data from non-HDFS sources.

On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies d...@hellooperator.net wrote:


I think Spark needs a way to send jobs to/from the workers - the Spark
distro itself
will pull down the executor ok, but in my (very basic) tests I got
stuck without HDFS.

So basically it depends on the framework. I think in Sparks case they
assume most
users are migrating from an existing Hadoop deployment, so HDFS is
sort of assumed.


On 20 October 2014 23:18, CCAAT cc...@tampabay.rr.com wrote:

On 10/20/14 11:46, Steven Schlansker wrote:



We are running Mesos entirely without HDFS with no problems.  We use
Docker to distribute our
application to slave nodes, and keep no state on individual nodes.




Background: I'm building up a 3 node cluster to run mesos and spark. No
legacy Hadoop needed or wanted. I am using btrfs for the local file
system,
with (2) drives set up for raid1 on each system.

So you are suggesting that I can install mesos + spark + docker
and not a DFS on these (3) machines?


Will I need any other softwares? My application is a geophysical
fluid simulator, so scala, R, and all sorts of advanced math will
be required on the cluster for the Finite Element Methods.


James











Re: CGroup Per-Task Isolation

2014-10-13 Thread CCAAT

On 10/13/14 00:36, Vinod Kone wrote:

No. It wasn't.



I'm no systemd expert, but I do not think you can implement this
if your linux distro is running systemd? If it can be done, I'd sure like
some information on just how the scheme works. A white paper or
well-defined pseudo code?




On Sun, Oct 12, 2014 at 10:07 PM, Sammy Steele
sammy_ste...@stanford.edu wrote:



I found this post from last year discussing implementing per task
cgroup isolation: https://issues.apache.org/jira/browse/MESOS-539.
Has this ever been implemented?


Yes, this would be a well-defined, traditional approach to managing 
cgroups per task. However most distros have moved to systemd. Gentoo 
supports both systemd and openrc. Openrc works very well with 
traditionally defined cgroups. Let me know if you find some code 
developed along these lines; I'll try to integrate it into my openrc 
test cluster (2-3 machines).




James





Re: Multiple Network interfaces

2014-10-08 Thread CCAAT

Hmmm,

A possible solution? Attach a computer with multiple ethernet cards.
One is used to interface to the slave via the single port. On the
attached computer (basically a secure router) you run Network Address 
Translation (NAT) [1] and other codes to make the multiple interfaces 
available on different IPs and ports.



In fact, I'm not sure that you could not do this directly on the slave 
itself. I have not examined the mesos code, but this is pretty 
standard when multiple IP interfaces are desired.


In fact there are many ways to set up multiple IP addresses on the same
(physical) ethernet card: IP aliasing [2]. Surely there are more?
[3].


A bit more detail on exactly what you are trying to accomplish might
help in finding the correct solution, or in writing a feature request in 
some detail for the developers to consider?
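A sketch of option [2], since it is the quickest to try (device and
addresses are assumptions):

 $ ip addr add 192.168.1.10/24 dev eth0 label eth0:0
 $ ip addr add 192.168.1.11/24 dev eth0 label eth0:1
 $ ip addr show eth0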


hth,
James


[1]http://en.wikipedia.org/wiki/Network_address_translation

[2] 
http://www.tecmint.com/create-multiple-ip-addresses-to-one-single-network-interface/


[3] https://docs.docker.com/articles/networking/



On 10/08/14 10:52, Jay Buffington wrote:

My reading of the code is that this is not supported.

I have this same problem.  I'm trying to work around an issue with a
stubborn application that requires that all instances in a cluster run
on the same port.  Therefore, I have 5+ interfaces per slave and I
want task A and task B to bind to the same port on different
interfaces on the same slave.

Since the ports resource is simply a Range[1] there is nowhere to
stick the IP they belong to.  I think to get what you want mesos would
need to introduce a new first class IP resource in the C++ Resources
abstraction[2] which has an IP (Scalar) and a list of ports (Range).

I opened https://issues.apache.org/jira/browse/MESOS-1874 to track the
work that needs to be done to support this.

Jay

[1] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L335
[2] https://github.com/apache/mesos/blob/master/src/common/resources.cpp

On Tue, Oct 7, 2014 at 5:09 PM, Diptanu Choudhury dipta...@gmail.com wrote:

Hi,

I am wondering if Mesos can offer resources from multiple network
interfaces? We would like to attach multiple Network Interfaces on EC2
instances and would like to bind specific applications that we run on mesos
on specific interfaces?

So basically I am wondering if Mesos can offer ports from different network
interfaces on the same slave??

--
Thanks,
Diptanu Choudhury
Web - www.linkedin.com/in/diptanu
Twitter - @diptanu






Re: Problems with OOM

2014-10-07 Thread CCAAT

On 10/07/14 06:50, Stephan Erb wrote:

Seems like there is a workaround: I can emulate my desired configuration
to prevent swap usage, by disabling swap on the host and starting the
slave without --cgroups_limit_swap. Then everything works as expected,
i.e., a misbehaving task is killed immediately.

However, I still don't know why 'cgroups_limit_swap' is not working as
advertised.

Best Regards,
Stephan



Stephan,

I do not think that anyone has mastered systemd and is fully happy with 
all of the low-level capabilities it promises to master. It is a work in 
progress. It is *huge* and much is not documented. Now we're running 
clustering software (mesos) on these new systems? It's a needle in the 
haystack when memory issues are deeply rooted. How do you know they are 
deeply rooted? Easy: when you cannot find a simple solution. I use 
Gentoo for this work, because my intention is to build up both openrc 
and systemd mesos clusters, to ferret out deep memory issues. I sure 
hope others (developers?) have methodologies planned for deep memory 
issue data-collection, analysis, testing and resolution. I think many 
of the dev-folks are holding those cards close to their chest. I'm a 
bit more open, older, and doubting that systemd is so wonderful in its 
current offering. I salute those brave souls that have swallowed the 
systemd theory and wish them all the best and great success.


Me, I'm old and crusted and depend on the old traditional ways whilst
I wait for systemd to mature. Either way, you are going to need tools
such as ftrace/trace-cmd/kernelshark and some very tuned kernels to
push the capabilities of mesos, imho. So until I get my clusters built
and accepting batch jobs, I cannot really help you out.

Systemtap, dtrace, valgrind, etc etc are tools that may help. I'm still 
trying to get kernelshark working on gentoo linux. I wish I could be of 
more help to you. I think it would be an excellent idea if folks would 
include their platform (OS, kernel, mesos-version, spark-version etc 
etc) in their postings. For me, I'm working on too many things in 
parallel in order to get these mesos-spark clusters ready to bang on a 
bit. I'm not much for just downloading and running a bunch of binaries 
and tweaking a few config files. In my decades of experience with 
embedded systems, high_strung mathematics and distributed processing, a 
bunch of binaries will simply not work when you run into deep problems 
like (OOM). It's going to take building up from 100% sourcecodes and 
diagnosing these problems all along the way. OOM for an in-memory 
distributed system is just one of the deep, kernel-related problems we 
are going to face, imho. You may/will exhaust user-space remedies when 
the real issues are deeply related to systemd and the low-level kernel 
resource allocation decisions that have been abstracted away into 
systemd. Anything as complex as systemd is going to take years to become
stable and decades to master and then document, imho.


Certainly, I hope I'm very, very wrong. When somebody builds a mesos 
cluster and runs a (10K)^3 cell array with PDE/FEM codes on a mesos 
cluster, please let me know, so I can download your binaries? When your 
mesos-cluster is running batch jobs of most any commonly found linux 
applications, please drop the list some fan-mail.



WE need deep_tools, and this community should share what tools they have 
as these problems are worked through, imho.



hth,
James


Re: Problems with OOM

2014-10-07 Thread CCAAT

On 10/07/14 06:50, Stephan Erb wrote:

Seems like there is a workaround: I can emulate my desired configuration
to prevent swap usage, by disabling swap on the host and starting the
slave without --cgroups_limit_swap. Then everything works as expected,
i.e., a misbehaving task is killed immediately.

However, I still don't know why 'cgroups_limit_swap' is not working as
advertised.



Kernelshark is your friend


IMHO, you should google and read a bit on OOM, OOM-killer and systemd 
and cgroups.


Oops, I forgot a couple of links to get you started on systemd research, 
(OOM), cgroups etc etc issues:


https://www.kernel.org/doc/Documentation/cgroups/memory.txt

https://wiki.archlinux.org/index.php/cgroups (historical ref).


hth,
James




Archive of this list

2014-09-29 Thread CCAAT

Hello,

Is there an archive for this list?

Tia,
James



Re: Problems with OOM

2014-09-27 Thread CCAAT

Hello one and all,

From my research, the most significant point to using mesos
is to use containers in lieu of a VM configuration [1].
I'd be curious as to informative points that illuminate this
issue. I guess the main point is that for mesos to be all it can be,
we're talking about containers on bare metal?



Also, kernelshark is available in debian and most major linux OS 
distros. It can be useful to track down all sorts of problems; ymmv.


curiously
James

[1] http://openstacksv.com/2014/09/02/make-no-small-plans/



On 09/26/14 10:45, Stephan Erb wrote:

@Tomas: I am currently only running a single slave in a VM. It uses the
isolator and the logs are clean.
@Tom: Thanks for the interesting hint! I will look into it.

Best Regards,
Stephan

On Fr 26 Sep 2014 16:53:22 CEST, Tom Arnfeld wrote:

I'm not sure if this at all related to the issue you're seeing, but we
ran into this fun issue (or at least this seems to be the cause)
helpfully documented on this blog article:
http://blog.nitrous.io/2014/03/10/stability-and-a-linux-oom-killer-bug.html.


TLDR: OOM killer getting into an infinite loop, causing the CPU to
spin out of control on our VMs.

More details in this commit message to the OOM killer earlier this
year;
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0c740d0afc3bff0a097ad03a1c8df92757516f5c


Hope this helps somewhat...

On 26 September 2014 14:15, Tomas Barton barton.to...@gmail.com wrote:

Just to make sure, all slaves are running with:

--isolation='cgroups/cpu,cgroups/mem'

Is there something suspicious in mesos slave logs?

On 26 September 2014 13:20, Stephan Erb stephan@blue-yonder.com wrote:

Hi everyone,

I am having issues with the cgroups isolation of Mesos. It
seems like tasks are prevented from allocating more memory
than their limit. However, they are never killed.

  * My scheduled task allocates memory in a tight loop.
According to 'ps', once its memory requirements are
exceeded it is not killed, but ends up in the state D
(uninterruptible sleep (usually IO)).
  * The task is still considered running by Mesos.
  * There is no indication of an OOM in dmesg.
  * There is neither an OOM notice nor any other output
related to the task in the slave log.
  * According to htop, the system load is increased with a
significant portion of CPU time spend within the kernel.
Commonly the load is so high that all zookeeper
connections time out.

I am running Aurora and Mesos 0.20.1 using the cgroups
isolation on Debian 7 (kernel 3.2.60-1+deb7u3).

Sorry for the somewhat unspecific error description. Still,
anyone an idea what might be wrong here?

Thanks and Best Regards,
Stephan










Re: Problems with OOM

2014-09-27 Thread CCAAT

On 09/26/14 06:20, Stephan Erb wrote:

Hi everyone,

I am having issues with the cgroups isolation of Mesos. It seems like
tasks are prevented from allocating more memory than their limit.
However, they are never killed.



I am running Aurora and Mesos 0.20.1 using the cgroups isolation on
Debian 7 (kernel 3.2.60-1+deb7u3).



Maybe a newer kernel might help? I've poked around for some suggestions 
on the kernel-configuration file for servers running mesos, but nobody 
is talking about how they tweak their kernel settings yet.


Here's a good article on default shared memory limits:
[1]http://lwn.net/Articles/595638/


Also, I'm not sure if OOM-Killer works on kernel space problems
where memory is grabbed up continuously by the kernel. That may
not even be your problem. I know OOM-killer works on userspace
memory problems.

Kernelshark is your friend

hth,
James




Re: Build on Amazon Linux

2014-09-25 Thread CCAAT

On 09/25/14 10:33, John Mickey wrote:


The default is posix/cpu,posix/mem



Any ideas why it is still trying to use cgroups?


Perhaps this short posting may help a bit?
http://blog.jorgenschaefer.de/2014/07/why-systemd.html

Short answer: systemd is controlling cgroups now, and it is
a huge monolith of software vs the traditional init systems.

I run a Gentoo system, with OpenRC in lieu of systemd. Almost all other
distros have moved, or are moving, to systemd. There is a lot published
on systemd.
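
An easy way to see who owns the cgroup tree on any given box:

# is PID 1 systemd or a traditional init?
 $ ps -p 1 -o comm=
# systemd mounts one hierarchy per controller plus its own 'systemd' hierarchy
 $ ls /sys/fs/cgroup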



hth,
James




Re: Build on Amazon Linux

2014-09-24 Thread CCAAT

On 09/24/14 13:23, John Mickey wrote:

Thank you for the responses.

I replaced OpenJDK with Oracle JDK and was able to build successfully.
During make check, I received the following error:

F0924 18:12:05.325278 13960 isolator_tests.cpp:136]
CHECK_SOME(isolator): Failed to create isolator: Failed to mount
cgroups hierarchy at '/sys/fs/cgroup/cpu': Failed to create directory
'/sys/fs/cgroup/cpu': No such file or directory



Sounds like a systemd problem. If you have access to
Ftrace/trace-cmd/KernelShark, it may help you find the
problem, but that is really a WAG (Wild Arse Guess).

James
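
For anyone hitting the same mount failure, a minimal sketch of creating
the hierarchy by hand (assumes root and a kernel with cgroups enabled;
on systemd machines the tree is normally managed for you):

# check whether any cgroup hierarchy is mounted at all
 $ mount -t cgroup
# if /sys/fs/cgroup is missing or empty, mount a tmpfs and add the cpu controller
 $ mount -t tmpfs cgroup_root /sys/fs/cgroup
 $ mkdir /sys/fs/cgroup/cpu
 $ mount -t cgroup -o cpu cgroup /sys/fs/cgroup/cpu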




Re: spark and mesos issue

2014-09-15 Thread CCAAT

Hello Brenden/Vinod,

Is your installation using systemd?

Has anyone documented systemd configurations/issues for the various 
Linux distros running Mesos/Spark?


What if a cluster is running on a mixture of systems that do and do not
use systemd; are there any issues related to systemd and Mesos/Spark?

Has anyone tried to use Ftrace/trace-cmd/KernelShark to trace down
problems or to optimize the Linux kernel on machines dedicated to
Mesos/Spark?

Are there kernel .config files published anywhere for tuning key kernel 
resources for Mesos/Spark?



curiously,
James




On 09/15/14 16:13, Brenden Matthews wrote:

I started hitting a similar problem, and it seems to be related to
memory overhead and tasks getting OOM killed.  I filed a ticket here:

https://issues.apache.org/jira/browse/SPARK-3535

On Wed, Jul 16, 2014 at 5:27 AM, Ray Rodriguez rayrod2...@gmail.com
mailto:rayrod2...@gmail.com wrote:

I'll set some time aside today to gather and post some logs and
details about this issue from our end.


On Wed, Jul 16, 2014 at 2:05 AM, Vinod Kone vinodk...@gmail.com
mailto:vinodk...@gmail.com wrote:




On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone vi...@twitter.com
mailto:vi...@twitter.com wrote:


On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh
gurvinder.si...@uninett.no
mailto:gurvinder.si...@uninett.no wrote:

ERROR storage.BlockManagerMasterActor: Got two different
block manager
registrations on 201407031041-1227224054-5050-24004-0

Googling about it, it seems that Mesos is starting slaves at
the same time and giving them the same id. So maybe a bug
in Mesos?


Has this issue been resolved? We need more information to
triage this. Maybe some logs that show the lifecycle of the
duplicate instances?


@vinodkone








Debug/Testing mesos-0.20.0

2014-09-09 Thread CCAAT

Hello,

I've just created an ebuild for mesos-0.20.0 for Gentoo. Gentoo's ebuilds
handle the build-time and runtime settings for software packages on Gentoo.
I need to test the new Mesos builds to ensure all of the compile-time
dependencies are correct and that each runtime dependency option works
correctly.

Looking here [1], I see:
# Start mesos slave.
 $ ./bin/mesos-slave.sh --master=127.0.0.1:5050
and
# Run C++ framework (***Exits after successfully running some tasks.***).
 $ ./src/test-framework --master=127.0.0.1:5050

The test-framework binary is not installed (I don't know why).



But I do have these in /usr/bin:

mesos  mesos-execute  mesos-log  mesos-resolve  mesos-tail
mesos-cat  mesos-local  mesos-ps  mesos-scp

In /usr/sbin:

mesos-daemon.sh  mesos-start-cluster.sh  mesos-stop-cluster.sh
mesos-master mesos-start-masters.sh  mesos-stop-masters.sh
mesos-slave  mesos-start-slaves.sh   mesos-stop-slaves.sh

and
/usr/include/stout/tests/utils.hpp

Also I notice some *.html files generated (for example):
snip
/usr/share/mesos/webui/master/static/index.html
/usr/share/mesos/webui/master/static/frameworks.html

If these html files are used for starting, stopping, and monitoring
Mesos, could someone point me to the docs for this, or provide guidance?


Any suggestions on how to best debug/fix the mesos-0.20.0 install and 
test it are most welcome. Is there another package to install?
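
In the meantime, the installed mesos-execute can serve as a rough
substitute for the missing test-framework. A sketch, assuming a master
is already running locally (double-check the flag names with
mesos-execute --help):

# run a one-off shell command as a Mesos task
 $ mesos-execute --master=127.0.0.1:5050 --name=smoke-test --command='echo hello from mesos'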




[1] http://mesos.apache.org/gettingstarted/

James


Re: Mesos on Gentoo

2014-09-08 Thread CCAAT

On 09/07/14 23:39, Vinod Kone wrote:

Hi James,

Great to see a Gentoo package for Mesos!



Regarding HDFS requirement, any shared storage (even just a http/ftp
server works) that the Mesos slaves can pull the executor from is enough.



Hello Vinod,

I'm looking for more specific advice, on not only what to choose for a 
distributed file system, but also some overarching guidance on why/how/where
to look to figure out the Gentoo-ish path to success. A big part of the
problem with the various distributed choices is that you either download
binaries or things are written too generally to be of use. If it does not
work, I'll move on to option B, C, D..



Since I want a lightning-fast computation machine, where lots of cells
perform the exact same complex calculations over and over again, I'm
guessing I need a high-performance, open source file system.


Specific suggestions? Syntax (even from another distro), pseudo-syntax,
or a description of the steps (and caveats) would be most encouraging.

"Mesos slaves can pull the executor from" sounds very enticing, but I
have no clue as to the choices or how to pursue any of them. My
background is EE/CS/math, so I have tendencies towards assembler and C.
I find the whole OO paradigm very interesting, so some overarching
guidance would be keen.


James
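
A concrete low-tech example of Vinod's suggestion, sketched with
illustrative names (any HTTP server the slaves can reach will do):

# serve the framework's executor tarball from the directory that holds it
 $ python -m SimpleHTTPServer 8000
# then hand frameworks a plain http:// URI for the executor; for Spark,
# for example, this would go in spark-defaults.conf (host name illustrative):
# spark.executor.uri http://fileserver.example.com:8000/spark-1.0.2-bin-hadoop2.tgz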


Re: Mesos on Gentoo

2014-09-08 Thread CCAAT

On 09/08/14 02:55, Tomas Barton wrote:


Spark has support for HDFS; however, you don't have to use it, and there's
no need to install the whole Hadoop stack. I've tested Mesos and Spark with
the FhGFS distributed filesystem and it works just fine.


Yes, from what I have read, since this is a new effort, skip Hadoop and 
HDFS altogether. I agree! Spark (RDD) is what we're after.


Initially my dev platform is (3) AMD FX-8350 boxes, each with 32 GB of RAM
for the (8) cores. Little else (only essential/test code) will run on the
(3) box dev cluster. They have water coolers, so the frequency can go up to
6 or 7 GHz later on, just to make things interesting for CPU-intensive
testing.


FhGFS(3)/ is my question [1]: which local filesystem should sit underneath?

Why not FhGFS/BTRFS? Many of the technically astute folks using Gentoo
have left XFS for BTRFS.

So pick one for me? Argue against (C) if you can, without using stability 
as an argument. The goal is ultimate performance; stability will come with 
the passage of time, imho.



(A) fhgfs(3)/ext4
(B) fhgfs(3)/xfs
(C) fhgfs(3)/btrfs
(D) fhgfs(3)/

I'm not going to mess around with RAID tuning at this time. Besides,
we hope to run 100% in RAM (RDD), using HDD writes only for long-term 
storage and analytics, which can be delayed without consequence.
There is still some debate as to whether RAID will even be necessary on 
btrfs, but that debate can wait.



[1] http://moo.nac.uci.edu/~hjm/fhgfs_vs_gluster.html


Tomas


James


Mesos on Gentoo

2014-09-07 Thread CCAAT

Hello Mesos,

I have hacked together an ebuild (Gentoo package) to install
mesos-0.20.0. It seems to be working, but I need some generic guidelines
to fully test the mesos package.
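
For the curious, here is a hypothetical skeleton of what such an ebuild
can look like; every dependency and flag below is illustrative, not my
actual file:

# mesos-0.20.0.ebuild (hypothetical skeleton)
EAPI=5

DESCRIPTION="Apache Mesos cluster resource manager"
HOMEPAGE="http://mesos.apache.org/"
SRC_URI="http://archive.apache.org/dist/mesos/${PV}/${P}.tar.gz"

LICENSE="Apache-2.0"
SLOT="0"
KEYWORDS="~amd64"

# illustrative dependency list; the real one is longer
DEPEND="virtual/jdk
	sys-libs/zlib
	dev-libs/cyrus-sasl"
RDEPEND="${DEPEND}"

src_configure() {
	econf --enable-optimize
}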

I also intend to install it on a small cluster of Gentoo machines. Do I 
need a distributed file system, such as HDFS, for a distributed version 
of Mesos (in other words, a Mesos cluster)? If so, what are my practical
choices: Ceph, Btrfs, Gluster, or is HDFS required? If possible, I'm
trying to skip over the entire Hadoop environment, as I have no legacy
interests and it seems inefficient for what's at the heart of my needs
(computations and visual simulations using RDD).


On Gentoo, use of binaries is only temporary (transient) until the 
sources can be properly assimilated for compile and installation 
control via an ebuild [1]. So I have built the OpenJDK stack on Gentoo, 
known as IcedTea. IcedTea actually uses open source software tools to 
build up the Java JDK from sources [5].



Eventually, I intend to install Spark, Cassandra, a distributed database 
(HyperTable?), and some analytics tools. I hope to use Spark on Mesos to 
solve some very large Finite Element problems [2,3,4].


So, any other component/supporting codes for this sort of big-science 
adventure (Mesos + Spark) are of the utmost interest to me.


I'm an old unix/bsd/linux hack, but some of these newer codes take me a
while to find and figure out the compile-time and run-time dependencies.
When I get all of these (ebuild) modules tested, I'll post the ebuilds
(as overlays) if anyone is interested.


Your guidance and suggestions are most welcome!


James



[1] http://en.wikipedia.org/wiki/Ebuild

[2] http://en.wikipedia.org/wiki/Finite_element_method

[3] http://www.dune-project.org/

[4] http://www.mcs.anl.gov/petsc/

[5] http://icedtea.classpath.org/wiki/Main_Page