Re: marathon (or java) container constantly OOM

2020-08-26 Thread Tomas Barton
Hi,

it is a known issue with Marathon:

https://jira.d2iq.com/browse/MARATHON-8180

AFAIK it hasn't been fixed yet. You can tune GC or increase memory limits,
but the memory usage will grow indefinitely with a higher number of tasks.
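
As an illustration only (flag values are assumptions, not a tested configuration), the growth can at least be bounded by capping the JVM heap and metaspace explicitly, so the container stays under its cgroup limit longer:

```shell
# Hypothetical JAVA_OPTS for Marathon: cap heap and metaspace so usage
# stays below the cgroup limit (all values are illustrative, not tuned).
JAVA_OPTS="-Xms256m -Xmx512m -XX:MaxMetaspaceSize=128m -XX:+UseSerialGC"
export JAVA_OPTS
echo "marathon would start with: $JAVA_OPTS"
```

This only delays the OOM described in MARATHON-8180; it does not fix the underlying leak.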

Regards,
Tomas

On Wed, 26 Aug 2020 at 11:11, Marc Roos  wrote:

>
> Recently I enabled the cpu and memory isolators on my test cluster, and
> since then I have been seeing the marathon containers (when becoming
> leader) increase memory usage from ~400MB until they OOM at 850MB
> (checking via systemd-cgtop).
>
> Now I am testing with these settings from this page[1]
>
> JAVA_OPTS "-Xshare:off -XX:+UseSerialGC -XX:+TieredCompilation
> -XX:TieredStopAtLevel=1 -Xint -XX:+UnlockExperimentalVMOptions
> -XX:+UseJVMCICompiler"
> LD_PRELOAD "/usr/lib64/libjemalloc.so.1"
>
> Is someone able to share an efficient config? Or is it not possible to
> get marathon running below 1GB? At the moment I have only ~10 tasks.
>
> [1]
>
> https://stackoverflow.com/questions/53451103/java-using-much-more-memory-than-heap-size-or-size-correctly-docker-memory-limi
>
>
>


Re: Zookeeper or Marathon issue?

2017-11-22 Thread Tomas Barton
Hi Alex,

looks like you've restarted Marathon during an election. Try backing up the
ZooKeeper data, then use Exhibitor or the ZooKeeper CLI to remove this flag
from the Marathon namespace:

/state/migration-in-progress

According to https://github.com/mesosphere/marathon/pull/5662 the flag
should be removed upon unsuccessful migration. Which version of Marathon do
you run?
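
For illustration (the ZooKeeper address and Marathon's default /marathon chroot are assumptions; adjust both to your setup, and back up the data first), the removal could look like:

```shell
# Sketch: remove the stale migration flag. Server address and the
# /marathon namespace are assumptions; back up ZooKeeper data first.
ZK="localhost:2181"
FLAG="/marathon/state/migration-in-progress"
# The actual deletion would be done with the ZooKeeper CLI, e.g.:
#   zkCli.sh -server "$ZK" delete "$FLAG"
echo "delete $FLAG on $ZK"
```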

Tomas

On 22 November 2017 at 08:21, Alex Evonosky  wrote:

> Hello group-
>
> Long time Mesos user and first time posting about an issue.  I have been
> running mesos 1.4.0 for a while without any issues.  The other day, Ubuntu
> upgraded mesos to 1.4.1, which seemed to go OK. However, when I reloaded
> one master node to verify it could come back up after the upgrade, mesos
> and zookeeper appeared fine but marathon did not recover.  Starting
> marathon shows many errors now (attached file).
>
> Could someone let me know what the issue could be from just rebooting a
> server?
>
>
> Thank you!
>
> Alex
>


Re: cron-like scheduling in mesos framework?

2017-01-06 Thread Tomas Barton
Hi,

try Chronos framework https://mesos.github.io/chronos/
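
As a sketch of what a scheduled job looks like there (host, port and job fields are assumptions; `/scheduler/iso8601` is Chronos's REST endpoint for ISO8601-scheduled jobs):

```shell
# Hypothetical Chronos job: run once a day starting 2017-01-07 (R/.../P1D).
JOB='{"name":"nightly-job","command":"echo hello",
  "schedule":"R/2017-01-07T00:00:00Z/P1D","owner":"v@example.com"}'
# To submit it (host/port assumed):
#   curl -L -H 'Content-Type: application/json' -X POST \
#        -d "$JOB" http://chronos.example.com:4400/scheduler/iso8601
echo "$JOB"
```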

Tomas

On 6 January 2017 at 17:40, l vic  wrote:

> Hi,
> Is there a way to schedule mesos framework task for execution at certain
> day/time?
> Thank you,
> -V
>


Re: Mesos CLI

2016-06-29 Thread Tomas Barton
Hi,

we're using `mesos tail -f`, `mesos cat` and `mesos ps` quite a lot. It
would be nice if you kept those. Is there a document describing the changes
you're planning?

Just a few ideas, it would be nice to have a command for:
  - listing all instances of a given task
  - joining stderr and stdout into a single stream
  - listing failed tasks (or LOST, etc.)
  - grepping for a task ID in `mesos ps`
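
In the meantime, a couple of those items can be approximated with standard tools on top of `mesos ps`; a sketch over simulated output (the task IDs are made up):

```shell
# Simulated `mesos ps` output; in practice you would pipe the real command.
ps_output='ID           FRAMEWORK  STATE
app-1.abc    marathon   TASK_RUNNING
app-2.def    marathon   TASK_LOST'
# "grep a task ID" / "list failed or LOST tasks" by filtering the table:
printf '%s\n' "$ps_output" | grep TASK_LOST
```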

Regards,
Tomas

On 17 June 2016 at 23:20, Haris Choudhary  wrote:

> Thanks for the correction Jie!
> We will have a new cluster plugin that will still incorporate the features
> of these commands.
>
> On Fri, Jun 17, 2016 at 2:09 PM, Jie Yu  wrote:
>
>> Haris, i think you meant to make a backward incompatible change to the
>> existing commands, not removing them, right? For instance:
>>
>> mesos ps -> mesos cluster ps
>> mesos cat -> mesos cluster cat
>>
>> I guess the question is: does anyone use those tools in production,
>> with tooling that depends on those commands?
>>
>> If not, it'll be much easier for us to just make this backwards
>> incompatible change.
>>
>> June, Ben, can you let us know? Thanks!
>>
>> - Jie
>>
>> On Fri, Jun 17, 2016 at 1:57 PM, Ben Whaley  wrote:
>>
>>> We use mesos ps and would in fact love to see improvements to the CLI
>>> offerings, not reductions.
>>>
>>> Agreed. We use all the commands mentioned below.
>>>
>>> On Jun 17, 2016, at 1:03 PM, June Taylor  wrote:
>>>
>>> We use mesos ps and would in fact love to see improvements to the CLI
>>> offerings, not reductions.
>>>
>>>
>>> Thanks,
>>> June Taylor
>>> System Administrator, Minnesota Population Center
>>> University of Minnesota
>>>
>>> On Fri, Jun 17, 2016 at 12:11 PM, Haris Choudhary <
>>> hchoudh...@mesosphere.io> wrote:
>>>
 Hey All,

 The Mesos CLI is going through a redesign. We are aware that the
 "mesos-execute" command is used pretty often, so that will be ported into
 the new CLI. However we're not sure if any of the other current CLI
 commands are being used at all. The remaining list of commands is as
 follows:
 - cat
 - ps
 - tail
 - scp

 If anyone is still using them, please let us know. *If a command is
 not being used it may be removed completely without a deprecation notice. *

 Thanks!

>>>
>>>
>>
>


Re: Issues on Zk configuration in Marathon

2016-02-04 Thread Tomas Barton
It's an interesting idea, but if you use ZooKeeper for Marathon or Mesos
discovery you can easily end up with a chicken-and-egg problem: you would
like to start ZooKeeper via Marathon, but Marathon itself needs ZooKeeper
before it can start.

If 3 ZooKeepers aren't enough for you, you might consider deploying
ZooKeeper in observer mode via Marathon.

http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html
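
An illustrative zoo.cfg fragment for such a setup (hostnames and server IDs are assumptions; observers follow the ensemble without participating in votes):

```
peerType=observer                              # on the observer node only
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
server.4=zk4.example.com:2888:3888:observer    # listed on every node
```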

On 2 February 2016 at 12:24, Sam  wrote:

> Thanks Lin Shuai
>
> Regards,
> Sam
>
> Sent from my iPhone
>
> On Feb 1, 2016, at 10:08 PM, Shuai Lin  wrote:
>
> I think you need to either pin the tasks to some of the slaves (e.g. using
> the marathon "CLUSTER" constraint) so that you can have a static
> configuration for your zk instances, or you need some type of service
> discovery.
>
> On Mon, Feb 1, 2016 at 9:09 PM, Sam  wrote:
>
>> Hello guys
>> One quick question in Marathon with Mesos,
>> We are trying to deploy Zk with Marathon to make sure that Zk is always
>> available even if one of the nodes crashes. For example: we have Zk1, Zk2
>> and Zk3; Zk1 needs the IP addresses of Zk2 and Zk3, Zk2 needs those of
>> Zk1 and Zk3, and the same for Zk3. The issue is that when one of them
>> crashes and Marathon spins up a new Zk, how do we get the old IP address
>> configuration into the new instance? I think this is an issue for any app
>> cluster whose members need each other's configuration.
>> Looking forward to a solution. Appreciated.
>>
>> Regards,
>> Sam
>>
>> Sent from my iPhone
>
>
>


Re: MesosExecutorDriver hard dependency on libcurl4-nss-dev in 0.23.0

2015-08-15 Thread Tomas Barton
Hi,

libmesos.so will be dependent on the libcurl-dev library that is present on
the machine where you've built the package.

So, in the case of Debian, libmesos will depend on one of these packages:

  libcurl4-nss-dev, libcurl4-gnutls-dev, libcurl4-openssl-dev

It's not a new dependency; the same applies to all 0.2x releases.

Tomas

On 15 August 2015 at 04:17, haosdent haosd...@gmail.com wrote:

 Hello, libcurl-nss provides the openssl functions. In
 http://mesos.apache.org/gettingstarted/, you could see

 ```
 # Install other Mesos dependencies.
 $ sudo apt-get -y install build-essential python-dev python-boto
 libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev
 ```

 On Sat, Aug 15, 2015 at 2:08 AM, Maxim Khutornenko ma...@apache.org
 wrote:

 Hi,

 There seems to be a new runtime dependency in 0.23 MesosExecutorDriver
 that we did not have before. Importing MesosExecutorDriver from
 mesos.native against a python egg built with all default flags fails
 with the following error:

 ImportError: libcurl-nss.so.4: cannot open shared object file: No such
 file or directory

 Installing libcurl4-nss-dev on Ubuntu fixes the problem but we could
 not find any notes in the release that would highlight this new
 runtime requirement. Were there any announcements we missed? Are there
 any make flags we could use to suppress this dependency when building
 an egg?

 Thanks,
 Maxim




 --
 Best Regards,
 Haosdent Huang



Re: How does Mesos Debian packaging work?

2015-08-05 Thread Tomas Barton
Some dependencies ended up in the script historically, others were added
incrementally (e.g. libsvn1 has been required since 0.21). libunwind* could
be removed; the libcurl-dev dependency can't be. You can easily check this
with ldd:

$ ldd /usr/lib/libmesos-0.22.1.so

...
libapr-1.so.0 => /usr/lib/libapr-1.so.0 (0x7f35be6b6000)
libcurl-nss.so.4 => /usr/lib/x86_64-linux-gnu/libcurl-nss.so.4
(0x7f35be44f000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f35be238000)
...

In this case the package was built with libcurl4-nss-dev installed, and
therefore libmesos.so requires libcurl-nss.so.4.

Another dependency that is missing is, for example, Python 2.7, which is
expected to be present on your system if you want to use the Mesos CLI
tools. This should be moved to a separate package, so that you don't have
to install Python and Java when you need just libmesos.so.

I started working on proper Debian packaging a while ago, but I didn't
have time to finish it. If you're interested, any help would be
appreciated.


On 5 August 2015 at 02:11, Jay Taylor j...@jaytaylor.com wrote:

 I see, thank you Tomas for that updated mesos-deb-packaging link.  Helpful
 information to be sure.

 I am still curious why the mesosphere-hosted debs end up with fewer
 dependencies than the builds done by the mesos-deb-packaging scripts.  Is
 there really no way to generate a roughly equivalent build?

 The hopeful dream I have is to be able to build (then distribute and
 install across my cluster) a `.deb` containing my own modifications.  Then
 if/when it explodes to be able to just revert back to the official
 releases; __ideally not having to install extra libraries along the way,
 change configurations, or inflict other [unnecessary] system mutations__.

 Is this asking too much of mesos/mesosphere in their present forms?

 On Tue, Aug 4, 2015 at 4:42 PM, Tomas Barton barton.to...@gmail.com
 wrote:

 Hi Jay,

 as far as the libcurl-dev dependency is concerned, you can use any of
 libcurl4-openssl-dev, libcurl4-gnutls-dev or libcurl4-nss-dev. Just make
 sure one of these libraries is installed on all of your worker (slave)
 nodes. If you don't need any modification of the mesos packaging, you can
 use pre-built packages from Mesosphere:
 http://open.mesosphere.com/downloads/mesos

 If you need some modification, you should fork
 https://github.com/mesosphere/mesos-deb-packaging which is much more
 actively maintained. Anyway, none of these packagings is close to an ideal
 solution. A proper Debian package would distinguish between mesos-master,
 mesos-slave, libmesos, the Java and Python bindings, etc.

 Regards,
 Tomas

 On 4 August 2015 at 22:36, Jay Taylor outtat...@gmail.com wrote:

 Greetings Mesonians,

 What is the procedure for creating debian/ubuntu .deb distribution
 builds of Mesos?

 I am currently using https://github.com/deric/mesos-deb-packaging, but
 it seems to add some dependencies (libunwind* and libcurl4-nss-dev) that
 the mesosphere-hosted distributions do not, and I strongly prefer creating
 artifacts comparable to what is emitted in the official builds.

 Thanks,
 Jay






Re: How does Mesos Debian packaging work?

2015-08-04 Thread Tomas Barton
Hi Jay,

as far as the libcurl-dev dependency is concerned, you can use any of
libcurl4-openssl-dev, libcurl4-gnutls-dev or libcurl4-nss-dev. Just make
sure one of these libraries is installed on all of your worker (slave)
nodes. If you don't need any modification of the mesos packaging, you can
use pre-built packages from Mesosphere:
http://open.mesosphere.com/downloads/mesos

If you need some modification, you should fork
https://github.com/mesosphere/mesos-deb-packaging which is much more
actively maintained. Anyway, none of these packagings is close to an ideal
solution. A proper Debian package would distinguish between mesos-master,
mesos-slave, libmesos, the Java and Python bindings, etc.

Regards,
Tomas

On 4 August 2015 at 22:36, Jay Taylor outtat...@gmail.com wrote:

 Greetings Mesonians,

 What is the procedure for creating debian/ubuntu .deb distribution builds
 of Mesos?

 I am currently using https://github.com/deric/mesos-deb-packaging, but it
 seems to add some dependencies (libunwind* and libcurl4-nss-dev) that the
 mesosphere-hosted distributions do not, and I strongly prefer creating
 artifacts comparable to what is emitted in the official builds.

 Thanks,
 Jay



Re: [VOTE] Release Apache Mesos 0.22.1 (rc6)

2015-05-02 Thread Tomas Barton
+1

tested on Ubuntu 14.04 LTS and Debian 7 Wheezy with gcc 4.8.4

On 1 May 2015 at 03:51, Elizabeth Lingg elizab...@mesosphere.io wrote:

 It seemed to be different symptoms with the same cause.

 Thanks,
 Elizabeth

 On Thu, Apr 30, 2015 at 6:09 PM, Benjamin Mahler 
 benjamin.mah...@gmail.com wrote:

 Wasn't that ticket just a duplicate of
 https://issues.apache.org/jira/browse/MESOS-2601?

 On Thu, Apr 30, 2015 at 4:21 PM, Elizabeth Lingg elizab...@mesosphere.io
  wrote:

 +1, tested MESOS-2601, MESOS-2605, and MESOS-2668 in a test cluster with
 many services running.

 Adam, could you add the fix for MESOS-2605 to the release notes?

   * [MESOS-2605] - The slave sometimes does not send active executors
 upon reregistration

 Thanks,
 Elizabeth

 On Thu, Apr 30, 2015 at 8:08 AM, Nikolay Borodachev nbo...@adobe.com
 wrote:

  +1



 *From:* Adam Bordelon [mailto:a...@mesosphere.io]
 *Sent:* Thursday, April 30, 2015 2:49 AM
 *To:* dev; user@mesos.apache.org
 *Subject:* [VOTE] Release Apache Mesos 0.22.1 (rc6)



 Hi all,

 Please vote on releasing the following candidate as Apache Mesos 0.22.1.

 0.22.1 is a bug fix release and includes the following:


 

   * [MESOS-1795] - Assertion failure in state abstraction crashes JVM.
   * [MESOS-2161] - AbstractState JNI check fails for Marathon framework.
   * [MESOS-2461] - Slave should provide details on processes running in
 its cgroups
   * [MESOS-2583] - Tasks getting stuck in staging.
   * [MESOS-2592] - The sandbox directory is not chown'ed if the fetcher
 doesn't run.
   * [MESOS-2601] - Tasks are not removed after recovery from slave and
 mesos containerizer
   * [MESOS-2614] - Update name, hostname, failover_timeout, and
 webui_url in master on framework re-registration
   * [MESOS-2643] - Python scheduler driver disables implicit
 acknowledgments by default.
   * [MESOS-2668] - Slave fails to recover when there are still
 processes left in its cgroup

 The CHANGELOG for the release is available at:

 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.22.1-rc6

 

 The candidate for Mesos 0.22.1 release is available at:

 https://dist.apache.org/repos/dist/dev/mesos/0.22.1-rc6/mesos-0.22.1.tar.gz

 The tag to be voted on is 0.22.1-rc6:

 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.22.1-rc6

 The MD5 checksum of the tarball can be found at:

 https://dist.apache.org/repos/dist/dev/mesos/0.22.1-rc6/mesos-0.22.1.tar.gz.md5

 The signature of the tarball can be found at:

 https://dist.apache.org/repos/dist/dev/mesos/0.22.1-rc6/mesos-0.22.1.tar.gz.asc

 The PGP key used to sign the release is here:
 https://dist.apache.org/repos/dist/release/mesos/KEYS

 The JAR is up in Maven in a staging repository here:
 https://repository.apache.org/content/repositories/orgapachemesos-1054

 Please vote on releasing this package as Apache Mesos 0.22.1!

 The vote is open until Mon May 4 18:00:00 PDT 2015 and passes if a
 majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Mesos 0.22.1
 [ ] -1 Do not release this package because ...

 Thanks,

 -Adam-







Re: Storm on Mesos - 3 Masters

2015-02-06 Thread Tomas Barton
Hi,

sorry for late reply. I found the message accidentally in spam.

It seems like Storm is binding to localhost
(scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310)
instead of using the public interface.
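
One common remedy (illustrative; the address must match your host's actual public interface) is to pin the hostname Storm advertises in storm.yaml:

```
# storm.yaml fragment: advertise the public interface instead of 127.0.1.1
storm.local.hostname: "192.168.56.10"
```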

Regards,
Tomas


On 19 January 2015 at 14:04, Ondrej Smola ondrej.sm...@gmail.com wrote:

 Hi,

 we have a Mesos cluster installation - 3 masters (0.21.0), ZK (3.4.5) -
 running Mesos, Spark, Chronos, Marathon and Storm 0.9.3. All nodes run
 Ubuntu 14.04.

 My problem is that I have to start MesosNimbus on the currently elected
 leader, otherwise MesosNimbus gets stuck. From the log I see it detects
 the currently leading master correctly but then gets stuck. When the
 leader changes to the node running Nimbus it works again.

 nimbus upstart.log

 I0119 12:20:03.289799 10728 detector.cpp:433] A new leading master (UPID=
 master@192.168.56.11:5050) is detected
 I0119 12:20:03.290081 10733 sched.cpp:234] New master detected at
 master@192.168.56.11:5050
 I0119 12:20:03.290592 10733 sched.cpp:242] No credentials provided.
 Attempting to register without authentication

 nimbus.log

 2015-01-19T12:15:40.478+0100 o.m.log [DEBUG] started Server@20e1ceb3
 2015-01-19T12:15:40.478+0100 s.m.MesosNimbus [INFO] Started serving config
 dir under http://192.168.56.10:49202/conf
 2015-01-19T12:15:40.535+0100 s.m.MesosNimbus [INFO] Waiting for scheduler
 to initialize...

 On the leading mesos master I see the following log (repeated every second):

 mesos.log

 I0119 12:40:53.208027  4957 master.cpp:1520] Received re-registration
 request from framework 20150119-114412-171485376-5050-6660-0002 (Storm
 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
 I0119 12:40:53.208860  4957 master.cpp:1573] Re-registering framework
 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3)  at
 scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
 I0119 12:40:53.209205  4957 master.cpp:1602] Framework
 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
 scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 failed over
 I0119 12:40:53.211552  4957 hierarchical_allocator_process.hpp:375]
 Activated framework 20150119-114412-171485376-5050-6660-0002
 I0119 12:40:53.211932  4959 master.cpp:789] Framework
 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
 scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
 disconnected
 I0119 12:40:53.212004  4959 master.cpp:1752] Disconnecting framework
 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
 scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
 I0119 12:40:53.212198  4959 master.cpp:1768] Deactivating framework
 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
 scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
 I0119 12:40:53.212446  4959 master.cpp:811] Giving framework
 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
 scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 1hrs to
 failover
 I0119 12:40:53.212550  4959 hierarchical_allocator_process.hpp:405]
 Deactivated framework 20150119-114412-171485376-5050-6660-0002
 I0119 12:40:54.209858  4959 master.cpp:1520] Received re-registration
 request from framework 20150119-114412-171485376-5050-6660-0002 (Storm
 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310


 Other frameworks work okay and handle a leading master on another node
 correctly.
 From a brief look at the source code, it hangs at line 153 of

 https://github.com/mesos/storm/blob/master/src/storm/mesos/MesosNimbus.java

 when trying to acquire a semaphore.


 Thank you for your great job

 Ondrej Smola



Re: libsubversion-1 is required for mesos to build.

2015-01-29 Thread Tomas Barton
apt-get install libapr1-dev libsvn-dev

should fix that (for this error, just libsvn-dev is enough).

Tomas

On 29 January 2015 at 22:35, Dan Dong dongda...@gmail.com wrote:

 Hi,
   When I tried to build mesos-0.21.0 on Ubuntu-14.04, I get this error:
 ---
 libsubversion-1 is required for mesos to build.
 ---

 I did quite some google but could not figure out which package to install
 to resolve it.  Package of subversion is installed but does not help here.

 Cheers,
 Dan




Re: Accessing stdout/stderr of a task programmattically?

2015-01-13 Thread Tomas Barton
Have a look at mesos.cli:

https://pypi.python.org/pypi/mesos.cli/0.1.3

you can easily do

mesos tail {task id}

and access the log file on any slave machine, or connect to the machine
where the given task is running.


On 13 January 2015 at 22:48, David Greenberg dsg123456...@gmail.com wrote:

 I was trying to figure out how to programmatically access a task's stdout
 & stderr, and I don't fully understand how the URL is constructed. It
 seems to be of the form
 http://$slave_url:5050/read.json?$work_dir/work/slaves/$slave_id/frameworks/$framework_id/executors/$executor_id/runs/$something

 What is the $something? Is there an easier way, given just the task_id, to
 find where the output is?

 Thanks,
 David



Re: Recommended resources for master / scheduler machines

2015-01-08 Thread Tomas Barton
Is ZooKeeper running in distributed mode?

ZooKeeper periodically writes all data to disk (the transaction log), so
the bottleneck could be ZooKeeper rather than a lack of CPUs. ZooKeeper
limits each znode to 1MB; typically 512MB should be enough for ZooKeeper
(although 4GB might not be, depending on your use-case).

from ZooKeeper docs:

ZooKeeper's transaction log must be on a dedicated device. (A dedicated
partition is not enough.) ZooKeeper writes the log sequentially, without
seeking. Sharing your log device with other processes can cause seeks and
contention, which in turn can cause multi-second delays.

In particular, you should not create a situation in which ZooKeeper swaps
to disk. The disk is death to ZooKeeper. Everything is ordered, so if
processing one request swaps the disk, all other queued requests will
probably do the same. DON'T SWAP.
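
An illustrative zoo.cfg fragment separating the transaction log onto its own device, per the docs quoted above (both paths are assumptions):

```
dataDir=/var/lib/zookeeper     # snapshots
dataLogDir=/zk-txlog           # mount point of a dedicated disk
```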


On 8 January 2015 at 16:47, Itamar Ostricher ita...@yowza3d.com wrote:

 Thanks Tomas.

 We're still quite far from the 10k-20k machines limit :-)

 Currently, our framework scheduler generates many (millions of) mostly
 small tasks (some around ~100ms, some taking a few seconds).
 I understand that the network is the main bottleneck, but we sometimes
 experience lost tasks, and sometimes I see master logs indicating that the
 master is unable to talk with the zookeeper service (which is on the same
 host), and I was wondering if it's related to CPU/RAM of the master machine.
 Is 1 CPU enough? 2? 4?
 1GiB RAM? 4? 8?

 On Thu, Jan 8, 2015 at 5:00 PM, Tomas Barton barton.to...@gmail.com
 wrote:

 Hi Itamar,

 there's definitely a limit to the number of machines a Mesos master can
 handle. This limit is between 10,000 and 20,000 (the number reported
 by Twitter). The bottleneck is the event loop which handles communication
 on the master.

 With hundreds of machines you should be fine. Only if your framework
 scheduler demands too many resources for computing allocations might you
 encounter problems.

 How does the strength of the master & scheduler machines affect the
 overall cluster performance?


 I would say that the network is usually the main bottleneck. Adding extra
 RAM won't improve mesos-master performance. Of course, if there's high CPU
 load on the master you might observe a performance regression. This also
 depends on the granularity of your tasks: whether you have a few
 long-running tasks or many short tasks (which run for just hundreds of ms).

 Tomas


 On 6 January 2015 at 10:12, Itamar Ostricher ita...@yowza3d.com wrote:

 Are there recommendations regarding master / scheduler machines
 resources as function of cluster size?

 Say I have a cluster with hundreds of slave machines and thousands of
 CPUs, with a single framework that will schedule millions of tasks.
 How does the strength of the master & scheduler machines affect the
 overall cluster performance?

 Thanks,
 - Itamar.






Re: conf files location of mesos.

2015-01-07 Thread Tomas Barton
Hi Dan,

this depends on your distribution. The Mesosphere package comes with a
wrapper script which uses configuration placed in /etc/default/mesos,
/etc/mesos-master and /etc/mesos-slave:

https://github.com/mesosphere/mesos-deb-packaging/blob/master/mesos-init-wrapper

which distribution do you use?

Tomas

On 7 January 2015 at 16:23, Dan Dong dongda...@gmail.com wrote:

 Hi,
   After installation of mesos on my cluster, where could I find the
 location of configuration files?
 E.g: mesos.conf, masters, slaves etc. I could not find any of them under
 the prefix dir and subdirs (configure
 --prefix=/home/dan/mesos-0.21.0/build/). Are there examples for the conf
 files? Thanks!

 Cheers,
 Dan




Re: conf files location of mesos.

2015-01-07 Thread Tomas Barton
Yeah, that's correct; make install won't create /etc/default/mesos. When
you compile Mesos from source you have to write your init scripts and
configuration files yourself.

Mesos recognizes env variables like MESOS_ZK, etc.; have a look at:

http://mesos.apache.org/documentation/latest/configuration/

or see:

mesos-slave --help
mesos-master --help
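
A minimal sketch of that approach (hostnames and paths are assumptions): each command-line flag maps to a MESOS_-prefixed environment variable.

```shell
# Equivalent to: mesos-master --zk=... --quorum=2 --work_dir=/var/lib/mesos
export MESOS_ZK="zk://zk1:2181,zk2:2181,zk3:2181/mesos"
export MESOS_QUORUM=2
export MESOS_WORK_DIR=/var/lib/mesos
echo "zk=$MESOS_ZK quorum=$MESOS_QUORUM workdir=$MESOS_WORK_DIR"
```

An init script written by hand would export these before exec'ing mesos-master.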





On 7 January 2015 at 17:53, Dan Dong dongda...@gmail.com wrote:

 Hi, Brian,
   It's not there:
 ls /etc/default/mesos
 ls: cannot access /etc/default/mesos: No such file or directory

 I installed mesos from source tar ball by configure;make;make install as
 normal user.

 Cheers,
 Dan


 2015-01-07 10:43 GMT-06:00 Brian Devins brian.dev...@dealer.com:

  Try ls /etc/default/mesos instead

   From: Dan Dong dongda...@gmail.com
 Reply-To: user@mesos.apache.org user@mesos.apache.org
 Date: Wednesday, January 7, 2015 at 11:38 AM
 To: user@mesos.apache.org user@mesos.apache.org
 Subject: Re: conf files location of mesos.

Hi, All,
   Thanks for your help. I'm using version 0.21.0 of Mesos, but I do not
  see any 'etc' or 'var' dirs under my build directory (or any of its
  subdirs). What is the default conf file location for Mesos 0.21.0?

 ls ~/mesos-0.21.0/build/
 3rdparty  bin  config.log  config.lt  config.status  ec2  include  lib
 libexec  libtool  Makefile  mesos.pc  mpi  sbin  share  src

Cheers,
Dan

 2015-01-07 9:47 GMT-06:00 Tomas Barton barton.to...@gmail.com:

 Hi Dan,

  this depends on your distribution. Mesosphere package comes with
 wrapper script which uses configuration
 placed in /etc/default/mesos and /etc/mesos-master, /etc/mesos-slave


 https://github.com/mesosphere/mesos-deb-packaging/blob/master/mesos-init-wrapper

  which distribution do you use?

  Tomas

 On 7 January 2015 at 16:23, Dan Dong dongda...@gmail.com wrote:

   Hi,
After installation of mesos on my cluster, where could I find the
 location of configuration files?
  E.g: mesos.conf, masters, slaves etc. I could not find any of them
 under the prefix dir and subdirs (configure
 --prefix=/home/dan/mesos-0.21.0/build/). Are there examples for the conf
 files? Thanks!

  Cheers,
  Dan





 Brian Devins | Java Developer
 brian.dev...@dealer.com






Re: Failure to build (possibly a 3rd party issue)

2014-12-27 Thread Tomas Barton
Hi,

Mesos >= 0.21 requires GCC >= 4.8 for building

https://issues.apache.org/jira/browse/MESOS-1044

Tomas

On 18 December 2014 at 07:38, Ritwik ritwik.ya...@gmail.com wrote:

 Thanks Michael for your reply.

 *I was able to resolve the issue by upgrading from GCC 4.6 to GCC 4.8*

 It seems to work perfectly now, however, not knowing what caused the
 problem might prove to be bad in the long run.

 Thanks again for all the help.

 On 18 December 2014 at 01:40, Michael Park mcyp...@gmail.com wrote:

 Hi Ritwik,

 I've seen this problem before but haven't had time to look into it.
 We made C++11 a requirement so the Makefile actually isn't the issue here.

 I can look into it though,

 Thanks for bringing it up!

 MPark

 On 17 December 2014 at 07:44, Ritwik ritwik.ya...@gmail.com wrote:
 
  I dug around the error message a bit. I see the compilation being done
  using:
  -std=c++0x
  The error message seems to be related to creating rvalue references
 which
  is a new feature of C++11. Does the make file need an update?
 
  On 17 December 2014 at 15:42, Ritwik ritwik.ya...@gmail.com wrote:
  
   Hi,
  
   I was trying to build Mesos using 'make'. Here is what I got:
  
   http://code.stypi.com/pw3146dx
  
   Is this a known issue? It would be great if someone could suggest a
   workaround.
  
   Thanks.
  
   Best Regards,
  
   --
   *Ritwik Yadav*
  
  
 
  --
  *Ritwik Yadav*
 
  Department of Computer Science and Engineering,
  Indian Institute of Technology,
  Kharagpur.
 
  Cell: +91-9635-152346
  Twitter: @iRitwik https://twitter.com/#%21/iRitwik
 



 --
 *Ritwik Yadav*

 Department of Computer Science and Engineering,
 Indian Institute of Technology,
 Kharagpur.

 Cell: +91-9635-152346
 Twitter: @iRitwik https://twitter.com/#%21/iRitwik



Re: How to launch Storm topology on Apache Marathon

2014-12-09 Thread Tomas Barton
Marathon is a meta-framework; it makes sense to run Storm on Marathon when
you need Storm in HA mode.

Storm comes with a command-line client:

http://storm.apache.org/documentation/Command-line-client.html

For submitting topology you can use `storm jar ...` command.

There's a configuration file, storm.yaml, where you can set up various
settings; anyway, the steps should be:

 1) set up a ZooKeeper quorum (for testing, 1 instance should be enough)
 2) distribute the storm binary across the cluster
 3) update storm.yaml accordingly
 4) start nimbus (resp. the storm-mesos framework - could be done via
Marathon; for testing purposes starting from the console is also fine)
 5) submit the topology jar

Nimbus address should be fetched from ZooKeeper, or you could hardcode it
into storm.yaml.
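
The final submission step could then look like this (the jar name, topology class and Nimbus address are hypothetical):

```shell
# Build the submit command; -c nimbus.host overrides storm.yaml if needed.
SUBMIT='storm jar mytopology.jar com.example.MyTopology my-topology -c nimbus.host=192.168.1.10'
# eval "$SUBMIT"   # run wherever the storm client is installed
echo "$SUBMIT"
```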


On 9 December 2014 at 19:08, Obaid Salikeen obaid.salik...@ask.com wrote:

  Thanks a lot Tomas,



 I was actually trying out Apache Marathon. I am trying to run storm-mesos
 framework over Marathon.



  So far I managed to run Nimbus and the UI by running the following two
  commands through the Marathon UI, however:

  -  The UI doesn't know where to find Nimbus.

  -  Secondly, I don't know how to deploy my Storm topology jar on a
  running instance of Nimbus on Apache Marathon:



 *Run Numbus on Marathon:*

 Command: ./storm-mesos-0.9/bin/storm-mesos nimbus

 URI: http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz



 *Run storm UI on Marathon:*

 Command: ./storm-mesos-0.9/bin/storm ui

 URI: http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz



 It would be great if you could let me know how to deploy Storm topology
 over Marathon. Thanks a lot,



 Obaid

 PS







 *From:* Tomas Barton [mailto:barton.to...@gmail.com]
 *Sent:* Tuesday, December 09, 2014 2:15 AM
 *To:* user
 *Subject:* Re: How to launch Storm topology on Apache Marathon



 Hi Obaid,



 you'll need one instance of Nimbus (storm coordinator) which could be
 running as Mesos framework, have a look here:



 https://github.com/deric/storm-mesos



 Nimbus could be started via marathon, just use something like:



 /usr/local/bin/storm-mesos nimbus



 Tasks requested by Nimbus will then be launched on the Mesos slaves; you
 don't have to start Supervisors on each slave. You could either use some
 binary package for distributing the Storm jars, or they could be copied
 before launching tasks via Mesos.



 If you are using Storm DRPC you would need to start DRPC daemon on each
 node. Also for accessing logs from Storm UI there is a log service that

 should be running on each node if you would like to use this feature.



 Regards,

 Tomas





 On 9 December 2014 at 03:44, Obaid Salikeen obaid.salik...@ask.com
 wrote:

  Hi,

 Currently, I couldn't find any tutorial or steps on how to launch the
 storm-mesos (Storm topology) framework on Apache Marathon. It would be
 great if you guys could give me any references or hints on how to launch a
 Storm topology over Marathon.

 -  Do I need to first install Storm on every single machine on my
 Mesos cluster?

 -  What is the recommended way to launch Storm topology over
 Marathon?



 Thanks a lot,

 Obaid







Re: How to launch Storm topology on Apache Marathon

2014-12-09 Thread Tomas Barton
The Nimbus IP address should be stored in ZooKeeper as soon as Nimbus is
launched by Marathon. Marathon could be used for launching Nimbus (and the
log and UI services). However, launching Storm topologies should be
negotiated by the Storm-Mesos (Nimbus) framework, not Marathon.

On 9 December 2014 at 22:04, Obaid Salikeen obaid.salik...@ask.com wrote:

  Thanks, I have previously deployed a few topologies on a Storm cluster and
 an Apache Mesos cluster (using the Storm-Mesos project). However, I want to
 know explicitly: if I launched Nimbus through Apache Marathon, how do I
 launch my topology JAR file on Nimbus (since there is no local-machine
 installation of Nimbus; it was launched directly through Marathon, and
 Marathon decides which machine it should run Nimbus on)? I currently have a
 Mesos cluster with 10 nodes, and Marathon on top. Previously I deployed my
 Storm topology on this cluster using the Storm-Mesos framework; however now
 I want to use Marathon to launch Nimbus and my tasks.



 Thanks

 Obaid







 *From:* Tomas Barton [mailto:barton.to...@gmail.com]
 *Sent:* Tuesday, December 09, 2014 11:17 AM
 *To:* Salikeen, Obaid
 *Cc:* user@mesos.apache.org

 *Subject:* Re: How to launch Storm topology on Apache Marathon



  Marathon is a meta-framework; it makes sense to run Storm on Marathon when
 you need to have Storm in HA mode.



 Storm comes up with a command line client:



 http://storm.apache.org/documentation/Command-line-client.html



 For submitting topology you can use `storm jar ...` command.
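A deploy script might assemble that `storm jar` invocation programmatically; a minimal sketch, where the jar name, main class, and topology name are hypothetical placeholders rather than anything from this thread:

```python
def storm_submit_cmd(storm_bin, jar, main_class, *topology_args):
    """Build the `storm jar` command line used to submit a topology.

    The storm client reads storm.yaml to locate Nimbus, so only the
    jar, its main class, and topology arguments are supplied here.
    """
    return [storm_bin, "jar", jar, main_class, *topology_args]

cmd = storm_submit_cmd("./storm-mesos-0.9/bin/storm",
                       "mytopology.jar",
                       "com.example.WordCountTopology",
                       "wordcount")
print(" ".join(cmd))
```

The same builder works for any topology; hand the resulting list to your process runner of choice.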



 There's a configuration file storm.yaml where you can setup various
 settings, anyway the steps should be:



  1) setup ZooKeeper quorum (for testing 1 instance should be enough

  2) distribute storm binary on a cluster

  3) update storm.yaml accordingly

  4) start nimbus (resp. storm-mesos framework - could be done via
 Marathon, for testing purposes starting from console is also fine)

  5) submit topology jar



 Nimbus address should be fetched from ZooKeeper, or you could hardcode it
 into storm.yaml.





 On 9 December 2014 at 19:08, Obaid Salikeen obaid.salik...@ask.com
 wrote:

  Thanks a lot Tomas,



 I was actually trying out Apache Marathon. I am trying to run storm-mesos
 framework over Marathon.



 So far I managed to run Nimbus and UI by running following two commands
 through Marathon UI, however

 -  UI dosent know where to find Nimbus.

 -  Secondly I don’t know how to deploy my Storm topology jar on
 running instance of Nimbus on Apache Marathon:



 *Run Nimbus on Marathon:*

 Command: ./storm-mesos-0.9/bin/storm-mesos nimbus

 URI: http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz



 *Run storm UI on Marathon:*

 Command: ./storm-mesos-0.9/bin/storm ui

 URI: http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz



 It would be great if you could let me know how to deploy Storm topology
 over Marathon. Thanks a lot,



 Obaid

 PS







 *From:* Tomas Barton [mailto:barton.to...@gmail.com]
 *Sent:* Tuesday, December 09, 2014 2:15 AM
 *To:* user
 *Subject:* Re: How to launch Storm topology on Apache Marathon



 Hi Obaid,



 you'll need one instance of Nimbus (storm coordinator) which could be
 running as Mesos framework, have a look here:



 https://github.com/deric/storm-mesos



 Nimbus could be started via marathon, just use something like:



 /usr/local/bin/storm-mesos nimbus



 Then tasks requested by Nimbus will be launched on the Mesos slaves; you
 don't have to start Supervisors on each slave. You could either use some
 binary package for distributing the Storm jars, or they could be copied
 before launching tasks via Mesos.



 If you are using Storm DRPC you would need to start DRPC daemon on each
 node. Also for accessing logs from Storm UI there is a log service that

 should be running on each node if you would like to use this feature.



 Regards,

 Tomas





 On 9 December 2014 at 03:44, Obaid Salikeen obaid.salik...@ask.com
 wrote:

  Hi,

 So far, I haven't found any tutorial or steps on how to launch the
 storm-mesos (Storm topology) framework on Apache Marathon. It would be
 great if you guys could give me any reference or hints on how to launch a
 Storm topology over Marathon.

 -  Do I need to first install Storm on every single machine on my
 Mesos cluster?

 -  What is the recommended way to launch Storm topology over
 Marathon?



 Thanks a lot,

 Obaid









Re: Problems with OOM

2014-09-26 Thread Tomas Barton
Just to make sure, all slaves are running with:

--isolation='cgroups/cpu,cgroups/mem'

Is there something suspicious in mesos slave logs?

On 26 September 2014 13:20, Stephan Erb stephan@blue-yonder.com wrote:

  Hi everyone,

 I am having issues with the cgroups isolation of Mesos. It seems like
 tasks are prevented from allocating more memory than their limit. However,
 they are never killed.

- My scheduled task allocates memory in a tight loop. According to
'ps', once its memory requirements are exceeded it is not killed, but ends
up in the state D (uninterruptible sleep (usually IO)).
- The task is still considered running by Mesos.
- There is no indication of an OOM in dmesg.
- There is neither an OOM notice nor any other output related to the
task in the slave log.
- According to htop, the system load is increased with a significant
portion of CPU time spend within the kernel. Commonly the load is so high
that all zookeeper connections time out.

 I am running Aurora and Mesos 0.20.1 using the cgroups isolation on Debian
 7 (kernel 3.2.60-1+deb7u3).

 Sorry for the somewhat unspecific error description. Still, anyone an idea
 what might be wrong here?

 Thanks and Best Regards,
 Stephan



Re: Mesos on Gentoo

2014-09-08 Thread Tomas Barton
Hi James,

Spark has support for HDFS; however, you don't have to use it, and there's
no need to install the whole Hadoop stack. I've tested Mesos and Spark with
the FhGFS distributed filesystem and it works just fine.

Tomas

On 8 September 2014 06:39, Vinod Kone vinodk...@gmail.com wrote:

 Hi James,

 Great to see a Gentoo package for Mesos!

 Regarding HDFS requirement, any shared storage (even just a http/ftp
 server works) that the Mesos slaves can pull the executor from is enough.



Re: OOM not always detected by Mesos Slave

2014-09-05 Thread Tomas Barton
There is some overhead for the JVM itself, which should be added to the
total memory usage of the task. So you can't give the task the same amount
of memory as you pass to java via the -Xmx parameter.
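As a rough sketch of that sizing rule — the 25% / 128 MB headroom figures below are illustrative assumptions, not values from this thread:

```python
def jvm_heap_for_task(task_mem_mb, headroom_fraction=0.25, min_headroom_mb=128):
    """Pick an -Xmx value (in MB) that leaves room inside the task's
    memory limit for JVM metaspace, thread stacks, and native buffers."""
    headroom_mb = max(int(task_mem_mb * headroom_fraction), min_headroom_mb)
    return task_mem_mb - headroom_mb

# e.g. a 1024 MB Mesos task would get roughly -Xmx768m
print(jvm_heap_for_task(1024))
```

The right headroom depends on thread count and native library use; undersizing it reproduces exactly the cgroup OOM described in this thread.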


On 2 September 2014 20:43, Benjamin Mahler benjamin.mah...@gmail.com
wrote:

 Looks like you're using the JVM, can you set all of your JVM flags to
 limit the memory consumption? This would favor an OutOfMemoryError instead
 of OOMing the cgroup.


 On Thu, Aug 28, 2014 at 5:51 AM, Whitney Sorenson wsoren...@hubspot.com
 wrote:

 Recently, I've seen at least one case where a process inside of a task
 inside of a cgroup exceeded memory limits and the process was killed
 directly. The executor recognized the process was killed and sent a
 TASK_FAILED. However, it seems far more common to see the executor process
 itself destroyed and the mesos slave (I'm making some assumptions here
 about how it all works) sends a TASK_FAILED which includes information
 about the memory usage.

 Is there something we can do to make this behavior more consistent?

 Alternatively, can we provide some functionality to hook into so we don't
 need to duplicate the work of the mesos slave in order to provide the same
 information in the TASK_FAILED message? I think users would like to know
 definitively that the task OOM'd, whereas in the case where the underlying
 task is killed it may take a lot of digging to find the underlying cause if
 you aren't looking for it.

 -Whitney

 Here are relevant lines from messages in case something else is amiss:

 Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067321] Task in
 /mesos/2dda5398-6aa6-49bb-8904-37548eae837e killed as a result of limit of
 /mesos/2dda5398-6aa6-49bb-8904-37548eae837e
 Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.067334] memory: usage
 917420kB, limit 917504kB, failcnt 106672
 Aug 27 23:24:07 ip-10-237-165-119 kernel: [2604343.066947] java7 invoked
 oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
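Incidentally, the usage/limit gap in such kernel lines can be pulled out mechanically; a small sketch assuming the dmesg format shown above:

```python
import re

line = "memory: usage 917420kB, limit 917504kB, failcnt 106672"
match = re.search(r"usage (\d+)kB, limit (\d+)kB", line)
usage_kb, limit_kb = (int(g) for g in match.groups())

# The cgroup was only 84 kB below its limit when the OOM killer fired.
print(limit_kb - usage_kb)
```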







Re: Mesos + storm on top of Docker

2014-08-20 Thread Tomas Barton
The latter is definitely a better choice. Yet another fork of storm-mesos is
here:

https://github.com/deric/storm-mesos


On 19 August 2014 20:22, Yaron Rosenbaum yaron.rosenb...@gmail.com wrote:

 I'm not getting it from git, but rather downloading it from:
 http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz

 And it looks a bit dated.
 Looking at git, there are two forks that seem more or less 'official':
 https://github.com/mesos/storm
 https://github.com/mesosphere/storm-mesos

 The first hasn't been updated in a while.


 (Y)

 On Aug 19, 2014, at 5:54 PM, Brenden Matthews 
 brenden.matth...@airbedandbreakfast.com wrote:

 What version of the storm on mesos code are you running?  i.e., what is
 the git sha?

 On Mon, Aug 18, 2014 at 11:53 PM, Yaron Rosenbaum 
 yaron.rosenb...@gmail.com wrote:

 Ok, thanks for the tip!
 Made some progress. Now this is what I get :
 stderr on the slave:
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0818 19:06:55.03369922 fetcher.cpp:73] Fetching URI '
 http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz'
 I0818 19:06:55.03399422 fetcher.cpp:123] Downloading '
 http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz' to
 '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226/storm-mesos-0.9.tgz'
 I0818 19:07:11.56751422 fetcher.cpp:61] Extracted resource
 '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226/storm-mesos-0.9.tgz'
 into
 '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226'
 --2014-08-18 19:07:12--  http://master:35468/conf/storm.yaml
 Resolving master (master)... 172.17.0.147
 Connecting to master (master)|172.17.0.147|:35468... connected.
 HTTP request sent, awaiting response... 404 Not Found
 2014-08-18 19:07:12 ERROR 404: Not Found.

 root@master:/# cat /var/log/supervisor/mesos-master-stderr.log
 ...
 I0818 19:11:10.45627419 master.cpp:2704] Executor
 wordcount-1-1408388814 of framework 20140818-190538-2466255276-5050-11-0002
 on slave 20140818-190538-2466255276-5050-11-0 at slave(1)@
 172.17.0.149:5051 (slave) has exited with status 8
 I0818 19:11:10.45782419 master.cpp:2628] Status update TASK_LOST
 (UUID: ddd2a5c6-39d6-4450-824b-2ddc5b39869b) for task slave-31000 of
 framework 20140818-190538-2466255276-5050-11-0002 from slave
 20140818-190538-2466255276-5050-11-0 at slave(1)@172.17.0.149:5051
 (slave)
 I0818 19:11:10.45789819 master.hpp:673] Removing task slave-31000
 with resources cpus(*):1; mem(*):1000; ports(*):[31000-31000] on slave
 20140818-190538-2466255276-5050

 root@master:/# cat /var/log/supervisor/nimbus-stderr.log
 I0818 19:06:23.683955   190 sched.cpp:126] Version: 0.19.1
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@712: Client
 environment:zookeeper.version=zookeeper C client 3.4.5
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@716: Client
 environment:host.name=master
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@723: Client
 environment:os.name=Linux
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@724: Client
 environment:os.arch=3.15.3-tinycore64
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@725: Client
 environment:os.version=#1 SMP Fri Aug 15 09:11:44 UTC 2014
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@733: Client
 environment:user.name=(null)
 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@741: Client
 environment:user.home=/root
 2014-08-18 19:06:23,685:26(0x7f3575014700):ZOO_INFO@log_env@753: Client
 environment:user.dir=/
  2014-08-18 19:06:23,685:26(0x7f3575014700):ZOO_INFO@zookeeper_init@786:
 Initiating client connection, host=zookeeper:2181 sessionTimeout=1
 watcher=0x7f3576f9cf80 sessionId=0 sessionPasswd=null
 context=0x7f3554000e00 flags=0
 2014-08-18 19:06:23,712:26(0x7f3573010700):ZOO_INFO@check_events@1703:
 initiated connection to server [172.17.0.145:2181]
 2014-08-18 19:06:23,724:26(0x7f3573010700):ZOO_INFO@check_events@1750:
 session establishment complete on server [172.17.0.145:2181],
 sessionId=0x147ea82a658000c, negotiated timeout=1
 I0818 19:06:23.729141   242 group.cpp:310] Group process ((3)@
 172.17.0.147:49673) connected to ZooKeeper
 I0818 19:06:23.729308   242 group.cpp:784] Syncing group operations:
 queue size (joins, cancels, datas) = (0, 0, 0)
 I0818 19:06:23.729367   242 group.cpp:382] Trying to create path '/mesos'
 in ZooKeeper
  I0818 19:06:23.745023   242 detector.cpp:135] Detected a new leader:
 (id='1')
 I0818 19:06:23.745312   242 group.cpp:655] Trying to get
 '/mesos/info_01' in ZooKeeper
 I0818 19:06:23.752063   242 

Re: mesos build errors

2014-07-23 Thread Tomas Barton
Hi,

that's quite strange. Try to run

ldconfig

and then run make again.

You can find binary packages for Debian here:
http://mesosphere.io/downloads/

Tomas

On 23 July 2014 10:09, Itamar Ostricher ita...@yowza3d.com wrote:

 Hi,

 I'm trying to do a clean build of mesos for the 0.19.0 tarball.
 I was following the instructions from
 http://mesos.apache.org/gettingstarted/ step by step. Got to running
 `make`, which ran for quite a while, and exited with errors (see the end of
 the output below).

 Extra env info: I'm trying to do this build on a 64-bit Debian GCE
 instance:
 itamar@mesos-test-1:/tmp/mesos-0.19.0/build$ uname -a
 Linux mesos-test-1 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

 Assistance will be much appreciated!
 Alternatively, I don't mind using precompiled binaries, if anyone can
 point me in the direction of such binaries for the GCE environment I
 described :-)

 tail of make output:
 

 libtool: link: warning: `/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/
 libgflags.la' seems to be moved
 *** Warning: Linking the shared library libmesos.la against the
 *** static library ../3rdparty/leveldb/libleveldb.a is not portable!
 libtool: link: warning: `/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/
 libgflags.la' seems to be moved
 libtool: link: g++  -fPIC -DPIC -shared -nostdlib
 /usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crti.o
 /usr/lib/gcc/x86_64-linux-gnu/4.7/crtbeginS.o  -Wl,--whole-archive
 ./.libs/libmesos_no_3rdparty.a ../3rdparty/libprocess/.libs/libprocess.a
 ./.libs/libjava.a -Wl,--no-whole-archive
  ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/.libs/libprotobuf.a
 ../3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a
 -L/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib
 ../3rdparty/leveldb/libleveldb.a
 ../3rdparty/zookeeper-3.4.5/src/c/.libs/libzookeeper_mt.a
 /tmp/mesos-0.19.0/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a
 /usr/lib/libgflags.so -lpthread
 /tmp/mesos-0.19.0/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a
 -lsasl2 /usr/lib/x86_64-linux-gnu/libcurl-nss.so -lz -lrt
 -L/usr/lib/gcc/x86_64-linux-gnu/4.7
 -L/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu
 -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu
 -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.7/../../.. -lstdc++ -lm
 -lc -lgcc_s /usr/lib/gcc/x86_64-linux-gnu/4.7/crtendS.o
 /usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crtn.o
  -pthread -Wl,-soname -Wl,libmesos-0.19.0.so -o .libs/libmesos-0.19.0.so
  libtool: link: (cd .libs && rm -f libmesos.so && ln -s
  libmesos-0.19.0.so libmesos.so)
  libtool: link: ( cd .libs && rm -f libmesos.la && ln -s ../
  libmesos.la libmesos.la )
 g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\
 -DPACKAGE_VERSION=\0.19.0\ -DPACKAGE_STRING=\mesos\ 0.19.0\
 -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\
 -DVERSION=\0.19.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1
 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD=1 -DMESOS_HAS_JAVA=1
 -DHAVE_PYTHON=\2.7\ -DMESOS_HAS_PYTHON=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1
 -DHAVE_LIBSASL2=1 -I. -I../../src   -Wall -Werror
 -DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\
 -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include
 -I../../3rdparty/libprocess/include
 -I../../3rdparty/libprocess/3rdparty/stout/include -I../include
 -I../3rdparty/libprocess/3rdparty/boost-1.53.0
 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
 -I../3rdparty/libprocess/3rdparty/picojson-4f93734
 -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
 -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include
 -I../3rdparty/zookeeper-3.4.5/src/c/generated   -pthread -g -g2 -O2 -MT
 local/mesos_local-main.o -MD -MP -MF local/.deps/mesos_local-main.Tpo -c -o
 local/mesos_local-main.o `test -f 'local/main.cpp' || echo
 '../../src/'`local/main.cpp
 mv -f local/.deps/mesos_local-main.Tpo local/.deps/mesos_local-main.Po
 /bin/bash ../libtool  --tag=CXX   --mode=link g++ -pthread -g -g2 -O2   -o
 mesos-local local/mesos_local-main.o libmesos.la -lsasl2 -lcurl -lz  -lrt
 libtool: link: g++ -pthread -g -g2 -O2 -o .libs/mesos-local
 local/mesos_local-main.o  ./.libs/libmesos.so /usr/lib/libgflags.so
 -lpthread -lsasl2 /usr/lib/x86_64-linux-gnu/libcurl-nss.so -lz -lrt -pthread
 ./.libs/libmesos.so: error: undefined reference to 'dlopen'
 ./.libs/libmesos.so: error: undefined reference to 'dlsym'
 ./.libs/libmesos.so: error: undefined reference to 'dlerror'
 collect2: error: ld returned 1 exit status
 make[2]: *** [mesos-local] Error 1
 make[2]: Leaving directory `/tmp/mesos-0.19.0/build/src'
 make[1]: *** [all] Error 2
 make[1]: Leaving directory `/tmp/mesos-0.19.0/build/src'
 make: *** [all-recursive] Error 1



Re: Mesos 0.19 registrar upgrade

2014-07-23 Thread Tomas Barton
Ok, thanks Ben! In would be nice to update documentation accordingly.

So, in 0.20 there might be a flag specifying total number of masters?


On 23 July 2014 00:13, Benjamin Mahler benjamin.mah...@gmail.com wrote:

 At the current time, you need an odd number of masters as there is an
 assumption built into the replicated log that the number of masters = 2*quorum
 - 1. This assumption is present when bootstrapping the log from no data.

 To recover from this, you need to run an odd number of masters, and set
 your quorum correctly. For example, 3 masters with quorum 2, or 5 masters
 with quorum 3. It is safe to wipe the replica logs before doing this.

 There are some outstanding tickets to clean this up:
 https://issues.apache.org/jira/browse/MESOS-1465
 https://issues.apache.org/jira/browse/MESOS-1546

 We'd like to have the configuration be explicit about the total number of
 masters, so that the assumption need not be made.


 On Tue, Jul 22, 2014 at 2:40 AM, Tomas Barton barton.to...@gmail.com
 wrote:

 Hi,

 what is the best way to upgrade Mesos cluster from 0.18 to 0.19? I've
 tried to read all documentation before doing actual upgrade, but I still
 don't understand a few things.

 What should be the quorum size?

 The --help says that "It is imperative to set this value to be a majority
 of masters i.e., quorum > (number of masters)/2"

 I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3,
 right?

 The recover.cpp says that: "we allow a replica in EMPTY status to become
 VOTING immediately if it finds ALL (i.e., 2 * quorum - 1) replicas are in
 EMPTY status"
 So, with quorum = 3 I would need 5 Mesos masters (that's just not clear
 from the mesos-master --help).

 quorum=1, mesos-masters=1
 quorum=2, mesos-masters=3
 quorum=3, mesos-masters=5
 quorum=4, mesos-masters=7

 Is it possible to have a non-even number of Mesos masters? Or is it just a
 bad idea?

 With 4 masters I got into a situation when:

 master 1:
 I0722 11:35:40.708562 12689 replica.cpp:638] Replica in VOTING status
 received a broadcasted recover request

 master 2:
 I0722 11:36:37.593647  7754 replica.cpp:638] Replica in EMPTY status
 received a broadcasted recover request

 master 3:
 I0722 11:35:14.102762 26701 recover.cpp:188] Received a recover response
 from a replica in STARTING status

 master 4:
 I0722 11:35:54.284169 32056 replica.cpp:638] Replica in STARTING status
 received a broadcasted recover request
 I0722 11:35:54.284425 32050 recover.cpp:188] Received a recover response
 from a replica in STARTING status
 I0722 11:35:54.284788 32057 recover.cpp:188] Received a recover response
 from a replica in VOTING status
 I0722 11:35:54.285127 32050 recover.cpp:188] Received a recover response
 from a replica in EMPTY status

 And the election algorithm ends up in an endless loop. How can I recover
 from this? Delete all replica logs from master disk? Start with quorum=1
 and increment number of masters?

 Thanks,
 Tomas





Mesos 0.19 registrar upgrade

2014-07-22 Thread Tomas Barton
Hi,

what is the best way to upgrade Mesos cluster from 0.18 to 0.19? I've tried
to read all documentation before doing actual upgrade, but I still don't
understand a few things.

What should be the quorum size?

The --help says that "It is imperative to set this value to be a majority
of masters i.e., quorum > (number of masters)/2"

I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3, right?

The recover.cpp says that: "we allow a replica in EMPTY status to become
VOTING immediately if it finds ALL (i.e., 2 * quorum - 1) replicas are in
EMPTY status"
So, with quorum = 3 I would need 5 Mesos masters (that's just not clear
from the mesos-master --help).

quorum=1, mesos-masters=1
quorum=2, mesos-masters=3
quorum=3, mesos-masters=5
quorum=4, mesos-masters=7
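The table above follows from simple majority arithmetic; a sketch encoding both directions, where the even-count rejection mirrors the 2 * quorum - 1 assumption discussed in this thread:

```python
def quorum_for(masters):
    """Smallest majority of an odd-sized master ensemble."""
    if masters % 2 == 0:
        raise ValueError("run an odd number of masters")
    return masters // 2 + 1

def masters_for(quorum):
    """Total replicas the log bootstrap expects: 2 * quorum - 1."""
    return 2 * quorum - 1

for q in (1, 2, 3, 4):
    print(q, masters_for(q))  # reproduces the quorum/masters table
```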

Is it possible to have a non-even number of Mesos masters? Or is it just a
bad idea?

With 4 masters I got into a situation when:

master 1:
I0722 11:35:40.708562 12689 replica.cpp:638] Replica in VOTING status
received a broadcasted recover request

master 2:
I0722 11:36:37.593647  7754 replica.cpp:638] Replica in EMPTY status
received a broadcasted recover request

master 3:
I0722 11:35:14.102762 26701 recover.cpp:188] Received a recover response
from a replica in STARTING status

master 4:
I0722 11:35:54.284169 32056 replica.cpp:638] Replica in STARTING status
received a broadcasted recover request
I0722 11:35:54.284425 32050 recover.cpp:188] Received a recover response
from a replica in STARTING status
I0722 11:35:54.284788 32057 recover.cpp:188] Received a recover response
from a replica in VOTING status
I0722 11:35:54.285127 32050 recover.cpp:188] Received a recover response
from a replica in EMPTY status

And the election algorithm ends up in an endless loop. How can I recover
from this? Delete all replica logs from master disk? Start with quorum=1
and increment number of masters?

Thanks,
Tomas


Re: How can libmesos bind and declare specific network interface

2014-07-01 Thread Tomas Barton
Hi,

have you tried setting '--ip 10.69.69.45' ?

So, mesos-master is binded to a wrong interface? Or you have problem with
mesos-slaves?

Tomas


On 1 July 2014 12:16, Damien Hardy dha...@viadeoteam.com wrote:

 Hello,

 We would like to use spark on mesos but mesos cluster is accessible via
 VPN.
 When running spark-shell we can see registrations attemps rununing with
 defaut public interface of the desktop :

 ```
 I0701 12:07:34.710917 2440 master.cpp:820] Framework
 20140612-135938-16790026-5050-2407-0537
 (scheduler(1)@192.168.2.92:42731) already registered, resending
 acknowledgement
 I0701 12:07:35.711632  2430 master.cpp:815] Received registration
 request from scheduler(1)@192.168.2.92:42731
 ```

 But we would like it register with the VPN interface.

 This is working when changing my /etc/hosts file and setting hostname on
 my VPN address:
 ```
 I0701 12:03:54.193022  2441 master.cpp:815] Received registration
 request from scheduler(1)@10.69.69.45:47440
 I0701 12:03:54.193094  2441 master.cpp:833] Registering framework
 20140612-135938-16790026-5050-2407-0536 at scheduler(1)@10.69.69.45:47440
 ```

 I tried spark with
 ```
 spark.driver.host   10.69.69.45
 ```
 I can see Spark binding to the right interface but Mesos keeps
 registering with the default one (and failing).

 I hope envvar $MESOS_hostname would do the trick but without success...

 Thank for help.

 --
 Damien HARDY
 IT Infrastructure Architect
 Viadeo - 30 rue de la Victoire - 75009 Paris - France
 PGP : 45D7F89A




executor initialization

2014-06-01 Thread Tomas Barton
Hi,

I have a problem with Mesos executor initialization. A slave receives a
task and is launching an executor, stderr:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0601 14:54:36.055747 15807 exec.cpp:131] Version: 0.18.2

stdout:

3399 [main] INFO  storm.mesos.MesosSupervisor - Waiting for executor to
initialize...
(just this, nothing more)

it gets stuck at MesosExecutorDriver initialization:

framework executor:

Semaphore initter = new Semaphore(0);
_executor = new StormExecutor(initter);
_driver = new MesosExecutorDriver(_executor);
_driver.start();
LOG.info("Waiting for executor to initialize...");
try {
    initter.acquire();
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}
LOG.info("Executor initialized...");

it won't get over the semaphore, which means that it's waiting for
ExecutorProcess initialization

https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L694

tcp0  0 0.0.0.0:50510.0.0.0:*   LISTEN

tcp0  0 0.0.0.0:56363   0.0.0.0:*   LISTEN

tcp0  0 10.0.0.32:5051  mesos-master:51742
ESTABLISHED
tcp0  1 10.0.0.32:54868 mesos-master:5051  SYN_SENT

tcp0  0 10.0.0.32:36554 zookeeper:2181  ESTABLISHED
tcp0  0 10.0.0.32:42186 mesos-master:5050
 ESTABLISHED

If I understand it correctly, the Mesos executor has opened port 56363 and
is trying to establish a connection with the master, right?

However, there's no error message about a timeout or failed connection; the
task state is just TASK_LOST.

Any idea what's wrong?

Thanks,
Tomas


Re: Log managment

2014-05-30 Thread Tomas Barton
I've already refactored the logging, I'm redirecting stdout and stderr to a
master.log or slave.log

https://github.com/deric/mesos-deb-packaging/blob/master/mesos-init-wrapper#L109

the logrotate config itself is quite simple

/var/log/mesos/*.log {
  daily
  missingok
  rotate 30
  compress
  delaycompress
  notifempty
}



On 30 May 2014 11:14, Damien Hardy dha...@viadeoteam.com wrote:

 Hello,

 Yes I do.
 I thought this was the right thing to do for logs.
 But never ending file is not safe usable. This option --log_dir need
 some rework I suppose.
 I will go with stdout/stderr pipeline instead (using logrotate
 copytruncate to handle open file descriptors)

 Thank you

 Le 15/05/2014 22:02, Tomas Barton a écrit :
  Hi Damien,
 
  do you use the `--log_dir` switch? If so, mesos is creating quite many
  files in a strange format:
 
  mesos-slave.{hostname}.invalid-user.log.INFO.20140409-155625.7545
 
  when you forward stdout of the service to a single file and afterwards
  apply simple logrotate
  rules, you might get nicer logs.
 
  Tomas

 --
 Damien HARDY




Re: How to kill stuck frameworks in mesos

2014-05-28 Thread Tomas Barton
Hi,

I have similar issue, Mesos is trying to keep alive a framework that is
crashing:

I0528 16:42:52.487659  6009 master.cpp:929] Framework
20140528-054038-316558480-5050-17117-0003 failed over
I0528 16:42:52.487927  6009 hierarchical_allocator_process.hpp:378]
Activated framework 20140528-054038-316558480-5050-17117-0003
I0528 16:42:52.488483  6009 master.cpp:2282] Sending 2 offers to framework
20140528-054038-316558480-5050-17117-0003
I0528 16:42:52.488873  6009 master.cpp:592] Framework
20140528-054038-316558480-5050-17117-0003 disconnected
I0528 16:42:52.488914  6009 master.cpp:1076] Deactivating framework
20140528-054038-316558480-5050-17117-0003
I0528 16:42:52.489202  6009 master.cpp:614] Giving framework
20140528-054038-316558480-5050-17117-0003 1weeks to failover
I0528 16:42:52.489279  6009 hierarchical_allocator_process.hpp:408]
Deactivated framework 20140528-054038-316558480-5050-17117-0003

it's trying to recover the framework a few times per second. Is there
currently a way to remove that framework?

Probably delete framework state from zookeeper?

Tomas


On 28 May 2014 05:56, Manivannan citizenm...@gmail.com wrote:

 Hi Vinod,

 Thanks for your reply. Please see inline.

 Thanks,
 Mani


 On Wed, May 28, 2014 at 3:57 AM, Vinod Kone vinodk...@gmail.com wrote:

 Hi Mani,

 What do you mean by stuck framework? If the framework disconnects from
 master and the failover timeout (configurable) has passed master should
 remove the framework. - *I have a Mesos cluster and a lot of Jenkins
 instances talking to the cluster to provision slaves. Although I have
 killed the Jenkins instances, I still see that they are listed as
 frameworks in Mesos (that is what I meant by stuck frameworks). What is
 the default failover timeout?*




 Also, there is currently work in progress to give operators the ability
 to force remove a framework. See :
 https://issues.apache.org/jira/browse/MESOS-1390 - *I believe this fix
 would help me out.*





 On Tue, May 27, 2014 at 5:01 AM, Manivannan citizenm...@gmail.com wrote:

 Hi ,

 My issue is similar to : https://issues.apache.org/jira/browse/MESOS-108

 Couple of  frameworks were stuck forever in my Mesos cluster. Is there a
 way to kill those frameworks ?

 Thanks,
 Mani






Re: Mesos master behind NAT

2014-05-23 Thread Tomas Barton
OK, I was using a ZooKeeper URL with zk01.example.com; when I replaced it
with an IP address it started to work. Thanks


On 23 May 2014 17:58, Vinod Kone vinodk...@gmail.com wrote:

 That error indicates that master is unable to resolve the ZK server
 address. What is your full command line?


 On Fri, May 23, 2014 at 8:51 AM, Tomas Barton barton.to...@gmail.com wrote:

 I've set up the --hostname mesos-m1.example.com but it crashes the master

 I0523 17:46:22.166190 18203 contender.cpp:127] Joining the ZK group
 I0523 17:46:30.253983 18197 http.cpp:391] HTTP request for
 '/master/state.json'
 I0523 17:46:40.302247 18196 http.cpp:391] HTTP request for
 '/master/state.json'
 2014-05-23 17:46:42,169:18193(0x7fdb584b2700):ZOO_ERROR@getaddrs@599:
 getaddrinfo: Invalid argument

 F0523 17:46:42.169214 18201 zookeeper.cpp:74] Failed to create ZooKeeper,
 zookeeper_init: Invalid argument [22]
 *** Check failure stack trace: ***
 2014-05-23 17:46:42,169:18193(0x7fdb57cb1700):ZOO_ERROR@getaddrs@599:
 getaddrinfo: Invalid argument

 F0523 17:46:42.169800 18202 zookeeper.cpp:74] Failed to create ZooKeeper,
 zookeeper_init: Invalid argument [22]
 *** Check failure stack trace: ***
 @ 0x7fdb6115ea5d  google::LogMessage::Fail()
 @ 0x7fdb61160813  google::LogMessage::SendToLog()
 @ 0x7fdb6115e678  google::LogMessage::Flush()
 @ 0x7fdb6115ea5d  google::LogMessage::Fail()
 @ 0x7fdb61160813  google::LogMessage::SendToLog()
 @ 0x7fdb6115e678  google::LogMessage::Flush()
 @ 0x7fdb6115e859  google::LogMessage::~LogMessage()
 @ 0x7fdb6115f73f  google::ErrnoLogMessage::~ErrnoLogMessage()
 @ 0x7fdb60f0b0ca  ZooKeeper::ZooKeeper()
 @ 0x7fdb60f0cce0  zookeeper::GroupProcess::initialize()
 @ 0x7fdb61094dce  process::ProcessManager::resume()
 @ 0x7fdb610955fc  process::schedule()
 @ 0x7fdb5f394b50  start_thread
 @ 0x7fdb5f0df0ed  (unknown)

 I guess I have to use directly IP address, right?


 On 23 May 2014 17:38, Vinod Kone vinodk...@gmail.com wrote:

 0.18.0 https://issues.apache.org/jira/browse/MESOS-672


 On Fri, May 23, 2014 at 8:11 AM, Tomas Barton barton.to...@gmail.com wrote:

 Hey Vinod,

 thanks! That's exactly what I was looking for. I haven't noticed that
 flag, since which version is it available?

 Tomas


 On 23 May 2014 17:03, Vinod Kone vinodk...@gmail.com wrote:

 You can use --hostname to tell master to publish a different address
 in zk.

 @vinodkone
 Sent from my mobile

  On May 23, 2014, at 12:40 AM, Tomas Barton barton.to...@gmail.com
 wrote:
 
  Hi,
 
  is it possible to run a Mesos master behind NAT? With the --ip flag
 I can set the IP address of an actual interface. When the master gets elected,
 the IP address written to Zookeeper is, in this case, a private one. If I
 could override which IP goes into Zookeeper, then I could easily forward port
 5050 to the master.
 
  Is there a way how to do this?
 
  Thanks,
  Tomas








Re: Log management

2014-05-16 Thread Tomas Barton
Hi Damien,

do you use the `--log_dir` switch? If so, Mesos creates quite a few
files with a rather unwieldy naming scheme:

mesos-slave.{hostname}.invalid-user.log.INFO.20140409-155625.7545

If you instead forward the service's stdout to a single file and then apply
simple logrotate rules, you get much nicer logs.
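As a sketch of that approach, a minimal logrotate rule for a single redirected log file could look like the following; the paths are illustrative, and `copytruncate` is what lets logrotate rotate without restarting the Mesos process (at the cost of possibly losing a few lines during truncation):

```shell
#!/bin/sh
# Sketch: write a minimal logrotate rule for one redirected Mesos log file.
# /var/log/mesos/mesos-slave.log and the /tmp config path are placeholders.
cat > /tmp/mesos-logrotate.conf <<'EOF'
/var/log/mesos/mesos-slave.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
echo "wrote $(wc -l < /tmp/mesos-logrotate.conf) lines"
```

In a real setup the rule would live under /etc/logrotate.d/ instead of /tmp.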

Tomas


On 14 May 2014 19:28, Adam Bordelon a...@mesosphere.io wrote:

 Hi Damien,

 Log rotation sounds like a reasonable request. Please file a JIRA for it,
 and we can discuss details there.

 Thanks,
 -Adam-


 On Wed, May 14, 2014 at 1:46 AM, Damien Hardy dha...@viadeoteam.com wrote:

 Hello,

 Logs in mesos are problematic for me so far.
 We are used to the log4j facility in the java world, which permits a lot of
 things.

 Mainly I would like log rotation (ideally with the logrotate tool, to be
 homogeneous with other tooling) without restarting processes, because in
 my experience rotation loses history (mesos 0.16.0 so far).

 Best regards,

 --
 Damien HARDY
 IT Infrastructure Architect
 Viadeo - 30 rue de la Victoire - 75009 Paris - France
 PGP : 45D7F89A





Re: Questions about mesos-storm

2014-04-28 Thread Tomas Barton
Hi Chengwei,


| 1. Is it necessary to deploy storm nimbus on the mesos master node?

Yes. Basically, Nimbus runs as a Mesos framework (inside the Mesos master);
if the Mesos master dies, it should start automatically on a different node.
It's not a standard Nimbus but MesosNimbus (a Mesos framework); the code
looks roughly like this (a slightly outdated version):
https://github.com/nathanmarz/storm-mesos/blob/master/src/jvm/storm/mesos/MesosNimbus.java


| 2. How to deploy storm supervisor node?

No need to do that. MesosNimbus will find resources and run the supervisor
process when a topology is submitted.
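On the client side, the only wiring needed is pointing the storm CLI at the node where MesosNimbus runs. A sketch, with placeholder hostname and file path (the storm client normally reads ~/.storm/storm.yaml):

```shell
#!/bin/sh
# Sketch: configure the storm client to submit to MesosNimbus.
# "mesos-master.example.com" and /tmp/storm.yaml are placeholders.
cat > /tmp/storm.yaml <<'EOF'
nimbus.host: "mesos-master.example.com"
EOF
# After that, submission works as usual and the framework launches
# supervisors on offered Mesos resources, e.g. (jar/class are placeholders):
#   storm jar my-topology.jar com.example.MyTopology my-topology
grep nimbus.host /tmp/storm.yaml
```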

Tomas

On 28 April 2014 03:49, Chengwei Yang chengwei.yang...@gmail.com wrote:

 Hi List,

 I found mesosphere has a good page
 http://mesosphere.io/learn/run-storm-on-mesos/ that guides users through
 setting up a mesos-storm cluster.

 However, I have several questions about that guide.

 1. Is it necessary to deploy storm nimbus on the mesos master node?

 The guide only talks about deploying storm nimbus on the mesos master node.

 But I think it's fine to deploy storm nimbus on a different node. Am I
 right?

 2. How to deploy storm supervisor node?

 The guide doesn't say anything about deploying a supervisor node. From that
 guide, I gather supervisor nodes are deployed on demand when a topology is
 submitted? Does mesos-storm really do such cool stuff?

 If not, do we need to deploy supervisors on all mesos nodes?

 --
 Thanks,
 Chengwei