Re: How to deploy Hadoop on Mesos

2017-07-27 Thread Stephen Gran
Hi,

On 27/07/17 13:54, Traiano Welcome wrote:
> Hi Stephen
> 
> 
> On Thu, Jul 27, 2017 at 12:19 PM, Stephen Gran <stephen.g...@piksel.com> 
> wrote:
> > Both spark and flink integrate natively with mesos, so no need for an
> > intermediate yarn layer.  For batch work, we're looking at the aurora
> > project for job scheduling.
> 
> 
> 
> I haven't looked at Aurora before - would you consider it a drop in
> replacement for hadoop for distributed batch workloads?

It's definitely not a drop in replacement - they have very different
APIs and capabilities.  What aurora gives us is a DSL to build the DAG
of an execution, and with a little work, some primitives to run those
executions.  So, the functionality ends up being similar for 'just
batch', but the language, the bindings, etc are all very different.

Cheers,
-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: How to deploy Hadoop on Mesos

2017-07-27 Thread Stephen Gran
Hi,

So typically people run two sorts of workloads on hadoop -
ad-hoc/scheduled batch work, and stream workloads (spark, flink, etc.).

Both spark and flink integrate natively with mesos, so no need for an
intermediate yarn layer.  For batch work, we're looking at the aurora
project for job scheduling.

Hadoop brings some interesting things, but I've never found its
integration with mesos to be pain-free, so we're moving to other tools
instead of continuing to try to get hadoop working with mesos.

Good luck!

On 27/07/17 08:50, Traiano Welcome wrote:
> Hi Stephen
> 
> 
> On Wed, Jul 26, 2017 at 5:18 PM, Stephen Gran <stephen.g...@piksel.com> wrote:
> 
> > Hi,
> >
> > It is having discussions about whether to stop, as it's having trouble
> > getting enough contributors.
> >
> > I guess I'd ask what you need to run on hadoop, why you're looking at
> > mesos, and then see what else is in that space.
> 
> 
> 
> I don't know what we'd need to run on hadoop at this point - it's open
> ended, and for our developers to decide. However, should this make a
> difference?
> 
> We have mesos in place as a resource scheduler for a number of
> frameworks and would like to resource manage it using the same
> semantics, tools and mechanisms mesos provides.
> 
> I've looked at two books so far that show how this is done, so it seems
> this way of managing hadoop is in use in places (ref: "Apache Mesos
> Essentials", "Mastering Mesos"), however these books are probably out of
> date because the procedure they describe for integrating mesos and
> hadoop is broken.

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: How to deploy Hadoop on Mesos

2017-07-26 Thread Stephen Gran
Hi,

The Myriad project is having discussions about whether to stop, as it's
having trouble getting enough contributors.

I guess I'd ask what you need to run on hadoop, why you're looking at
mesos, and then see what else is in that space.

Cheers,

On 26/07/17 14:13, Brandon Gulla wrote:
> Have you looked into Apache Myriad? 
> 
> http://myriad.apache.org/
> 
> On Wed, Jul 26, 2017 at 4:12 AM, Traiano Welcome <trai...@gmail.com> wrote:
> 
> > Hi
> >
> > Would anyone know of some reliable guides to deploying apache
> > hadoop on top of the mesos scheduler?
> >
> > Thanks,
> > Traiano
> 
> -- 
> Brandon

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: Mesos fetcher error when running as non-root user

2017-04-29 Thread Stephen Gran
Hi,

We ran into this as well, but happily you can specify docker credentials
for the mesos agent instead of using a file URI.  This works a treat and
stops putting docker credentials in the sandbox, which is a nice side
effect.
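
For anyone wanting to try this, here is a minimal sketch of generating such a credentials file. It assumes the agent option in question is `--docker_config` (I haven't named it above, so treat that as an assumption) and that the file uses the standard Docker client `auths` layout. The registry name and credentials are hypothetical.

```python
import base64
import json


def docker_config(registry, username, password):
    """Build a Docker client config granting access to one registry."""
    # Docker stores credentials as base64("user:password") under "auths".
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return {"auths": {registry: {"auth": token.decode("ascii")}}}


if __name__ == "__main__":
    # Hypothetical registry and credentials, for illustration only.
    config = docker_config("registry.example.com", "deploy", "s3cret")
    print(json.dumps(config, indent=2))
```

The resulting JSON would be written to a file on each agent and referenced from the agent's configuration, so nothing needs to be fetched into the sandbox.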

Cheers,

On 26/04/17 17:07, De Groot (CTR), Craig wrote:
> We recently upgraded from Mesos 1.1.0 to 1.2.0 and are encountering
> errors with code that previously worked in 1.1.0.  I believe that this
> is a bug in the new version.  If not, I would like to know the correct
> procedure for using the sandbox as a user other than root.
> 
> Here is the scenario:
> 1) Set up a job in Marathon which specifies a URI to our private
> docker.tar.gz
>   - For an example, see:
>     https://mesosphere.github.io/marathon/docs/native-docker-private-registry.html
>   - This is a local file on each node
> 
> 2) Specify a User (other than root) in the Marathon UI
> 
> 3) Mesos will try to fetch the file and fails during the copy because
> the ownership of the sandbox directory is not changed to the specified
> user.
>   - Note that 1.1.0 correctly set the sandbox directory to the specified
> user
>   - This behavior is documented in the Mesos Docs here (see "specifying
> a user name"):  http://mesos.apache.org/documentation/latest/fetcher/
> 
> Thanks in advance for the help!
> 
> __
> Craig De Groot
> 
> 

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: cron-like scheduling in mesos framework?

2017-01-10 Thread Stephen Gran
Hi,

This is what I'm trying to say, perhaps not very clearly:
the task 'ls /' will be run in a 4th container.  It is a new process.
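
To make the cron analogy concrete, here is a small sketch that expands the `schedule` string from the job quoted below into launch times. Each entry would correspond to a fresh task, not to work inside one of the three existing containers. It is a simplified reading of the ISO 8601 repeating-interval format: it only handles a `PT<n>S/M/H` period, and it assumes `R10` yields ten launches. The function names are hypothetical and this is not Chronos code.

```python
import re
from datetime import datetime, timedelta, timezone

SCHEDULE_RE = re.compile(r"^R(\d*)/([^/]+)/PT(\d+)([SMH])$")
UNIT_SECONDS = {"S": 1, "M": 60, "H": 3600}


def parse_schedule(schedule):
    """Split 'R<n>/<start>/PT<k><unit>' into (repeats, start, period)."""
    match = SCHEDULE_RE.match(schedule)
    if match is None:
        raise ValueError("unsupported schedule: " + schedule)
    count, start, amount, unit = match.groups()
    repeats = int(count) if count else None  # a bare "R" means repeat forever
    start_time = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    period = timedelta(seconds=int(amount) * UNIT_SECONDS[unit])
    return repeats, start_time, period


def launch_times(schedule):
    """List the times at which a bounded schedule would launch a new task."""
    repeats, start_time, period = parse_schedule(schedule)
    if repeats is None:
        raise ValueError("unbounded schedule has no finite list of launches")
    return [start_time + i * period for i in range(repeats)]


if __name__ == "__main__":
    for when in launch_times("R10/2015-06-05T13:36:00Z/PT1M"):
        print(when.isoformat())
```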

Cheers,

On 10/01/17 15:54, l vic wrote:
> Great, but I have several servers: 3 different containers running on 3
> different nodes. If I try to schedule the job:
> curl -L -H 'Content-Type: application/json' -X POST \
>   http://chronos-node:2002/scheduler/iso8601 -d '
> {
>   "schedule": "R10/2015-06-05T13:36:00Z/PT1M",
>   "name": "test-container-job",
>   "container": {
>     "type": "DOCKER",
>     "image": "",
>     "network": "BRIDGE"
>   },
>   "cpus": "0.1",
>   "mem": "512",
>   "uris": ["file:///.dockercfg"],
>   "command": "ls /"
> }'
> Will it schedule the job on all 3 nodes, or only the first one? Can I
> actually choose which one in my json?
> Thanks

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: cron-like scheduling in mesos framework?

2017-01-10 Thread Stephen Gran
Hi,

Chronos launches a new task to perform the job.  Think of it more like 
cron launching a new process when this is run on a single server.

Cheers,

On 10/01/17 15:31, l vic wrote:
> If I have a cluster of 3 running containers, how will Chronos select one
> to schedule the job?
>
> On Fri, Jan 6, 2017 at 1:33 PM, Radek Gruchalski <ra...@gruchalski.com> wrote:
>
> Chronos launches a task on Mesos whenever a scheduled event is due.
> You can run a container task with Chronos, no problem:
>
> https://mesos.github.io/chronos/docs/api.html#job-configuration
>
> Check the container field.
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
>
> On January 6, 2017 at 7:17:36 PM, l vic (lvic4...@gmail.com) wrote:
>
>> Chronos is a separate service... I have a database cluster running
>> as a Mesos framework. I have to periodically schedule data backups
>> using shell scripts within a running docker container. Is that
>> possible with Chronos?
>>
>> On Fri, Jan 6, 2017 at 11:47 AM, Radek Gruchalski
>> <ra...@gruchalski.com> wrote:
>>
>> There are two options:
>>
>> - https://github.com/dcos/metronome
>> - https://github.com/mesos/chronos
>>
>> with Metronome apparently being the successor to Chronos.
>>
>> –
>> Best regards,
>> Radek Gruchalski
>> ra...@gruchalski.com
>>
>>
>> On January 6, 2017 at 5:40:45 PM, l vic (lvic4...@gmail.com) wrote:
>>
>>> Hi,
>>> Is there a way to schedule a mesos framework task for execution
>>> at a certain day/time?
>>> Thank you,
>>> -V
>>
>>
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: Proposal: mesosadm, the command to bootstrap the mesos cluster.

2016-12-13 Thread Stephen Gran
Hi,

I'm quite happy with the current approach of bootstrapping a new agent 
with the location of zookeeper and a set of credentials.  This allows 
our automation code to make new agents join the cluster automatically.

Not that I'm opposed to the two step process you propose, I'm sure we 
can make that happen automatically as well, but aside from making mesos 
look more like other solutions, does it bring semantics that would be 
useful?  ie, are there actions that 'mesosadm init' would initiate?  Or 
would this be purely an interactive way to do the same things you can do 
now by seeding out config files?

Cheers,

On 13/12/16 05:14, tommy xiao wrote:
> Hi team,
>
>
> I come from the China Mesos community. In today's group discussion, we
> came across a topic: how to enhance the user's cluster experience?
>
> New users are the top resource for a community. If we can improve the
> current Mesos cluster installation steps, it will help us grow the
> user community quickly.
>
> Why mesosadm?
>
> Consider the Swarm cluster setup steps:
>
> 1. docker init
> 2. docker join
>
> and the Kubernetes 1.5 cluster setup steps:
>
> 1. kubeadm init
> 2. kubeadm join --token  
>
> So I think the init/join style is a good experience for ordinary users.
> What do you think?
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: Fwd: Unable to run spark examples on mesos 1.0

2016-08-05 Thread Stephen Gran
Hi,

You'll need to get a working hadoop install before that works.  Try 
adding JAVA_HOME and so forth to hadoop/libexec/hadoop-layout.sh
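
One way to check this from the agent's point of view is sketched below. It runs the same `hadoop version` command that the fetcher log in the quoted message shows, but with an explicitly supplied environment, so a JAVA_HOME that only exists in your login shell does not hide the problem. The helper name and the minimal environment are assumptions made for illustration, not anything Mesos provides.

```python
import subprocess


def hadoop_client_works(hadoop_bin, env):
    """Run '<hadoop_bin> version' with the given environment only.

    Returns (ok, output); ok is False if the command is missing or exits
    with a non-zero status.
    """
    try:
        result = subprocess.run(
            [hadoop_bin, "version"],
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        return False, str(exc)
    return result.returncode == 0, result.stdout


if __name__ == "__main__":
    # Deliberately minimal: no JAVA_HOME inherited from the calling shell.
    minimal_env = {"PATH": "/usr/bin:/bin"}
    ok, output = hadoop_client_works("/usr/lib/hadoop/bin/hadoop", minimal_env)
    print("ok" if ok else "failed")
    print(output)
```

If this fails with the minimal environment but works from your shell, the setting probably needs to live in a file hadoop itself reads, rather than in a user profile.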

Cheers,

On 04/08/16 20:28, max square wrote:
> Hey guys,
> I was trying to run the spark 2.0 examples on a mesos+hadoop cluster but
> they keep failing with the following error message:
>
> I0803 19:46:53.848696 12494 fetcher.cpp:498] Fetcher Info:
> 
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/587226cc-bece-422a-bb93-e3ef49075642-S1\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/testcluster\/spark-examples_2.11-2.0.0.jar"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"hdfs:\/\/testcluster\/spark-2.0.0-bin-hdfs-2.6.0-cdh5.7.1.tgz"}}],"sandbox_directory":"\/vol\/mesos\/data\/slaves\/587226cc-bece-422a-bb93-e3ef49075642-S1\/frameworks\/587226cc-bece-422a-bb93-e3ef49075642-0017\/executors\/driver-20160803194649-0001\/runs\/b1e9a92e-f004-4cdc-b936-52b32593d39f","user":"root"}
>
> I0803 19:46:53.850719 12494 fetcher.cpp:409] Fetching URI
> 'hdfs://testcluster/spark-examples_2.11-2.0.0.jar'
>
> I0803 19:46:53.850731 12494 fetcher.cpp:250] Fetching directly into
> the sandbox directory
>
> I0803 19:46:53.850746 12494 fetcher.cpp:187] Fetching URI
> 'hdfs://testcluster/spark-examples_2.11-2.0.0.jar'
> E0803 19:46:53.860776 12494 shell.hpp:106] Command
> '/usr/lib/hadoop/bin/hadoop version 2>&1' failed; this is the output:
> Error: JAVA_HOME is not set and could not be found.
> Failed to fetch 'hdfs://testcluster/spark-examples_2.11-2.0.0.jar':
> Failed to create HDFS client: Failed to execute
> '/usr/lib/hadoop/bin/hadoop version 2>&1'; the command was either
> not found or exited with a non-zero exit status: 1
> Failed to synchronize with agent (it's probably exited)
>
>
> To start out, I tried out the hadoop command which was giving the error
> on the agents and was able to replicate the error. So basically, running
> "sudo -u root /usr/lib/hadoop/bin/hadoop version 2>&1" gave me the same
> JAVA_HOME not set error. After I fixed that and restarted the agents,
> running the spark example still gave me the same error.
>
> I ran the same examples on mesos 0.28.2, and it ran fine.
>
> Any help regarding this would be appreciated.
>
> *Additional Info :-*
> mesos version - 1.0.0
> hadoop version - 2.6.0-cdh5.7.2
> spark version - 2.0.0
>
> Command used to run spark example - ./bin/spark-submit --class
> org.apache.spark.examples.SparkPi --master mesos://:7077
> --deploy-mode cluster --executor-memory 2G --total-executor-cores 4
> hdfs://testcluster/spark-examples_2.11-2.0.0.jar 100
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Stephen Gran
Hi,

Run ldconfig after install?
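
ldconfig rebuilds the dynamic linker's cache, so a library newly installed into /usr/local/lib may not resolve until it has been run. A small sketch for checking `ldd` output before and after follows. The function name is hypothetical, and the parsing assumes the usual `name => target` layout seen in the quoted output.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip().startswith("not found"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    sample = (
        "\tlibmesos-0.28.2.so => not found\n"
        "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
    )
    print(missing_libraries(sample))
```

Feeding it the output of `ldd /usr/local/libexec/mesos/mesos-docker-executor` should give an empty list once the cache has been refreshed.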

Cheers,

On 06/06/16 13:21, Kamil Wokitajtis wrote:
> For 0.28.1:
> root@pltr-app-pl01:~# ldd /usr/local/libexec/mesos/mesos-docker-executor | grep libmesos
> libmesos-0.28.1.so => /usr/local/lib/libmesos-0.28.1.so (0x7fbf522b6000)
>
> For 0.28.2, as expected, library not found:
> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor | grep libmesos
> libmesos-0.28.2.so => not found
>
> After exporting LD_LIBRARY_PATH, libmesos is found as expected:
> root@pltr-app-pl02:~# export LD_LIBRARY_PATH=/usr/local/lib
> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor | grep mesos
> libmesos-0.28.2.so => /usr/local/lib/libmesos-0.28.2.so (0x7f219a8c1000)
>
> Looks like exporting LD_LIBRARY_PATH should help. But exporting this
> variable in mesos startup scripts doesn't seem to solve issue.
>
>
> 2016-06-06 13:50 GMT+02:00 Guangya Liu <gyliu...@gmail.com>:
>
> You can check what is the output of `ldd mesos-docker-executor` to
> see if the libmesos-xxx is in the right location.
>
> [root@dcos001 mesosphere]# ldd ./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor | grep libmesos
> libmesos-0.28.1.so => /opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/libmesos-0.28.1.so (0x7f5cf1cb1000)
>
> Thanks,
>
> Guangya
>
> On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis <wokitaj...@gmail.com> wrote:
>
> Hi,
>
> I have upgraded my mesos env from 0.28.1 to 0.28.2.
> On 0.28.1 everything worked just fine.
> Now agents are unable to start docker images, mesos throws:
>
> mesos-docker-executor: error while loading shared libraries:
> libmesos-0.28.2.so: cannot open shared object file: No such file or directory
>
> Just like for Mesos 0.28.1 where it works, libmesos-0.28.2 is in
> /usr/local/lib
> There is also symlink libmesos.so -> libmesos-0.28.2.
> /etc/ld.so.conf.d/libc.conf contains /usr/local/lib entry.
> I have also tried exporting LD_LIBRARY_PATH in startup scripts,
> no luck.
>
> Thanks,
> Kamil
>
>
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: Rack awareness support for Mesos

2016-06-06 Thread Stephen Gran
Hi,

This looks potentially interesting.  How does it work in a public cloud 
deployment scenario?  I assume you would just have to disable this 
feature, or not enable it?

Cheers,

On 06/06/16 10:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while;
> it’s a common interest for lots of data center applications to provide
> data locality, fault tolerance and better task placement. I created
> MESOS-5545 to track the story, and here is the initial design doc [1]
> to support rack awareness in Mesos.
>
> Looking forward to hearing any comments from end users and other developers.
>
> Thanks!
>
> [1]: https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: Marathon scaling application

2016-05-11 Thread Stephen Gran
b10707031':
> exit status = exited with status 1 stderr = Cannot connect to the Docker
> daemon. Is the docker daemon running on this host?
>
> E0511 05:39:43.652032  1351 slave.cpp:3252] Failed to update resources
> for container f77a5a14-4eb0-4801-a520-6fd2b298a3e3 of executor
> 'nginx.47bcdb9a-169b-11e6-9f8a-fa163ecc33f1' running task
> nginx.47bcdb9a-169b-11e6-9f8a-fa163ecc33f1 on status update for terminal
> task, destroying container: Failed to 'docker -H
> unix:///var/run/docker.sock inspect
> mesos-f986e4ba-91ba-4624-b685-4c004407c6db-S1.f77a5a14-4eb0-4801-a520-6fd2b298a3e3':
> exit status = exited with status 1 stderr = Cannot connect to the Docker
> daemon. Is the docker daemon running on this host?
>
> E0511 05:39:43.652118  1351 slave.cpp:3252] Failed to update resources
> for container 537100e5-99de-4b59-903a-127dae29839e of executor
> 'nginx.47bf4c9b-169b-11e6-9f8a-fa163ecc33f1' running task
> nginx.47bf4c9b-169b-11e6-9f8a-fa163ecc33f1 on status update for terminal
> task, destroying container: Failed to 'docker -H
> unix:///var/run/docker.sock inspect
> mesos-f986e4ba-91ba-4624-b685-4c004407c6db-S1.537100e5-99de-4b59-903a-127dae29839e':
> exit status = exited with status 1 stderr = Cannot connect to the Docker
> daemon. Is the docker daemon running on this host?
>
> E0511 05:39:43.679261  1352 process.cpp:1958] Failed to shutdown socket
> with fd 18: Transport endpoint is not connected
>
> E0511 05:39:43.780983  1352 process.cpp:1958] Failed to shutdown socket
> with fd 14: Transport endpoint is not connected
>
> *From:*Ken Sipe [mailto:kens...@gmail.com]
> *Sent:* 11 May 2016 13:50
> *To:* user@mesos.apache.org
> *Subject:* Re: Marathon scaling application
>
> It is hard to say with the information provided.   I would check the
> slave log on the failing node.  I suspect the failure is recorded there.
>
> otherwise more information is necessary:
>
> 1. the marathon job (did you launch with a json file? that would be helpful)
>
> 2. the slave logs
>
> it could also be useful to understand:
>
> 1. the version of mesos and marathon
>
> 2. what OS is on the nodes
>
> ken
>
> On May 11, 2016, at 3:10 AM, suruchi.kum...@accenture.com wrote:
>
> I have a problem scaling applications through Marathon.
>
> I have a setup of two slave nodes. The first slave node has CPU=1
> and RAM=2GB and the second node has CPU=4 and RAM=8GB.
>
> It is able to scale to a maximum of 5 instances on the first node, but
> when I tried scaling it further the host changed to the second slave
> node. The task then fails to start, and the error in the debug section
> of the Marathon UI shows "Abnormal executor termination".
>
> I would like to know why it is not getting scheduled on the other
> slave node.
>
> Can you please help me with this issue?
>
> Thanks
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: Detecting resource issues

2016-04-30 Thread Stephen Gran
Hi,

I will raise it over there as well, but it's also a mesos question - I'd 
like to detect this sort of issue, and it seems like it should be 
possible.  I'm just looking to see if anyone has already done this and 
can point me in the right direction.

Ultimately, we are going to be running several frameworks on mesos, and 
it seems like the right thing to detect that frameworks are not getting 
offers that they can accept in one place rather than several.

Cheers,

On 29/04/16 17:54, Vinod Kone wrote:
> This sounds like a feature request for marathon. Can you redirect this
> to the marathon mailing list?

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Detecting resource issues

2016-04-29 Thread Stephen Gran
Hello,

We're running tasks on mesos, launched with marathon.  We label all the
agents with AWS availability zone and VPC name, so that tasks can be
scheduled to the right set of hosts.

I've noticed something that feels like, well, maybe not a bug, but
unexpected behavior.

We launch tasks with:

  "constraints": [
    ["az", "GROUP_BY", "3"]
  ],
  "instances": 2,

This is in eu-west-1, where there are 3 AZs.  We run agents in all 3 AZs.

On trying to restart an application, no new task was started.  Digging
around, I could see marathon decline any offers from mesos, which led us
to look a little closer.  It turned out that the 2 tasks in the
application were running in eu-west-1a and eu-west-1b.  All the agents
in eu-west-1c were fully subscribed and could not pick up any new work.
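
To illustrate why every offer ended up declined, here is a deliberately simplified model of an even-spread rule. It is my own sketch of the behaviour we observed, not Marathon's implementation, and the function name and exact rule are assumptions.

```python
def accepts_offer(offer_zone, running_by_zone, min_groups):
    """Simplified even-spread rule: only a least-loaded zone may take a task."""
    in_use = {zone: n for zone, n in running_by_zone.items() if n > 0}
    if len(in_use) < min_groups:
        # At least one expected zone still runs nothing, so the floor is zero.
        lowest = 0
    else:
        lowest = min(in_use.values())
    return in_use.get(offer_zone, 0) <= lowest


if __name__ == "__main__":
    running = {"eu-west-1a": 1, "eu-west-1b": 1}
    for zone in ("eu-west-1a", "eu-west-1b", "eu-west-1c"):
        print(zone, accepts_offer(zone, running, min_groups=3))
```

Under this model only an offer from eu-west-1c is acceptable, so if agents there have no free capacity, nothing can be placed even though the other two zones keep sending usable offers.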

Once we figured this out, it was straightforward enough to rebalance
and let things sort themselves out.

So, with that as background:

It would have been nicer if marathon had realized that the state at the
start and the end of the transaction would be to run in only 2 of 3 AZs,
and allowed a new task to start in either eu-west-1a or eu-west-1b.  I
can see how that might be slightly harder to account for than just even
stacking.

It would be nice if a metric "a framework keeps asking for resource and
then declining offers" was available - it may already be, but I can't
find it.  This would at least make the issue visible.

I can see the metric for declined offers, but this also increments when
the framework declines offers because it doesn't need any additional
resource, so I'm not sure if it's helpful or not here.  Perhaps I need
to look at a second order derivative to see spikes in declines?  It does
look like the number of declines went way up during this period.
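
As a rough sketch of what I mean by looking at the change in the decline rate: given periodic samples of a cumulative declined-offers counter, compute the rate between samples and flag the points where that rate jumps. How the samples are collected is left out, and the function names and threshold are hypothetical.

```python
def rates(samples):
    """Turn (timestamp, counter) samples into per-second rates between samples."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((t1, (v1 - v0) / (t1 - t0)))
    return out


def spikes(samples, threshold):
    """Return timestamps where the decline rate rose by more than threshold."""
    r = rates(samples)
    return [t1 for (_, r0), (t1, r1) in zip(r, r[1:]) if r1 - r0 > threshold]


if __name__ == "__main__":
    # Made-up samples: a steady 1 decline/s, then a jump to 10 declines/s.
    samples = [(0, 0), (10, 10), (20, 20), (30, 120), (40, 220)]
    print(spikes(samples, threshold=5))
```

A steady background of declines from frameworks that need nothing would produce a flat rate, so only a sudden change stands out.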

Like I said, I don't know if this is a bug, precisely, but it was a
not-very-visible failure to use resources when there were actually plenty
of resources on offer.  I'd like to make these failures more visible to
the team, so any pointers would be helpful.

Cheers,

--
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com
