"Chaos monkey" for mesos?

2016-02-25 Thread Srikanth Viswanathan
Has there been any work done to develop a "chaos monkey" analogue for
Mesos? I have been researching how to write one, but I wanted to know if
there's any work already available that I can take a look at for
comparison, and possibly re-use.

The end goal would be something loaded into Mesos or separate from Mesos
that randomly kills tasks. Could it be something as simple as an
application that uses the KILL HTTP request from the scheduler API to kill
tasks?
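
To make the idea concrete, here is a minimal sketch in Python. It only
picks a random task and builds the request body; nothing is sent. The JSON
shape is my reading of the KILL call in the v1 scheduler HTTP API (posted
to /api/v1/scheduler on an existing subscription), so treat the field names
as assumptions to verify against your Mesos version. `pick_victim` and
`build_kill_call` are hypothetical helper names, not part of Mesos.

```python
import json
import random


def pick_victim(task_ids, rng=random):
    """Choose one task ID at random; return None when there are no tasks."""
    return rng.choice(task_ids) if task_ids else None


def build_kill_call(framework_id, task_id, agent_id=None):
    """Build the body of a scheduler-API KILL call (assumed v1 JSON shape)."""
    call = {
        "framework_id": {"value": framework_id},
        "type": "KILL",
        "kill": {"task_id": {"value": task_id}},
    }
    if agent_id is not None:
        # Optional hint telling the master which agent runs the task.
        call["kill"]["agent_id"] = {"value": agent_id}
    return call


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    victim = pick_victim(["task-a", "task-b", "task-c"])
    print(json.dumps(build_kill_call("my-framework-id", victim), indent=2))
```

A real tool would still need to subscribe as a framework and send this body
on that subscription, and it would need some way to list candidate tasks.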

Thanks.

Srikanth


Re: "Chaos monkey" for mesos?

2016-02-25 Thread Srikanth Viswanathan
Thanks, Craig and David. I'm curious about the design and use of that tool.
Based on the video, it looks close to what I hope to do.

A web search didn't yield any results about it, however. Does anyone here
know more about the dcos chaos tool?

Thanks again.
Srikanth

On Thu, Feb 25, 2016 at 12:21 PM, craig w <codecr...@gmail.com> wrote:

> here's a direct link to that point in the video
> https://youtu.be/0I6qG9RQUnY?t=389
>
> On Thu, Feb 25, 2016 at 12:17 PM, David Wood <daw...@us.ibm.com> wrote:
>
>> The DCOS tutorial mentions a chaos tool at the end of the video.  Not
>> sure if that's what you're looking for, but it might be something to follow
>> up on somehow.
>>
>> https://mesosphere.com/learn/
>>
>> David Wood
>> Computing Systems for Wireless Networks
>> IBM TJ Watson Research Center
>> daw...@us.ibm.com
>> 914-945-4923 (office), 914-396-6515 (mobile)
>>
>>
>>
>>
>> From: Srikanth Viswanathan <srikant...@gmail.com>
>> To: user@mesos.apache.org
>> Date: 02/25/2016 12:01 PM
>> Subject: "Chaos monkey" for mesos?
>> --
>>
>>
>>
>> Has there been any work done to develop a "*chaos monkey*
>> <https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey>" analogue for
>> Mesos? I have been researching on how to write one, but I wanted to know if
>> there's any work already available that I can take a look at for
>> comparison, and possibly re-use.
>>
>> The end goal would be something loaded into Mesos or separate from Mesos
>> that randomly kills tasks. Could it be something as simple as an
>> application that uses the KILL HTTP request from the scheduler API to kill
>> tasks?
>>
>> Thanks.
>>
>> Srikanth
>>
>>
>
>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>
>


Re: "Chaos monkey" for mesos?

2016-02-25 Thread Srikanth Viswanathan
Appreciate all the responses here. I'll look into `mesos-execute`.

I was thinking about the framework idea in passing but my mesos knowledge
isn't up to scratch yet, so I haven't been able to pursue it. There are
many questions in my mind w.r.t. designing this as a framework:
* Doesn't a framework only receive offers from mesos and launch tasks? How
would a framework kill tasks? Can it also kill slaves?
* Is it legal in mesos for one framework to kill tasks belonging to another
framework?

Thanks.
Srikanth

On Thu, Feb 25, 2016 at 4:58 PM, Connor Doyle <connor@gmail.com> wrote:

> I think you could approximate that tool's behavior with some scripting
> plus `mesos-execute` (ships with the distribution) or by writing a
> really simple framework that just turns things off.
>
> --
> connor
>


Re: "Chaos monkey" for mesos?

2016-02-25 Thread Srikanth Viswanathan
Sorry, ignore my first question. A framework can obviously kill tasks. I
was just unsure as to whether it can kill foreign tasks, which leaves only
my second question.


MesosCon Hackathon agenda

2017-07-25 Thread Srikanth Viswanathan
Hi folks,

Are there any details on what the MesosCon North America 2017 hackathon
will be about? Trying to decide whether to register for the hackathon.


In the past, there have been hackathons with very specific agendas, as
well as more open hackathons that allow beginner contributions to Mesos.


Thanks,
Srikanth


Dynamic reservations without a principal

2017-07-04 Thread Srikanth Viswanathan
Hi folks,

I am trying to have the Chronos framework consume dynamic reservations in
Mesos. However, it appears that Chronos is unable to do this because it
does not pass the framework principal to Mesos when launching tasks (See
https://github.com/mesos/chronos/issues/843), which makes Mesos reject the
launch operation.

To get around this, I am considering changing my dynamic reservations to be
purely role-based instead of (role, principal)-based. Is this allowed/valid?
http://mesos.readthedocs.io/en/0.24.1/reservation/#dynamic-reservation-since-0230
says "resources are reserved for a *role*." Does this mean I can make a
dynamic reservation just for (role) instead of (role, principal)?

Thanks,
Srikanth


Re: mesos-slave Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]

2018-05-03 Thread Srikanth Viswanathan
fwiw, I've seen this type of error in the past when the system runs out of
ephemeral ports. Not saying this is definitely the same issue, but I suggest
checking to see if you have ephemeral ports available.
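
For context, binding to port 0 asks the kernel to pick a free ephemeral
port, so the bind can fail when none are left. Below is a rough sketch, in
Python, of one way to check this on Linux. It assumes the usual /proc
layout (`ip_local_port_range`, and `/proc/net/tcp` with hex
`local_address` fields); the function names are my own, and the count is
only an approximation of real port pressure.

```python
def parse_port_range(text):
    """Parse the contents of ip_local_port_range, e.g. '32768\t60999'."""
    low, high = text.split()
    return int(low), int(high)


def local_ports(proc_net_tcp_text):
    """Collect distinct local ports from /proc/net/tcp-style text."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        # local_address looks like HEXIP:HEXPORT
        ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def ephemeral_in_use(port_range, ports):
    """Count how many of the given ports fall inside the ephemeral range."""
    low, high = port_range
    return sum(1 for p in ports if low <= p <= high)


if __name__ == "__main__":
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        rng = parse_port_range(f.read())
    used = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                used |= local_ports(f.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled
    total = rng[1] - rng[0] + 1
    print(f"ephemeral range {rng[0]}-{rng[1]}: "
          f"{ephemeral_in_use(rng, used)} of {total} local ports in use")
```

If the in-use count is close to the size of the range, port exhaustion is
a plausible cause.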

On Thu, May 3, 2018 at 8:57 AM, Zhitao Li wrote:

> Can you paste the command line of how you started the Mesos agent process?
>
> On Wed, May 2, 2018 at 9:21 PM, Luke Adolph wrote:
>
>> Hi all:
>> When the Mesos slave runs a task, the stderr file shows:
>> I0503 04:01:20.488590  9110 logging.cpp:188] INFO level logging started!
>> I0503 04:01:20.489073  9110 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1\/docker\/links\/b4eabcbb-5769-49f0-9324-b25c3cda8b8c","user":"root"}
>> I0503 04:01:20.491297  9110 fetcher.cpp:379] Fetching URI 'file:///etc/.dockercfg'
>> I0503 04:01:20.491325  9110 fetcher.cpp:250] Fetching directly into the sandbox directory
>> I0503 04:01:20.491348  9110 fetcher.cpp:187] Fetching URI 'file:///etc/.dockercfg'
>> I0503 04:01:20.491367  9110 fetcher.cpp:167] Copying resource with command:cp '/etc/.dockercfg' '/tmp/mesos/slaves/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1/docker/links/b4eabcbb-5769-49f0-9324-b25c3cda8b8c/.dockercfg'
>> W0503 04:01:20.495400  9110 fetcher.cpp:272] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: file:///etc/.dockercfg
>> I0503 04:01:20.495728  9110 fetcher.cpp:456] Fetched 'file:///etc/.dockercfg' to '/tmp/mesos/slaves/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1/docker/links/b4eabcbb-5769-49f0-9324-b25c3cda8b8c/.dockercfg'
>> F0503 04:01:21.990416  9202 process.cpp:889] Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]
>> *** Check failure stack trace: ***
>> @ 0x7f95fc6ef86d  google::LogMessage::Fail()
>> @ 0x7f95fc6f169d  google::LogMessage::SendToLog()
>> @ 0x7f95fc6ef45c  google::LogMessage::Flush()
>> @ 0x7f95fc6ef669  google::LogMessage::~LogMessage()
>> @ 0x7f95fc6f05d2  google::ErrnoLogMessage::~ErrnoLogMessage()
>> @ 0x7f95fc6955d9  process::initialize()
>> @ 0x7f95fc696be2  process::ProcessBase::ProcessBase()
>> @   0x430e9a  mesos::internal::docker::DockerExecutorProcess::DockerExecutorProcess()
>> @   0x41916b  main
>> @ 0x7f95fa60ff45  (unknown)
>> @   0x419c77  (unknown)
>>
>> When the Mesos slave initializes, it runs into "Failed to bind on 0.0.0.0:0:
>> Address already in use". I ran `netstat -nlp`, but no port "0" is in use.
>> The full output is:
>> root@10:~# netstat -nlp
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address     Foreign Address   State    PID/Program name
>> tcp        0      0 0.0.0.0:22        0.0.0.0:*         LISTEN   1153/sshd
>> tcp        0      0 0.0.0.0:37786     0.0.0.0:*         LISTEN   20042/mesos-docker-
>> tcp        0      0 0.0.0.0:5051      0.0.0.0:*         LISTEN   12701/mesos-slave
>> tcp        0      0 0.0.0.0:37084     0.0.0.0:*         LISTEN   19765/mesos-docker-
>> tcp        0      0 0.0.0.0:24220     0.0.0.0:*         LISTEN   28584/ruby
>> tcp        0      0 0.0.0.0:8765      0.0.0.0:*         LISTEN   28353/nginx
>> tcp        0      0 0.0.0.0:24224     0.0.0.0:*         LISTEN   28584/ruby
>> tcp        0      0 127.0.0.1:24225   0.0.0.0:*         LISTEN   28584/ruby
>> tcp        0      0 0.0.0.0:46690     0.0.0.0:*         LISTEN   28932/mesos-docker-
>> tcp        0      0 0.0.0.0:42437     0.0.0.0:*         LISTEN   32184/mesos-docker-
>> tcp        0      0 0.0.0.0:34695     0.0.0.0:*         LISTEN   25862/mesos-docker-
>> tcp        0      0 0.0.0.0:37039     0.0.0.0:*         LISTEN   21273/mesos-docker-
>> tcp        0      0 0.0.0.0:46001     0.0.0.0:*         LISTEN   710/mesos-docker-ex
>> tcp6       0      0 :::31765          :::*              LISTEN   20160/docker-proxy
>> tcp6       0      0 :::31605          :::*              LISTEN   20149/docker-proxy
>> tcp6       0      0 :::31327          :::*              LISTEN   820/docker-proxy
>> tcp6       0      0 :::31008          :::*              LISTEN   32291/docker-proxy
>> tcp6       0      0 :::2375           :::*              LISTEN   28305/node
>> tcp6       0      0 :::31690          :::*              LISTEN   25966/docker-proxy
>> tcp6       0      0 :::31211          :::*              LISTEN   21379/docker-proxy
>> tcp6       0      0 :::31245          :::*              LISTEN   19988/docker-proxy
>> tcp6       0      0 :::31121          :::*
>>

Mesos slave ID change after reboot

2018-01-10 Thread Srikanth Viswanathan
I am trying to understand in which cases the Mesos slave ID changes after a
reboot. I noticed this note at
http://mesos.apache.org/documentation/latest/upgrades/#upgrading-from-1-3-x-to-1-4-x
:

> Agent is now allowed to recover its agent ID post a host reboot. This
> prevents the unnecessary discarding of agent ID by prior Mesos versions.
> Notes about backwards compatibility:
>
> - In case the agent’s recovery runs into agent info mismatch which may
>   happen due to resource change associated with reboot, it’ll fall back to
>   recovering as a new agent (existing behavior).
>
> - In other cases such as checkpointed resources (e.g. persistent
>   volumes) being incompatible with the agent’s resources the recovery will
>   still fail (existing behavior).

I was wondering if the behavior prior to 1.3 is also similarly
well-defined. Is the answer "Will always change after a reboot"?

Thanks,
Srikanth