Can't start master properly (stale state issue?); help!

2015-08-13 Thread Paul Bell
Hi All, I hope someone can shed some light on this because I'm getting desperate! I try to start components zk, mesos-master, and marathon in that order. They are started via a program that SSHs to the sole host and does service xxx start. Everyone starts happily enough. But the Mesos UI shows

Re: Can't start master properly (stale state issue?); help!

2015-08-13 Thread Paul Bell
5050 to find out whether multiple master processes are running on the same machine? On Thu, Aug 13, 2015 at 10:37 PM, Paul Bell arach...@gmail.com wrote: Hi All, I hope someone can shed some light on this because I'm getting desperate! I try to start components zk, mesos-master

Re: Custom flags to docker run

2015-08-12 Thread Paul Bell
Hi Stephen, Via Marathon I am deploying Docker containers across a Mesos cluster. The containers have unique Weave IP addresses allowing inter-container communication. All things considered, getting to this point has been relatively straightforward, and Weave has been one of the IJW components. I'd be

Re: Can't start master properly (stale state issue?); help!

2015-08-14 Thread Paul Bell
:34 -0700 Subject: Re: Can't start master properly (stale state issue?); help! From: ma...@mesosphere.io To: user@mesos.apache.org On Thu, Aug 13, 2015 at 11:53 AM, Paul Bell arach...@gmail.com wrote: Marco hasodent, This is just a quick note to say thank you for your replies

Fate of slave node after timeout

2015-11-13 Thread Paul Bell
Hi All, IIRC, after (max_slave_ping_timeouts * slave_ping_timeout) is exceeded without a response from a mesos-slave, the master will remove the slave. In the Mesos UI I can see slave state transition from 1 deactivated to 0. Can that slave never again be added into the cluster, i.e., what
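The removal window described above is simply the product of the two master flags. A minimal Python sketch of that arithmetic, assuming the documented defaults of `--slave_ping_timeout=15secs` and `--max_slave_ping_timeouts=5` (check `mesos-master --help` for the version you run):

```python
from datetime import timedelta


def removal_window(slave_ping_timeout: timedelta, max_slave_ping_timeouts: int) -> timedelta:
    """Approximate time after which the master gives up on an unresponsive agent."""
    if max_slave_ping_timeouts < 1:
        raise ValueError("max_slave_ping_timeouts must be >= 1")
    # Each missed ping costs one full slave_ping_timeout.
    return slave_ping_timeout * max_slave_ping_timeouts


# Assumed defaults: 15 seconds per ping, 5 missed pings allowed.
window = removal_window(timedelta(seconds=15), 5)
print(window.total_seconds())  # 75.0
```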

Re: manage jobs log files in sandboxes

2015-11-06 Thread Paul Bell
-qualified path under /tmp/mesos/slaves, but this is readily available via "docker inspect". -Paul On Fri, Nov 6, 2015 at 7:17 AM, Paul Bell <arach...@gmail.com> wrote: > Hi Mauricio, > > Yeah, I see your point; thank you. > > My approach would be akin to closing th

Re: manage jobs log files in sandboxes

2015-11-06 Thread Paul Bell
Hi Mauricio, Yeah, I see your point; thank you. My approach would be akin to closing the barn door after the horse got out. Both Mesos & Docker are doing their own writing of STDOUT. Docker's rotation won't address Mesos's behavior. I need to find a solution here. -Paul On Thu, Nov 5, 2015

Use docker start rather than docker run?

2015-08-28 Thread Paul Bell
Hi All, I first posted this to the Marathon list, but someone suggested I try it here. I'm still not sure what component (mesos-master, mesos-slave, marathon) generates the docker run command that launches containers on a slave node. I suppose that it's the framework executor (Marathon) on the

Re: Use "docker start" rather than "docker run"?

2015-09-01 Thread Paul Bell
WIP): > https://github.com/massenz/zk-mesos/blob/develop/notebooks/HTTP%20API%20Tests.ipynb > > *Marco Massenzio* > > *Distributed Systems Engineer* http://codetrips.com > > On Fri, Aug 28, 2015 at 8:44 AM, Paul Bell <arach...@gmail.com> wrote: >

Re: Detecting slave crashes event

2015-09-16 Thread Paul Bell
w which agents, you > can also consume the logs as a stop-gap solution, until we offer a > mechanism for subscribing to cluster events. > > On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell <arach...@gmail.com> wrote: > >> Hi All, >> >> I am led to believe that, unlike
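Until such a subscription mechanism exists, another stop-gap besides reading the logs is to poll the master periodically and compare the sets of registered agent IDs. The sketch below, in Python, covers only the comparison step; how each snapshot is fetched from the master is left open, and the agent IDs are hypothetical.

```python
from typing import Iterable, Set, Tuple


def diff_agents(previous: Iterable[str], current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (disappeared, appeared) agent IDs between two polls of the master."""
    prev, curr = set(previous), set(current)
    return prev - curr, curr - prev


# Hypothetical agent IDs from two consecutive polls.
before = ["S0", "S1", "S2"]
after = ["S0", "S2", "S3"]
gone, new = diff_agents(before, after)
print(sorted(gone), sorted(new))  # ['S1'] ['S3']
```

An agent that comes back under a new ID (for example after its `meta/slaves/latest` checkpoint has been removed) appears in both sets, so a disappearance is not necessarily a crash.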

Re: Changing mesos slave configuration

2015-09-23 Thread Paul Bell
Hi Pradeep, Perhaps I am speaking to a slightly different point, but when I change /etc/default/mesos-slave to add a new attribute, I have to remove file /tmp/mesos/meta/slaves/latest. IIRC, mesos-slave itself, in failing to start after such a change, tells me to do this: rm -f
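A minimal Python sketch of that cleanup step, assuming the default work directory `/tmp/mesos` mentioned above and that the agent has already been stopped. It mirrors `rm -f` on the `latest` entry; note that after this the agent starts fresh and registers as a new agent rather than recovering its previous state. The demo runs against a throwaway directory, and the agent ID in it is hypothetical.

```python
import os
import tempfile


def clear_latest_agent_checkpoint(work_dir: str = "/tmp/mesos") -> bool:
    """Remove <work_dir>/meta/slaves/latest if present; return True if removed.

    Like `rm -f`, a missing entry is not an error. Run only while mesos-slave is stopped.
    """
    latest = os.path.join(work_dir, "meta", "slaves", "latest")
    try:
        os.remove(latest)  # `latest` is normally a symlink; this unlinks the link only
        return True
    except FileNotFoundError:
        return False


# Demo against a temporary directory, not a real agent's work_dir.
with tempfile.TemporaryDirectory() as demo_dir:
    slaves = os.path.join(demo_dir, "meta", "slaves")
    target = os.path.join(slaves, "some-agent-id")  # hypothetical agent ID
    os.makedirs(target)
    os.symlink(target, os.path.join(slaves, "latest"))
    first = clear_latest_agent_checkpoint(demo_dir)
    target_still_there = os.path.isdir(target)
    second = clear_latest_agent_checkpoint(demo_dir)
print(first, second)  # True False
```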

Re: Use docker start rather than docker run?

2015-08-28 Thread Paul Bell
answer the docker start vs. docker run question? On Fri, Aug 28, 2015 at 1:26 PM, Paul Bell arach...@gmail.com wrote: Hi All, I first posted this to the Marathon list, but someone suggested I try it here. I'm still not sure what component (mesos-master, mesos-slave, marathon) generates

Re: Securing executors

2015-10-06 Thread Paul Bell
-executor and other mesos components. > > > On 05 Oct 2015, at 19:04, Paul Bell <arach...@gmail.com> wrote: > > Hi All, > > I am running an nmap port scan on a Mesos agent node and noticed nmap > reporting an open TCP port at 50577. > > Poking around some, I disc

Securing executors

2015-10-05 Thread Paul Bell
Hi All, I am running an nmap port scan on a Mesos agent node and noticed nmap reporting an open TCP port at 50577. Poking around some, I discovered exactly 5 mesos-docker-executor processes, one for each of my 5 Docker containers, and each with an open listen port: root 14131 3617 0 10:39
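To confirm from another host what nmap reported, a plain TCP connect test is enough. A minimal Python sketch; the host and port you pass are whatever you are investigating (50577 in the message above), and a successful connect only shows that something is listening, not which process owns the socket.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Self-check against a listener opened here on an OS-assigned port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    demo_open = tcp_port_open("127.0.0.1", demo_port)
print(demo_open)  # True
```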

Re: Anyone try Weave in Mesos env ?

2015-11-26 Thread Paul Bell
Hi Weitao, I came up with this architecture as a way of distributing our application across multiple nodes. Pre-Mesos, our application, delivered as a single VMware VM, was not easily scalable. By breaking out the several application components as Docker containers, we are now able (within limits

Re: Anyone try Weave in Mesos env ?

2015-11-25 Thread Paul Bell
Hmm, I'm not sure there's really a "fix" for that (BTW: I assume you mean to fix high (or long) latency, i.e., to make it lower, faster). A network link is a network link, right? Like all hardware, it has its own physical characteristics which determine its latency's lower bound, below which it
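That physical lower bound is easy to estimate. A minimal Python sketch of the propagation-delay arithmetic, assuming signals in optical fiber travel at roughly two thirds of the speed of light (about 200,000 km/s); real links add serialization, queuing and routing delay on top of this floor.

```python
SPEED_IN_FIBER_KM_PER_S = 200_000  # assumption: roughly 2/3 of c in glass


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time, in milliseconds, over a fiber path of this length."""
    # Out and back (x2), converted from seconds to milliseconds (x1000).
    return 2 * distance_km * 1000 / SPEED_IN_FIBER_KM_PER_S


print(min_rtt_ms(1000))  # 10.0
```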

Re: Help needed (alas, urgently)

2016-01-15 Thread Paul Bell
hing "wrong" in my kernel upgrade steps? Is anyone aware of such an issue in 3.19 or of work done post-3.13 in the area of task termination & signal handling? Thanks for your help. -Paul On Thu, Jan 14, 2016 at 5:14 PM, Paul Bell <arach...@gmail.com> wrote: > I spoke

Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
Hi All, It's been quite some time since I've posted here and that's chiefly because up until a day or two ago, things were working really well. I actually may have posted about this some time back. But then the problem seemed more intermittent. In summa, several "docker stops" don't work, i.e.,

Re: Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
(MB) res:77 virt:12942 2016-01-14T20:46:38.975+ [clientcursormon] mapped (incl journal view):12762 2016-01-14T20:46:38.975+ [clientcursormon] connections:0 Killing docker task Shutting down Killing docker task Shutting down Killing docker task Shutting down On Thu, Jan 14, 2016 at 3:38 PM,

Re: Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
and see if this still happens? > > Tim > > > > On Thu, Jan 14, 2016 at 11:52 AM, Paul Bell <arach...@gmail.com> wrote: > >> Hi All, >> >> It's been quite some time since I've posted here and that's chiefly >> because up until a day or two ago, thing

Re: Help needed (alas, urgently)

2016-01-14 Thread Paul Bell
I spoke too soon, I'm afraid. Next time I did the stop (with zero timeout), I saw the same phenomenon: a mongo container showing repeated: killing docker task shutting down What else can I try? Thank you. On Thu, Jan 14, 2016 at 5:07 PM, Paul Bell <arach...@gmail.com> wrote: > Hi T

Recent UI enhancements & Managed Service Providers

2016-02-25 Thread Paul Bell
Hi All, I am running older versions of Mesos & Marathon (0.23.0 and 0.10.0). Over the course of the last several months I think I've seen several items on this list about UI enhancements. Perhaps they were enhancements to the data consumed by the Mesos & Marathon UIs. I've had very little time

Re: Recent UI enhancements & Managed Service Providers

2016-02-25 Thread Paul Bell
Hi Vinod, Thank you for your reply. I'm not sure that I can be more specific. MSPs are interested in a "view by tenant", e.g., "show me all applications that are allotted to Tenant X". I suppose that the standard Mesos UI could, with properly named task IDs and the UI's "Find" filter,
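One way to approximate a "view by tenant" with the UI's Find filter is a naming convention that puts the tenant first in every task ID. A minimal Python sketch of grouping task IDs by such a prefix; the `<tenant>.<app>.<suffix>` convention and the IDs below are hypothetical, not something Mesos or Marathon enforces.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_tenant(task_ids: Iterable[str], sep: str = ".") -> Dict[str, List[str]]:
    """Group task IDs by the text before the first separator (assumed to name the tenant)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for task_id in task_ids:
        tenant, _, _ = task_id.partition(sep)
        groups[tenant].append(task_id)
    return dict(groups)


# Hypothetical task IDs following the assumed convention.
tasks = ["tenantX.web.1", "tenantY.db.1", "tenantX.db.2"]
print(group_by_tenant(tasks))
# {'tenantX': ['tenantX.web.1', 'tenantX.db.2'], 'tenantY': ['tenantY.db.1']}
```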

Feature request: move in-flight containers w/o stopping them

2016-02-18 Thread Paul Bell
Hello All, Has there ever been any consideration of the ability to move in-flight containers from one Mesos host node to another? I see this as analogous to VMware's "vMotion" facility wherein VMs can be moved from one ESXi host to another. I suppose something like this could be useful from a

Re: Agent won't start

2016-03-30 Thread Paul Bell
Greg, thanks again - I am planning on moving my work_dir. Pradeep, thanks again. In a slightly different scenario, namely, service mesos-slave stop edit /etc/default/mesos-slave (add a port resource) service mesos-slave start I noticed that slave did not start and - again - the log shows
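Port resources go into the agent's `--resources` string as ranges, e.g. `ports:[31000-32000]`. A minimal Python sketch that parses the simple `name:value;name:[lo-hi,...]` form, so an edit to /etc/default/mesos-slave can be sanity-checked before restarting the agent; it deliberately ignores role-qualified entries such as `cpus(role):4` and set-valued resources, and the example string is made up.

```python
from typing import Dict, List, Tuple, Union

Value = Union[float, List[Tuple[int, int]]]


def parse_resources(spec: str) -> Dict[str, Value]:
    """Parse e.g. 'cpus:4;mem:8192;ports:[31000-32000,33000-33100]'."""
    result: Dict[str, Value] = {}
    for item in filter(None, (part.strip() for part in spec.split(";"))):
        name, _, raw = item.partition(":")
        name, raw = name.strip(), raw.strip()
        if raw.startswith("[") and raw.endswith("]"):
            ranges: List[Tuple[int, int]] = []
            for chunk in raw[1:-1].split(","):
                lo, _, hi = chunk.strip().partition("-")
                lo_i, hi_i = int(lo), int(hi)
                if lo_i > hi_i:
                    raise ValueError(f"bad range {chunk!r} for {name}")
                ranges.append((lo_i, hi_i))
            result[name] = ranges
        else:
            result[name] = float(raw)  # scalar resources such as cpus or mem
    return result


print(parse_resources("cpus:4;mem:8192;ports:[31000-32000]"))
# {'cpus': 4.0, 'mem': 8192.0, 'ports': [(31000, 32000)]}
```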

Backup a Mesos Cluster

2016-04-11 Thread Paul Bell
Hi All, As we get closer to shipping a Mesos-based version of our product, we've turned our attention to "protecting" (supporting backup & recovery) of not only our application databases, but the cluster as well. I'm not quite sure how to begin thinking about this, but I suppose the usual

Re: Backup a Mesos Cluster

2016-04-11 Thread Paul Bell
ork, at least you should > implement kind of high availability for your scheduler (like > marathon/chronos does), or let it be launched by marathon so it can be > restarted when it fails. > > On Mon, Apr 11, 2016 at 7:27 PM, Paul Bell <arach...@gmail.com> wrote: > >>

Re: Mesos 0.28 SSL in official packages

2016-04-12 Thread Paul Bell
FWIW, I quite agree with Zameer's point. That said, I want to make abundantly clear that in my experience the folks at Mesosphere are wonderfully helpful. But what happens if down the road Mesosphere is acquired or there occurs some other event that could represent, if not a conflict of

Re: Agent won't start

2016-03-29 Thread Paul Bell
side of /tmp/ > via the `--work_dir` command-line flag. > > Cheers, > Greg > > > On Tue, Mar 29, 2016 at 2:08 PM, Paul Bell <arach...@gmail.com> wrote: > >> Hi, >> >> I am hoping someone can shed some light on this. >> >> An agent node f
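Greg's suggestion to keep the agent's work directory outside /tmp (which some distributions clear at boot) is easy to check mechanically. A minimal Python sketch; it is a purely lexical check that does not resolve symlinks, and `/var/lib/mesos` is only an example location.

```python
from pathlib import PurePosixPath


def under_tmp(work_dir: str) -> bool:
    """True if work_dir is /tmp or lies beneath it (lexical check, no symlink resolution)."""
    path = PurePosixPath(work_dir)
    tmp = PurePosixPath("/tmp")
    return path == tmp or tmp in path.parents


print(under_tmp("/tmp/mesos"), under_tmp("/var/lib/mesos"))  # True False
```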

Agent won't start

2016-03-29 Thread Paul Bell
Hi, I am hoping someone can shed some light on this. An agent node failed to start, that is, when I did "service mesos-slave start" the service came up briefly & then stopped. Before stopping it produced the log shown below. The last thing it wrote is "Trying to create path '/mesos' in

Re: Agent won't start

2016-03-29 Thread Paul Bell
interested in fully understanding the causal chain here before I try to fix anything. -Paul On Tue, Mar 29, 2016 at 5:51 PM, Paul Bell <arach...@gmail.com> wrote: > Whoa...interesting! > > The node *may* have been rebooted. Uptime says 2 days. I'll need to check > my notes.

Re: Agent won't start

2016-03-29 Thread Paul Bell
ior is for /tmp to > be completely nuked at boot time. Was the agent node rebooted prior to this > problem? > > On Tue, Mar 29, 2016 at 2:29 PM, Paul Bell <arach...@gmail.com> wrote: > >> Hi Greg, >> >> Thanks very much for your quick reply. >> >> I

Re: Mesos Master and Slave on same server?

2016-04-13 Thread Paul Bell
Hi June, In addition to doing what Pradeep suggests, I also now & then run a single node "cluster" that houses mesos-master, mesos-slave, and Marathon. Works fine. Cordially, Paul On Wed, Apr 13, 2016 at 12:36 PM, Pradeep Chhetri < pradeep.chhetr...@gmail.com> wrote: > I would suggest you to

Consequences of health-check timeouts?

2016-05-17 Thread Paul Bell
Hi All, I probably have the following account partly wrong, but let me present it just the same and those who know better can correct me as needed. I've an application that runs several MongoDB shards, each a Dockerized container, each on a distinct node (VM); in fact, some of the VMs are on

Re: Consequences of health-check timeouts?

2016-05-18 Thread Paul Bell
calamitous > occurrence? > >mesos-slaves get shutdown > Do you know where your mesos-master stuck when it happens? Any error log > or related log about this? In addition, is there any log when mesos-slave > shut down? > > On Wed, May 18, 2016 at 6:12 AM, Paul Bell <arac

Status of Mesos-3821

2016-04-19 Thread Paul Bell
Hi, I think I encountered the problem described by https://issues.apache.org/jira/browse/MESOS-3821 and wanted to ask if this fix is in Mesos 0.28. But perhaps I misunderstand what's being said; so by way of background our case is Mesos on CentOS 7.2. When we try to set --docker_socket to

Mesos loses track of Docker containers

2016-08-10 Thread Paul Bell
Hello, One of our customers has twice encountered a problem wherein Mesos & Marathon appear to lose track of the application containers that they started. Platform & version info: Ubuntu 14.04 (running under VMware) Mesos (master & agent): 0.23.0 ZK: 3.4.5--1 Marathon: 0.10.0 The phenomena:

Re: Mesos loses track of Docker containers

2016-08-10 Thread Paul Bell
ing problems with the docker > containerizer if memory serves. Also what version of docker? > > > On Wednesday, August 10, 2016, Paul Bell <arach...@gmail.com> wrote: > >> Hello, >> >> One of our customers has twice encountered a problem wherein Mesos & >>

unsubscribe

2017-01-16 Thread Paul Bell