Re: Trying to debug an issue in mesos task tracking

2015-01-21 Thread Sharma Podila
Have you checked the mesos-slave and mesos-master logs for that task id? There should be logs in there for task state updates, including FINISHED. In some cases the task status is not reliably sent to your scheduler (due to mesos-master restarts, leader election

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread David Greenberg
Is it possible to know the container_id before you submit the TaskInfo? If not, how can you find it out? On Wed, Jan 21, 2015 at 1:17 PM, Ian Downes idow...@twitter.com wrote: The final component is the container_id. Take a look in src/slave/paths.hpp to see the directory layout. On Wed,

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread Ian Downes
The final component is the container_id. Take a look in src/slave/paths.hpp to see the directory layout. On Wed, Jan 21, 2015 at 8:50 AM, David Greenberg dsg123456...@gmail.com wrote: So, I've looked into this more, and the UUID in runs doesn't appear to be the task-id, executor-id, or

Re: Storm on Mesos, Anyone Using?

2015-01-21 Thread Brenden Matthews
This is what I sent to Cory, but not to the mailing list: In my mind, most of the issues were with Storm itself, rather than Mesos. One annoying thing is that Nimbus is stateful (no HA), so you have to figure out a way to manage the assets on disk in a safe manner. We also used reserved

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread David Greenberg
So, I've looked into this more, and the UUID in runs doesn't appear to be the task-id, executor-id, or framework-id. Do you have any idea what it could be? On Tue, Jan 13, 2015 at 5:21 PM, David Greenberg dsg123456...@gmail.com wrote: Thank you for your answers! On Tue, Jan 13, 2015 at

Re: Unable to follow Sandbox links from Mesos UI.

2015-01-21 Thread Ryan Thomas
Hey Dan, The UI will attempt to pull that info directly from the slave so you need to make sure the host is resolvable and routeable from your browser. Cheers, Ryan From my phone On Wednesday, 21 January 2015, Dan Dong dongda...@gmail.com wrote: Hi, All, When I try to access sandbox on

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread Ian Downes
No, the container id is generated by the slave when it launches the executor for a task (see Framework::launchExecutor() in src/slave/slave.cpp). However, the 'latest' symlink will point to the most recent container_id directory so you can likely just use that unless your framework is re-using
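
A minimal sketch of the 'latest' symlink approach mentioned above, assuming you have access to the slave's local filesystem and already know the executor's directory. The helper names (resolve_latest_container_id, sandbox_file) are hypothetical and not part of Mesos, and the runs/<container_id> layout is taken from the discussion in this thread rather than verified against src/slave/paths.hpp.

```python
import os
import tempfile


def resolve_latest_container_id(executor_dir):
    """Return the container_id that <executor_dir>/runs/latest points to.

    Hypothetical helper: assumes the executor directory contains a 'runs'
    subdirectory with one directory per container_id plus a 'latest' symlink.
    """
    latest = os.path.join(executor_dir, "runs", "latest")
    if not os.path.islink(latest):
        raise FileNotFoundError("no 'latest' symlink at %s" % latest)
    # realpath follows the symlink; the last path component is the container_id.
    return os.path.basename(os.path.realpath(latest))


def sandbox_file(executor_dir, name):
    """Build the path to a sandbox file (e.g. 'stderr') for the latest run."""
    container_id = resolve_latest_container_id(executor_dir)
    return os.path.join(executor_dir, "runs", container_id, name)


if __name__ == "__main__":
    # Build a fake executor directory so the sketch runs without a Mesos slave.
    executor_dir = tempfile.mkdtemp()
    run_dir = os.path.join(executor_dir, "runs", "example-container-id")
    os.makedirs(run_dir)
    os.symlink(run_dir, os.path.join(executor_dir, "runs", "latest"))
    print(sandbox_file(executor_dir, "stderr"))
```

This only resolves the symlink on disk; it makes no claim about how the slave's HTTP file endpoints treat 'latest'.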

cluster wide init

2015-01-21 Thread CCAAT
Hello all, I was reading about Marathon: "Marathon scheduler processes were started outside of Mesos using init, upstart, or a similar tool" [1]. So my related questions are: Does Marathon work with mesos + Openrc as the init system? Are there any other frameworks that work with Mesos + Openrc?

Re: Accessing stdout/stderr of a task programmattically?

2015-01-21 Thread David Greenberg
It seems that if I take the URL that the Download button for stderr points to and curl it, I get the file. But, if I change the container_id to latest instead of the UUID, then I get a 404. Is there another way to resolve what the container_id is, since it seems critical to get files

Unable to follow Sandbox links from Mesos UI.

2015-01-21 Thread Dan Dong
Hi, All, When I try to access sandbox on mesos UI, I see the following info (the same error appears on every slave sandbox): Failed to connect to slave '20150115-144719-3205108908-5050-4552-S0' on 'centos-2.local:5051'. Potential reasons: The slave's hostname, 'centos-2.local', is not

Re: Unable to follow Sandbox links from Mesos UI.

2015-01-21 Thread Cody Maloney
Also see https://issues.apache.org/jira/browse/MESOS-2129 if you want to track progress on changing this. Unfortunately it is on hold for me at the moment to fix. Cody On Wed, Jan 21, 2015 at 2:07 PM, Ryan Thomas r.n.tho...@gmail.com wrote: Hey Dan, The UI will attempt to pull that info

Re: Mesos 0.22.0

2015-01-21 Thread Adam Bordelon
Cosmin: 0.21.1-rc2 is actually the same as 0.21.1. Both are tagged to commit 2ae1ba91e64f92ec71d327e10e6ba9e8ad5477e8 On Wed, Jan 21, 2015 at 3:52 PM, Cosmin Lehene cleh...@adobe.com wrote: Also, the release page on github shows 0.21.1-rc2 as being after the 0.21.1 release...

Re: Marathon stability and use-case

2015-01-21 Thread Niklas Nielsen
Looping in Connor and Dario. On 21 January 2015 at 17:21, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hm.. I'm not sure if any of the Marathon developers are on this list. They have a mailing list here: https://groups.google.com/forum/?hl=en#!forum/marathon-framework On Mon, Jan 19,

Re: cluster wide init

2015-01-21 Thread Shuai Lin
You can always write the init wrapper scripts for marathon. There is an official debian package, which you can find in mesos's apt repo. On Thu, Jan 22, 2015 at 4:20 AM, CCAAT cc...@tampabay.rr.com wrote: Hello all, I was reading about Marathon: Marathon scheduler processes were started

Re: Marathon stability and use-case

2015-01-21 Thread Dario Rexin
Thanks Niklas. Hi Antonin, Marathon should be able to handle thousands of tasks and that is exactly what it's made for. Unfortunately the latest release (0.7.6) has been very unstable. We fixed a lot of bugs that caused this instability and just tagged an RC for 0.8.0 yesterday:

Trying to debug an issue in mesos task tracking

2015-01-21 Thread Itamar Ostricher
I'm using a custom internal framework, loosely based on MesosSubmit. The phenomenon I'm seeing is something like this: 1. Task X is assigned to slave S. 2. I know this task should run for ~10 minutes. 3. On the master dashboard, I see that task X is in the Running state for several *hours*. 4. I

Re: Architecture question

2015-01-21 Thread Adam Bordelon
You should also look into Chronos for workflow dependency management of batch jobs (also supports cron-like scheduling). On Fri, Jan 9, 2015 at 2:12 PM, Srinimurthy srinimur...@gmail.com wrote: Tim, This is a SAAS environment where the jobs running on each of these nodes are varying

Re: Architecture question

2015-01-21 Thread Tim St Clair
@some point I'd hope the litany of existing DAG generators that exist for legacy batch systems would make it's way to support this ecosystem. /me coughs Makeflow, pegasus ... | for that matter, one might redux a high throughput systems in a (Docker) world where NP-hard matching no longer