Question on integration of SGE (Sun Grid Engine) + Mesos

2013-08-08 Thread Simon Reavely
Hi, Has anyone integrated SGE with Mesos? If not, any tips on integrating a new scheduler with Mesos? Cheers, Simon -- Simon Reavely simon.reav...@gmail.com

Re: Question on integration of SGE (Sun Grid Engine) + Mesos

2013-08-08 Thread Paco Nathan
Would that be for Oracle Grid Eng or Open Grid Scheduler now ? On Aug 8, 2013, at 15:58, Kyle Ellrott kellr...@soe.ucsc.edu wrote: I'd like to second this. If anybody has done any work or has any ideas on how to get SGE (or even SLURM) under Mesos it would make my life much easier (and

Re: Mesos slave not starting up

2013-08-11 Thread Vinod Kone
Can you try our new instructions at https://github.com/mesos/hadoop ? On Sun, Aug 11, 2013 at 7:19 PM, Johnas, Nalini njoh...@ebay.com wrote: Hi Vinod, I tried everything suggested, but I'm still running into the same problem with TASK LOST, and no executor logs are created.

RE: Mesos slave not starting up

2013-08-12 Thread Johnas, Nalini
Thanks Vinod. Sure will do. May I ask what's different with this? -Nalini From: vi...@twitter.com [mailto:vi...@twitter.com] On Behalf Of Vinod Kone Sent: Sunday, August 11, 2013 8:35 PM To: user@mesos.apache.org Subject: Re: Mesos slave not starting up Can you try our new instructions at

problem with mesos slaves and spark

2013-08-16 Thread Franco Maria Nardini
Hi all, when I run a simple example on my new mesos/spark cluster I get this log on the slave nodes. I0816 16:49:02.749058 6429 slave.cpp:436] Got assigned task 0 for framework 201308161531542257298-5050-13853-0005 I0816 16:49:02.749480 6429 slave.cpp:1484] Generating a unique work directory

Re: problem with mesos slaves and spark

2013-08-16 Thread Vinod Kone
Hey Franco, Mesos-0.9.0 is really old and no longer supported. The latest stable version is 0.12.1. You should give it a try! For the spark executor question, it's probably best to ping Spark's mailing list. Cheers, On Fri, Aug 16, 2013 at 8:16 AM, Franco Maria Nardini

Re: Make install to specified dirs?

2013-08-19 Thread Benjamin Hindman
Hi Li, When you configure mesos you can set the directory prefix via --prefix=/path/to/prefix. Ben. On Mon, Aug 19, 2013 at 12:52 PM, Li Jin ice.xell...@gmail.com wrote: Hello guys, I am new to Mesos and trying to install it. By default make install Libraries under /usr/local/lib, how can

Build mesos with protobuf 2.5

2013-08-19 Thread Li Jin
For some compatibility issues I want to build mesos with protobuf 2.5. I am wondering how hard that is. Thanks, Li

Re: Build mesos with protobuf 2.5

2013-08-19 Thread Vinod Kone
What specific issues are you facing? Just curious. On Mon, Aug 19, 2013 at 3:35 PM, Li Jin ice.xell...@gmail.com wrote: For some compatibility issues I want to build mesos with protobuf 2.5. I am wondering how hard that is. Thanks, Li

Re: Build mesos with protobuf 2.5

2013-08-19 Thread Li Jin
I have some other dependencies that use protobuf 2.5. On Mon, Aug 19, 2013 at 8:18 PM, Vinod Kone vinodk...@gmail.com wrote: What specific issues are you facing? Just curious. On Mon, Aug 19, 2013 at 3:35 PM, Li Jin ice.xell...@gmail.com wrote: For some compatibility issues I want to

Re: Example/doc on how to implement framework/scheduler

2013-08-21 Thread Vinod Kone
Hey Li, While your point about better documentation is duly noted, here are the answers to your specific questions. (1) Under TaskInfo, it says Either ExecutorInfo or CommandInfo should be set, that's the difference? The difference is as follows: If you set 'ExecutorInfo', the mesos slave

Re: Questions on implementing mesos framework

2013-08-22 Thread Vinod Kone
(1) One thing particular I found unexpected is that the executors are shutdown if the scheduler is shutdown. Is there a way to keep executors/tasks running when the scheduler is down? I would imagine when the scheduler comes back, it could reestablish the state somehow and keep going

Slave gets oom killed when using cgroups isolation?

2013-08-22 Thread Li Jin
Hello guys, I am implementing a mesos executor and see this behavior when I enable cgroups isolation. It seems the slave got OOM killed. I didn't expect the slave to be OOM killed under any circumstances; am I wrong? Here is the slave log: I0822 21:22:09.168122 15557

Re: Slave gets oom killed when using cgroups isolation?

2013-08-22 Thread Benjamin Mahler
Hi Li, Why do you think the slave was OOM killed? Is there something that pointed you to that conclusion? All I see is the slave launched an executor, and the executor was killed by framework a few seconds after the task was launched. Also, what version are you running? Ben On Thu, Aug 22,

[PROPOSAL] Aurora for Apache Incubation

2013-08-26 Thread Dave Lester
Hi All, We're pleased to share a draft ASF incubation proposal for Aurora, a service scheduler used to schedule jobs onto Apache Mesos that we've developed at Twitter. Aurora provides all of the primitives necessary to quickly deploy and scale stateless and fault tolerant services in a

Re: Questions on task resource specs

2013-08-29 Thread Li Jin
Ben, Thanks for the reply. So my understanding is if some isolation module (in this case cgroups) needs certain resource values to be specified, I would need to specify those even if I am using a different isolation module, is that right? I am a bit worried that future releases might require

Re: Docker + Mesos

2013-08-30 Thread Florian Leibert
Mesosphere is currently working on Docker integration. We will announce it once it's completed. In the meantime, I encourage anyone who has questions about it to reach out to me directly. Thanks, --Flo On Wed, Aug 28, 2013 at 12:26 PM, Dave Lester d...@ischool.berkeley.edu wrote: There's an

Re: Frameworks with different priorities

2013-09-01 Thread Paco Nathan
But wait a moment, if there were that capability in core Mesos -- again, what would you define as the criteria for framework A being considered free? That use case / feedback is valuable, and the implementation may not be hard to do. On Sun, Sep 1, 2013 at 7:46 PM, 许立剑 xulij...@qiyi.com wrote:

Re: Question on resource sharing between frameworks

2013-09-03 Thread Benjamin Hindman
(1) I would think it would be pretty useful to be able to change the weights without restarting the master. Is it possible to do so? We currently do not store the weights persistently across masters, so the easiest way to do that right now is just restart the master(s) with the new weights.

Re: Question on resource sharing between frameworks

2013-09-03 Thread Li Jin
Ben, Some follow up questions/thoughts on --weights flag: (1) I would think it would be pretty useful to be able to change the weights without restarting the master. Is it possible to do so? (2) More generally, does mesos master have an admin interface? Thanks, Li On Tue, Sep 3, 2013 at 1:56

Re: Question on resource sharing between frameworks

2013-09-03 Thread Li Jin
Thanks Ben. This is great. Li On Tue, Sep 3, 2013 at 3:00 PM, Benjamin Hindman benjamin.hind...@gmail.com wrote: (1) I would think it would be pretty useful to be able to change the weights without restarting the master. Is it possible to do so? We currently do not store the weights

Mesos switch/rack awareness?

2013-09-03 Thread Manish Bhatt
[How] does Mesos handle/mitigate network switch saturation on hosts that are on the same rack?

Re: Mesos switch/rack awareness?

2013-09-03 Thread Florian Leibert
Mesos itself allows exposing rack-id, switch-id, etc. as attributes, which will be advertised as part of resource offers to all the frameworks. The Marathon framework will allow you to schedule multiple instances of the same app on the same rack/switch/node (clustered) or one instance of a given
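To make the attribute idea concrete, here is a minimal Python sketch of a scheduler-side filter that accepts at most one offer per rack. It assumes offers have already been reduced to plain `(offer_id, attributes)` pairs and that slaves advertise an attribute named `rack`; the function name and data shapes are hypothetical and are not Marathon's or Mesos' API.

```python
def spread_by_rack(offers, instances, attribute="rack"):
    """Choose up to `instances` offers, at most one per distinct attribute value.

    offers: list of (offer_id, attributes) where attributes is a dict,
            e.g. ("o1", {"rack": "r1"}). This shape is a hypothetical stand-in
            for the attributes carried in a real Mesos offer.
    """
    chosen = []
    used_values = set()
    for offer_id, attrs in offers:
        value = attrs.get(attribute)
        # Skip offers with no rack information or on a rack already in use.
        if value is None or value in used_values:
            continue
        chosen.append(offer_id)
        used_values.add(value)
        if len(chosen) == instances:
            break
    return chosen


if __name__ == "__main__":
    offers = [("o1", {"rack": "r1"}), ("o2", {"rack": "r1"}),
              ("o3", {"rack": "r2"}), ("o4", {})]
    print(spread_by_rack(offers, 2))  # ['o1', 'o3']
```

Declined offers (here `o2` and `o4`) would simply be handed back so that other frameworks, or a later round for this one, can use them.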

Some more question on resource sharing

2013-09-04 Thread Li Jin
Hello Mesosers: Let me first make sure my understanding of how scheduling works (without whitelisting) is correct. Basically: (1) Allocator gathers all available resources (2) Allocator picks a framework with the least share, offers all resources to that framework (3) Framework accepts/declines
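Step (2) of the loop above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: "share" is taken to mean a framework's dominant share, as in Dominant Resource Fairness (the policy behind Mesos' default allocator), and the master's `--weights` flag discussed in the resource-sharing thread is modelled as a per-framework divisor of that share. The `Framework` class and helper names are hypothetical, not Mesos internals.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    name: str
    allocated: dict      # resource name -> amount currently held by this framework
    weight: float = 1.0  # hypothetical stand-in for a --weights entry


def dominant_share(fw, totals):
    # Largest fraction of any single cluster resource held by this framework.
    return max(fw.allocated.get(r, 0.0) / totals[r] for r in totals)


def pick_framework(frameworks, totals):
    # The next offer goes to the framework with the smallest weighted dominant share.
    return min(frameworks, key=lambda fw: dominant_share(fw, totals) / fw.weight)


if __name__ == "__main__":
    totals = {"cpus": 10.0, "mem": 100.0}
    a = Framework("A", {"cpus": 4.0, "mem": 10.0})  # dominant share 0.4 (cpus)
    b = Framework("B", {"cpus": 1.0, "mem": 30.0})  # dominant share 0.3 (mem)
    print(pick_framework([a, b], totals).name)  # B
    a.weight = 2.0                              # A's weighted share drops to 0.2
    print(pick_framework([a, b], totals).name)  # A
```

A real allocator also has to handle filters, declined offers and frameworks that currently want nothing; the sketch only shows the ordering rule.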

Marathon project

2013-09-05 Thread Paco Nathan
Earlier this week Mesosphere launched Marathon, an open source framework based on Mesos for long-running services: https://github.com/mesosphere/marathon Marathon was built by the same team that did Chronos, with other contribs coming in now. Also, a couple articles just came out about Mesos,

Re: Messaging reliability in Mesos

2013-09-05 Thread Vinod Kone
tl;dr: If the master fails over when a slave fails, there is a (small) chance that status updates of that slave are not reliably sent to the scheduler. In earlier versions of Mesos (pre-0.14.0), when the master fails over at the same time as a slave failure, pending status updates of that

Re: Design advice

2013-09-10 Thread Sam Taha
Yes that was my thinking for the slave/execution side of things. My partitions should map pretty well to slave executors/tasks and the cgroup/isolation capabilities of mesos will be great benefit. However, I still have some cases where users want to run each job (even jobs in the same partition

Re: Service Scheduling in Mesos

2013-09-18 Thread Bernerd
Bernerd, You should really check out Marathon https://github.com/mesosphere/marathon This fits closely with what you've described ;) Oh, I am! :) Perhaps I can shorten, clarify, and generalize my underlying questions. Assume I have a single framework (say, marathon) and its tasks occupy all but

mesos-slave.sh reporting Failed to load unknown flag 'native_library'

2013-09-19 Thread Justin Becker
I've set up mesos 0.13.0 on Mac OS X 10.8.4 using Java 1.7.0_40, following the instructions from the README plus the hints from here: https://github.com/airbnb/chronos/blob/master/docs/FAQ.md The master starts up fine, Build: 2013-09-19 10:01:46 by jbeck I0919 10:17:03.554388 2107793792

Re: Service Scheduling in Mesos

2013-09-19 Thread Bernerd Schaefer
Thanks for the response, Bill. Some followups below. I haven't found a great way to approach either of these in mesos without assuming that your framework has full control of the cluster. This is covered a bit in the Omega paper [1]: *While a Mesos framework can use “filters” to describe

Re: Service Scheduling in Mesos

2013-09-20 Thread Paco Nathan
From what I understand, the Omega paper was written in 2012. It's great. Much has been added to Apache Mesos since. Particularly w.r.t. scheduling services. Also, the two-level categorization arguably has evolved further. On Thu, Sep 19, 2013 at 11:55 AM, Bernerd Schaefer

Re: is mesos-submit broken on HEAD (0.15) ?

2013-09-23 Thread Damien Hardy
Thank you Benjamin, I get 502 errors for now on https://reviews.apache.org /o\ 2013/9/20 Benjamin Mahler benjamin.mah...@gmail.com mesos-submit is indeed broken and in need of some love, David Greenberg has a review to fix it: https://reviews.apache.org/r/13367/ On Fri, Sep 20, 2013 at

Re: mesos-slave.sh reporting Failed to load unknown flag 'native_library'

2013-09-23 Thread Justin Becker
Ben, Thanks for the heads up. I commented out the export line in mesos-slave-flags.sh, the slave started up with no problems. Justin On Thu, Sep 19, 2013 at 3:10 PM, Benjamin Hindman benjamin.hind...@gmail.com wrote: Hey Justin, This looks like a bug. :( We cherry picked an incomplete

conf flag change from .12 to .13

2013-09-25 Thread Phil Siegrist
Hi all, I'm running into an issue with the deploy scripts in mesos 0.13.0. The mesos-daemon.sh script has the following: nohup ${exec_prefix}/sbin/${PROGRAM} --conf=${prefix}/var/mesos/conf ${@} </dev/null >/dev/null 2>&1 However the mesos-master bin does not support the --conf flag and

Re: conf flag change from .12 to .13

2013-09-25 Thread Benjamin Mahler
Looks like 0.13.0 does not have the fix cherry-picked. 0.14.0 is incoming and will include the fix, until then you can cherry-pick the following commit, although it may depend on additional patches! commit 028d500c6d00023c8a56b37f885207adfd1e9a50 Author: Charles Reiss wog...@apache.org Date:

Matching a single Offer with multiple Requests

2013-10-01 Thread Sam Taha
Simple example scenario: If my Framework/Scheduler gets an Offer for, say, 2 cpus and 10G (from a single Slave/OfferID), and let's say I have two job requests that need 1 cpu and 5G each. Now, can I make both requests against the same Offer (same OfferID) or can I only make one request even

Re: Matching a single Offer with multiple Requests

2013-10-01 Thread Sam Taha
Sorry, just noticed that SchedulerDriver.launchTasks() takes a List of Tasks, so I guess you can launch multiple job/Task requests against the same OfferID if you make them all in the same launchTasks() call. Is this an all or nothing batch call if one of them say requests more than what is

Re: Matching a single Offer with multiple Requests

2013-10-01 Thread Vinod Kone
On Tue, Oct 1, 2013 at 9:55 AM, Sam Taha taha...@gmail.com wrote: Sorry, just noticed that SchedulerDriver.launchTasks() takes a List of Tasks, so I guess you can launch multiple job/Task requests against the same OfferID if you make them all in the same launchTasks() call. You are right.
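Since `launchTasks()` takes a list, one offer can back several tasks as long as their combined resources fit. The Python sketch below shows one way to pick such a batch greedily, in request order, before making a single `launchTasks()` call. It works on plain dictionaries rather than Mesos protobufs, and `pack_offer` is a hypothetical helper; because it checks the fit on the scheduler side, it does not rely on how the master treats an oversized batch.

```python
def pack_offer(offer, requests):
    """Greedily select requests whose combined resources fit in one offer.

    offer: dict of resource -> amount, e.g. {"cpus": 2, "mem": 10240}
    requests: list of (task_id, dict of resource -> amount), in priority order
    Returns (batch, remaining): the task ids that fit together, and the
    resources left over after reserving them.
    """
    remaining = dict(offer)
    batch = []
    for task_id, need in requests:
        # A request fits only if every resource it asks for is still available.
        if all(remaining.get(r, 0) >= amount for r, amount in need.items()):
            for r, amount in need.items():
                remaining[r] = remaining.get(r, 0) - amount
            batch.append(task_id)
    return batch, remaining


if __name__ == "__main__":
    offer = {"cpus": 2, "mem": 10240}  # 2 cpus and 10G, as in the example
    requests = [("job-1", {"cpus": 1, "mem": 5120}),
                ("job-2", {"cpus": 1, "mem": 5120}),
                ("job-3", {"cpus": 1, "mem": 1024})]
    print(pack_offer(offer, requests))  # (['job-1', 'job-2'], {'cpus': 0, 'mem': 0})
```

Requests left out of the batch (here `job-3`) would wait for a later offer.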

Re: Cannot parse '@0.0.0.0:0' on my first run of Mesos

2013-10-04 Thread Vinod Kone
Good find Abhishek! The website discrepancy is because the command line arg to test-framework changed after 0.13.0 to --master=ip:port. We have plans to have documentation tagged with release version to avoid these issues in the future. On Fri, Oct 4, 2013 at 8:05 AM, Abhishek Parolkar

Ability to pass params to Task without TASK_LOST

2013-10-06 Thread Vladimir Vivien
All, I ran into a situation where I need to pass params/values to tasks as they are launched from the scheduler. The only mechanism that I see is Command.Environment variables. However, when I attempt to launch tasks with same environment variable names, the master rejects subsequent tasks with

Bindings for other languages

2013-10-06 Thread Harry Wilkinson
Hi, I'm wondering about adding bindings to create Mesos frameworks in other languages, based on the C++ and/or Java framework APIs. I've had a look around the code and I can see some headers to use, but I'm wondering whether there's some subset of the code required for linking to just enough for

Re: Bindings for other languages

2013-10-07 Thread Harry Wilkinson
Having thought about this a little more, since the code will need to be running on a server with Mesos running, I can probably treat this as a prerequisite, as long as the installed Mesos includes something that I can link to. Seems obvious in retrospect :o) On 6 October 2013 17:56, Harry

Re: Disk Resource Offer Control

2013-10-07 Thread Vinod Kone
Hey Phil. This was fixed in 0.12.1. I recommend upgrading to that or 0.13.0. @vinodkone On Mon, Oct 7, 2013 at 8:37 AM, Phil Siegrist psiegr...@gmail.com wrote: Hi Damien et al, This does not seem to exactly work: Let me explain. I'm on mesos 0.12.0 I've launched the slave with the

Re: Blog about mesos-0.13.0 Running on Ubuntu-12.04

2013-10-08 Thread prabeesh k
Thanks for your suggestions. bin/mesos-slave.sh couldn't work On Wed, Oct 9, 2013 at 10:48 AM, Vinod Kone vinodk...@gmail.com wrote: Looks good! Couple suggestions: -- 'make check' runs 'make' automatically. -- typo: For slaves: Instead of cd'ing into src you need to cd into bin On

Getting started link on mesos homepage is busted

2013-10-10 Thread Drew Csillag
Not sure where the best place to report this is, but figured this isn't a terrible venue: The getting started link on http://mesos.apache.org, under the big green Download Mesos 0.13.0 is wrong. It is currently: http://mesos.apache.orggettingstarted/ and should be

Re: Getting started link on mesos homepage is busted

2013-10-10 Thread Drew Csillag
On Thu 10 Oct 2013 12:49:47 PM EDT, Ross Allen wrote: Thanks for the report, Drew. The link works for me; its href is http://mesos.apache.org/gettingstarted/, both via HTTP and via HTTPS. What browser are you using, and is it still broken for you? -- Ross Allen On Thu, Oct 10, 2013 at

Re: Getting started link on mesos homepage is busted

2013-10-10 Thread Vinod Kone
Dave Lester fixed the link. Thanks for the report! On Thu, Oct 10, 2013 at 9:59 AM, Drew Csillag dr...@spotify.com wrote: On Thu 10 Oct 2013 12:49:47 PM EDT, Ross Allen wrote: Thanks for the report, Drew. The link works for me; its href is http://mesos.apache.org/gettingstarted/, both

resource revocation and long-running task

2013-10-10 Thread Paul Mackles
Hi - I was re-reading the mesos technical paper. Particularly sections 3.3.1 and 4.3. I am currently running mesos-0.14.0.rc4 and I was wondering how much of what is discussed in those sections is actually implemented? Specifically, I don't see any way to allocate slots for long-running vs.

Re: Getting started link on mesos homepage is busted

2013-10-11 Thread Dave Lester
Hi All, Yes, thanks for reporting on this list and apologies for the error. I updated the site templates shortly after I saw your message yesterday. Dave On Thu, Oct 10, 2013 at 1:08 PM, Vinod Kone vinodk...@gmail.com wrote: Dave Lester fixed the link. Thanks for the report! On Thu, Oct

Github docs configuration link is broken

2013-10-12 Thread Viksit Gaur
All, The link on Github for Mesos command line flags that points to the configuration document from both https://github.com/apache/mesos/blob/master/docs/Home.md and http://mesos.apache.org/documentation/ to https://github.com/apache/mesos/blob/master/docs/Configuration.textile should

Re: Does libmesos.so depend on libjvm.so?

2013-10-13 Thread Ray Rodriguez
libjvm.so is not in the mesos binary but it is located in java installs. On our slaves it's located at /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libjvm.so Perhaps you need to install a full jdk on your dev machine? Ray On Sun, Oct 13, 2013 at 8:26 PM, Harry Wilkinson

Re: Does libmesos.so depend on libjvm.so?

2013-10-13 Thread Ray Rodriguez
And yes I do believe mesos is dependent on a locally installed java. On Sun, Oct 13, 2013 at 8:42 PM, Ray Rodriguez rayrod2...@gmail.com wrote: libjvm.so is not in the mesos binary but it is located in java installs. On our slaves it's located at

Re: Does it make sense to use Mesos to manage a Ruby/Sinatra app?

2013-10-14 Thread Florian Leibert
Hi Ryan I think you might be interested in looking at Marathon. Marathon allows you to keep n instances of an application running in a cluster and will ensure that if one of them dies, another one takes over. It also aims to solve service discovery along the way. If you're familiar with Play!

Re: Help with make check errors on mesos-0.13.0 on ubuntu-12.04

2013-10-15 Thread Vinod Kone
Can you run it in verbose mode and email the output? MESOS_VERBOSE=1 make check On Tue, Oct 15, 2013 at 1:41 PM, Tse, Philip philip@verizonwireless.com wrote: Hi, Newbie trying to get mesos-0.13.0 built and installed. I got it to configure and make without

[VOTE][Result] Release Apache Mesos 0.14.0 (rc6)

2013-10-15 Thread Vinod Kone
Hi, I'm happy to announce the passing of 0.14.0 vote with 3 +1 binding votes, no -1 and no 0 votes: +1: Benjamin Hindman (binding) Benjamin Mahler (binding) Dave Lester (binding) -1: 0: Please find the release at http://www.apache.org/dist/mesos/0.14.0 (or preferably please use a mirror from

Re: [VOTE][Result] Release Apache Mesos 0.14.0 (rc6)

2013-10-16 Thread Benjamin Mahler
We have been running 0.14.0 on production clusters at Twitter and have found a small number of bugs. As such, I will be sending out a VOTE on a 0.14.1 release that will be based off of the 0.14.1-rc1 tag. The most important of these bugs are: MESOS-662: When running Mesos with cgroups memory

Re: [VOTE] Release Apache Mesos 0.14.1 (rc1)

2013-10-16 Thread Vinod Kone
+1 (binding) Tested on Ubuntu 12.04 64-bit. On Wed, Oct 16, 2013 at 2:11 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos version 0.14.1.

Dynamically sizing a mesos cluster.

2013-10-17 Thread Douglas Voet
Hi, Is there any functionality in Mesos to dynamically resize a cluster? I want to be able to add and remove slave nodes and maybe masters while the cluster is running. Thanks, Doug

Re: Dynamically sizing a mesos cluster.

2013-10-17 Thread Vinod Kone
Absolutely. The adding and removing themselves must be done out of band though. @vinodkone Sent from my mobile On Oct 17, 2013, at 6:25 AM, Douglas Voet dv...@broadinstitute.org wrote: Hi, Is there any functionality in Mesos to dynamically resize a cluster? I want to be able to add

Re: Dynamically sizing a mesos cluster.

2013-10-17 Thread Benjamin Hindman
We'll likely be working on some added support for host drain in the not too distant future too, see MESOS-544 (https://issues.apache.org/jira/browse/MESOS-544). On Thu, Oct 17, 2013 at 8:40 AM, Vinod Kone vinodk...@gmail.com wrote: Absolutely. The adding and removing themselves must be done out

Re: Resource utilization stats

2013-10-17 Thread Vinod Kone
MESOS-581 is tracking cpu/mem usage of master/slave process themselves. I think what you are looking for is close to https://issues.apache.org/jira/browse/MESOS-62 On Thu, Oct 17, 2013 at 12:06 PM, Sam Taha taha...@gmail.com wrote: https://issues.apache.org/jira/browse/MESOS-581 On Thu,

Re: [VOTE] Release Apache Mesos 0.14.1 (rc1)

2013-10-17 Thread Dave Lester
+1 (binding) Successfully ran `make check` on OSX 10.8.5. On Thu, Oct 17, 2013 at 11:22 AM, Benjamin Hindman benjamin.hind...@gmail.com wrote: +1 On Wed, Oct 16, 2013 at 2:11 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following

Re: Resource utilization stats

2013-10-18 Thread Sam Taha
Thanks Ben. I will see if I can incorporate this in my reporting GUI. Do you know how long-lived these stats are? Ideally, I am thinking it would be nice to return basic usage stats, maybe via the TaskStatus, so that I can capture them when they are returned when the TASK_FINISHED event is

Re: Check status of Framework connection

2013-10-18 Thread Sam Taha
Note, I am not using Zookeeper in my test environment. I am just connecting to a single master node. Obviously with zookeeper there would be master failover if the primary went down, but there could be situations when the entire master/zookeeper cluster are down, so the scenario still applies I

Re: Check status of Framework connection

2013-10-18 Thread Sam Taha
Thanks Vinod, that explains what I am seeing. On a related question. Let's say the master/zookeeper is down for a period of time and then restarted and my Framework is running during that time (and previously connected to mesos). Will the master try to reconnect with my Framework again on its own

Re: Check status of Framework connection

2013-10-18 Thread Vinod Kone
Once the zk and master are back up, the scheduler driver automatically discovers and re-registers with the master. The framework then would automatically start receiving offers. Framework writers don't have to worry about it! On Fri, Oct 18, 2013 at 2:54 PM, Sam Taha taha...@gmail.com wrote:

process isolation

2013-10-21 Thread Paul Mackles
Hi - I just wanted to confirm my understanding of something... with process isolation, Mesos will not do anything if a given executor exceeds its resource allocation. In other words, if I accept a resource with 1GB of memory and then my executor uses 3GB, Mesos won't detect that the process

Re: process isolation

2013-10-21 Thread Vinod Kone
Yes. @vinodkone Sent from my mobile On Oct 21, 2013, at 5:05 AM, Paul Mackles pa...@loopr.com wrote: Hi - I just wanted to confirm my understanding of something... with process isolation, Mesos will not do anything if a given executor exceeds its resource allocation. In other words, if

Re: process isolation

2013-10-21 Thread Sam Taha
See comments from Ben Mahler on related question about isolation and using cgroups with and without cpu subsystems and cfs enforced: If using process isolation nothing is enforced. If using cgroups isolation: with no subsystems: nothing is enforced. with the 'cpu' subsystem: this will

[VOTE][RESULT] Release Apache Mesos 0.14.1 (rc1)

2013-10-21 Thread Benjamin Mahler
Sorry I couldn't get to this over the weekend! The vote on 0.14.1 (rc1) has passed with 4 +1 binding votes, no -1 and no 0 votes: +1: Vinod Kone (binding) Benjamin Hindman (binding) Dave Lester (binding) Chris Mattmann (binding) -1: 0: Please find the release at

JobServer scheduling and processing jobs on Mesos

2013-10-21 Thread Sam Taha
Greetings All, Wanted to announce that JobServer, a job scheduling/processing/management application, is now integrated with Mesos. I put out a brief post on my blog: http://grandlogic.blogspot.com/2013/10/jobserver-with-mesos.html This initial integration with Mesos went a bit better than I

Re: JobServer scheduling and processing jobs on Mesos

2013-10-21 Thread Vinod Kone
Sweet. Great to see a new framework on Mesos! On Mon, Oct 21, 2013 at 2:36 PM, Sam Taha taha...@gmail.com wrote: Greetings All, Wanted to announce that JobServer, a job scheduling/processing/management application, is now integrated with Mesos. I put out a brief post on my blog:

Powered by Mesos page

2013-10-22 Thread Sam Taha
Would it be possible to add Grand Logic and JobServer to the organizations and products using and built on Mesos? Thanks, Sam Taha http://www.grandlogic.com

Application Dependency Management ala YARN

2013-10-22 Thread Sam Taha
I am looking to implement a custom executor, but I do not want to require users to distribute the JARs and other related resources associated with my executor onto every slave node. I see from the code that I can addUris and even .tgz that get downloaded and extracted during the execution of the

Recent blog post about Slave Recovery

2013-10-23 Thread Dave Lester
I recently pushed changes to the Mesos website that added a blog. The initial post by Vinod Kone goes in depth about the Slave Recovery feature in Mesos 0.14.1. Give it a read! http://mesos.apache.org/blog/slave-recovery-in-apache-mesos/ Dave

Re: UI remote Task Sandbox displays error

2013-10-23 Thread Benjamin Mahler
There could be an issue related to using the ':' character in your executor id (which ultimately gets mapped to a path). A while back I filed this: https://issues.apache.org/jira/browse/MESOS-361 That's all I have to go on with the given information, the slave log might be more informative here,

Re: Ignoring registration!!

2013-10-28 Thread Vinod Kone
That is strange because the values look the same master@127.0.0.1:5050. Can you paste the steps to reproduce this? Also, what version of Mesos are you running? On Mon, Oct 28, 2013 at 1:47 PM, Mohamad Rezaei m...@pdc.kth.se wrote: I am getting this error now that I am trying to run the

Re: Ignoring registration!!

2013-10-30 Thread Mohamad Rezaei
found the problem. Thanks for the help. It was because I changed the directory, and it couldn't find the executor. Best Regards Mohamad Rezaei --- Researcher at PDC KTH Royal Institute of Technology On Mon, Oct 28, 2013 at 10:02 PM, Vinod Kone vinodk...@gmail.com wrote:

Re: Looking for volunteers to help improve Mesos documentation

2013-10-31 Thread Paco Nathan
I'm in. experience in writing getting started guides, a variety of markup workflows, general editorial work, reviews, etc. For that matter, if Apache would allow it, we could build our written collateral in Atlas and publish as a free O'Reilly EPUB. On Thu, Oct 31, 2013 at 5:45 PM, Dave Lester

Stateful Master

2013-10-31 Thread Benjamin Mahler
Hi All, I'd like to mention some changes that have been discussed amongst the committers but have not yet been shared broadly with the list. The central component of Mesos is the Master. The Master is responsible for administering slaves, frameworks, and resource offers. It also handles task

cdh3 and mesos 0.14.1

2013-10-31 Thread Darin Johnson
There used to be a patch for cdh3 in previous versions of mesos. However, in 0.14.1 the docs point to a github repo which is configured for hadoop core 1.2.1 (or cdh4). I attempted to set the version in the pom and followed the directions from there. It compiled fine. After modifying

Re: Looking for volunteers to help improve Mesos documentation

2013-10-31 Thread Dave Lester
Shingo and Ryosuke: Awesome RE: translation. Anyone interested in translating in languages other than Japanese? Paco, interesting question as it relates to republishing via O'Reilly EPUB. I'm not sure if that's possible, but I suspect that others on the user list may know. On Thu, Oct 31, 2013

Re: Looking for volunteers to help improve Mesos documentation

2013-10-31 Thread Sam Taha
What will be the process to submit changes? Will it be the same process that is documented for submitting code patches, along with the same review process? I can help with documenting how to get started writing a Framework. Thanks, Sam Taha http://www.grandlogic.com On Thu, Oct 31, 2013 at 9:26 PM,

Re: Looking for volunteers to help improve Mesos documentation

2013-10-31 Thread OMURA, Shingo
Shingo and Ryosuke: Awesome RE: translation. Thanks Paco :-) I think that we have to be aware of the difficulty of keeping the Japanese translation up to date. So I think it would be better for the Japanese version to be released in sync with each Mesos release. It could be the same for other

[VOTE] Release Apache Mesos 0.14.2 (rc2)

2013-11-03 Thread Benjamin Mahler
Hi all, Please vote on releasing the following candidate as Apache Mesos 0.14.2. To fix MESOS-662 in 0.14.1, the OOM killer was enabled and the cgroups isolator used the memory soft limit along with a threshold

Re: [VOTE] Release Apache Mesos 0.14.2 (rc2)

2013-11-06 Thread Benjamin Hindman
+1 Thanks Ben! On Sun, Nov 3, 2013 at 5:12 PM, Benjamin Mahler benjamin.mah...@gmail.comwrote: Hi all, Please vote on releasing the following candidate as Apache Mesos 0.14.2. To fix MESOS-662 in 0.14.1,

Re: [VOTE] Release Apache Mesos 0.14.2 (rc2)

2013-11-06 Thread Vinod Kone
+1 make check on OSX 10.8.5 On Wed, Nov 6, 2013 at 12:45 PM, Benjamin Hindman benjamin.hind...@gmail.com wrote: +1 Thanks Ben! On Sun, Nov 3, 2013 at 5:12 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos

Jenkins mesos plugin failing

2013-11-07 Thread Whitney Sorenson
Hi all! I am trying to get the Jenkins Mesos plugin functioning. I was able to get it installed on our Jenkins master. However, it's unclear if there are any required steps for setting up the slaves. When a framework task is launched, it fails instantly and there are no logs in the runs folder.

Re: Jenkins mesos plugin failing

2013-11-07 Thread Ray Rodriguez
Hi Whitney I would have a look at this github issue where I work through some of my jenkins mesos-plugin issues with Vinod. Might be some of the same issues you are seeing. https://github.com/jenkinsci/mesos-plugin/issues/2 Ray On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson

Re: Jenkins mesos plugin failing

2013-11-07 Thread Whitney Sorenson
Thanks Ray. I have very similar issue (empty executor directories) - but don't have any issues curling the slave.jar URI - and I don't have any existing JNLP process running. I don't have a jenkins user - is that the only setup you did on the slave? -Whitney On Thu, Nov 7, 2013 at 1:16 PM,

Re: Jenkins mesos plugin failing

2013-11-07 Thread Ray Rodriguez
The logs that really helped me sort out what was happening were the Jenkins logs, so you may want to check those first. Also, when your slave is trying to run the Jenkins job, you should check whether it's actually able to start the slave.jar java process. It looks something like this: sh -c java

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
Hey Whitney, What version of mesos are you using (both in the cluster and the plugin)? The slave should print stuff to console when it is launching executor (e.g., Fetching resources...). I don't see that in the gist you pasted. Are you capturing stdout/stderr of the slave? On Thu, Nov 7, 2013

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
What does mesos-slave.err say? On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson wsoren...@hubspot.com wrote: Hi Vinod, It's 0.14.0-rc4 in both. I believe we have logging working: -rw-r--r-- 1 root root 0 Oct 22 23:48 mesos-slave.out lrwxrwxrwx 1 root root 63 Oct 22 23:48

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
I looked at the code and it looks like there are a few places where the executor might fail before it fetches the URI. Most of them have to do with incorrect permissions. The code was written to have any errors reported in the slave log, the console, or the executor logs (there might be a bug here if we are in

Re: Jenkins mesos plugin failing

2013-11-07 Thread Whitney Sorenson
I added the jenkins user on the slave - this was the missing piece. I'll add this to my PR for the readme. Got much further now; now I'm getting a 403 on the fetch: /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp: 403 Forbidden at

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
Great. Let us know once you figure it out. Maybe I can add a FAQ to the plugin's README to help others (or you can contribute too :)). On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson wsoren...@hubspot.com wrote: I added the jenkins user on the slave - this was the missing piece. I'll add

Re: Jenkins mesos plugin failing

2013-11-07 Thread Whitney Sorenson
Looks like we're using authentication on our slaves. So you either need to pass -jnlpCredentials user:pass on the command line, or change around the permissions in Jenkins to allow anonymous users to connect/run jobs. I'm not sure if it would make sense or not to add the user/pass in the

Re: Jenkins mesos plugin failing

2013-11-07 Thread Whitney Sorenson
I should also point out the scheduler didn't seem to survive a reboot of Jenkins - I had to delete the mesos cloud and reenter the parameters. On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson wsoren...@hubspot.com wrote: Looks like we're using authentication on our slaves. So you either need to

Re: Jenkins mesos plugin failing

2013-11-07 Thread Benjamin Mahler
We should fix that so that it reconnects with Mesos after a restart of Jenkins! Can you file an issue for this? On Thu, Nov 7, 2013 at 12:31 PM, Whitney Sorenson wsoren...@hubspot.com wrote: I should also point out the scheduler didn't seem to survive a reboot of Jenkins - I had to delete the

Re: Jenkins mesos plugin failing

2013-11-07 Thread Benjamin Mahler
From the master's perspective, the framework disconnected immediately after registering. You can bump up the logging on the jenkins scheduler by ensuring that GLOG_v=3 is in your environment when our plugin is initialized. On Thu, Nov 7, 2013 at 3:17 PM, Whitney Sorenson
