Re: Mesos slave not starting up

2013-08-11 Thread Vinod Kone
this a try. Nalini Sent from my iPad On Aug 6, 2013, at 11:52 PM, Vinod Kone vinodk...@gmail.com wrote: An executor terminated as soon as it's launched is indicative of slave being unable to fetch/launch the executor. In the case of hadoop framework, If your

Re: problem with mesos slaves and spark

2013-08-16 Thread Vinod Kone
Hey Franco, Mesos-0.9.0 is really old and no longer supported. The latest stable version is 0.12.1. You should give it a try! For the spark executor question, it's probably best to ping spark's mailing list. Cheers, On Fri, Aug 16, 2013 at 8:16 AM, Franco Maria Nardini

Re: Build mesos with protobuf 2.5

2013-08-19 Thread Vinod Kone
What specific issues are you facing? Just curious. On Mon, Aug 19, 2013 at 3:35 PM, Li Jin ice.xell...@gmail.com wrote: For some compatibility issues I want to build mesos with protobuf 2.5. I am wondering how hard it is? Thanks, Li

Re: Example/doc on how to implement framework/scheduler

2013-08-21 Thread Vinod Kone
Hey Li, While your point about better documentation is duly noted, here are the answers to your specific questions. (1) Under TaskInfo, it says Either ExecutorInfo or CommandInfo should be set, what's the difference? The difference is as follows: If you set 'ExecutorInfo', the mesos slave

Re: Questions on implementing mesos framework

2013-08-22 Thread Vinod Kone
(1) One thing in particular I found unexpected is that the executors are shut down if the scheduler is shut down. Is there a way to keep executors/tasks running when the scheduler is down? I would imagine when the scheduler comes back, it could reestablish the state somehow and keep going

Re: Messaging reliability in Mesos

2013-09-05 Thread Vinod Kone
tl;dr: If the master fails over when a slave fails, there is a (small) chance that status updates of that slave are not reliably sent to the scheduler. In the earlier versions (pre 0.14.0) of mesos, when the master fails over at the same time as a slave failure, pending status updates of that

Re: Design advice

2013-09-10 Thread Vinod Kone
Sam, Glad to see you are interested in building a framework on top of Mesos! From your description, it looks like your partitions can be directly mapped to Mesos executors and jobs to Mesos tasks. In Mesos, each executor is run under a cgroup with a given set of resources. This enables multiple

Re: Matching a single Offer with multiple Requests

2013-10-01 Thread Vinod Kone
On Tue, Oct 1, 2013 at 9:55 AM, Sam Taha taha...@gmail.com wrote: Sorry, just noticed that SchedulerDriver.launchTasks() takes a List of Tasks, so I guess you can launch multiple job/Task requests against the same OfferID if you make them all in the same launchTasks() call. You are right.

Re: Cannot parse '@0.0.0.0:0' on my first run of Mesos

2013-10-04 Thread Vinod Kone
Good find Abhishek! The website discrepancy is because the command line arg to test-framework changed after 0.13.0 to --master=ip:port. We have plans to have documentation tagged with release version to avoid these issues in the future. On Fri, Oct 4, 2013 at 8:05 AM, Abhishek Parolkar

Re: Disk Resource Offer Control

2013-10-07 Thread Vinod Kone
Hey Phil. This was fixed in 0.12.1. I recommend upgrading to that or 0.13.0. @vinodkone On Mon, Oct 7, 2013 at 8:37 AM, Phil Siegrist psiegr...@gmail.com wrote: Hi Damien et al, This does not seem to exactly work: Let me explain. I'm on mesos 0.12.0 I've launched the slave with the

Re: Getting started link on mesos homepage is busted

2013-10-10 Thread Vinod Kone
Dave Lester fixed the link. Thanks for the report! On Thu, Oct 10, 2013 at 9:59 AM, Drew Csillag dr...@spotify.com wrote: On Thu 10 Oct 2013 12:49:47 PM EDT, Ross Allen wrote: Thanks for the report, Drew. The link works for me; its href is http://mesos.apache.org/gettingstarted/; both

Re: Help with make check errors on mesos-0.13.0 on ubuntu-12.04

2013-10-15 Thread Vinod Kone
Can you run it in verbose mode and email the output? MESOS_VERBOSE=1 make check On Tue, Oct 15, 2013 at 1:41 PM, Tse, Philip philip@verizonwireless.com wrote: Hi, Newbie trying to get mesos-0.13.0 to build and installed. I got it to configure and make without

[VOTE][Result] Release Apache Mesos 0.14.0 (rc6)

2013-10-15 Thread Vinod Kone
Hi, I'm happy to announce the passing of 0.14.0 vote with 3 +1 binding votes, no -1 and no 0 votes: +1: Benjamin Hindman (binding) Benjamin Mahler (binding) Dave Lester (binding) -1: 0: Please find the release at http://www.apache.org/dist/mesos/0.14.0 (or preferably please use a mirror from

Re: [VOTE] Release Apache Mesos 0.14.1 (rc1)

2013-10-16 Thread Vinod Kone
+1 (binding) Tested on Ubuntu 12.04 64-bit. On Wed, Oct 16, 2013 at 2:11 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos version 0.14.1.

Re: Dynamically sizing a mesos cluster.

2013-10-17 Thread Vinod Kone
Absolutely. The adding and removing themselves must be done out of band though. @vinodkone Sent from my mobile On Oct 17, 2013, at 6:25 AM, Douglas Voet dv...@broadinstitute.org wrote: Hi, Is there any functionality in Mesos to dynamically resize a cluster? I want to be able to add

Re: Resource utilization stats

2013-10-17 Thread Vinod Kone
MESOS-581 is tracking cpu/mem usage of master/slave processes themselves. I think what you are looking for is close to https://issues.apache.org/jira/browse/MESOS-62 On Thu, Oct 17, 2013 at 12:06 PM, Sam Taha taha...@gmail.com wrote: https://issues.apache.org/jira/browse/MESOS-581 On Thu,

Re: Check status of Framework connection

2013-10-18 Thread Vinod Kone
and I will start receiving offers? Or will I have to reconnect from my end by periodically checking if the master/zookeeper is back up again? Thanks, Sam Taha http://www.grandlogic.com On Fri, Oct 18, 2013 at 5:38 PM, Vinod Kone vinodk...@gmail.com wrote: You are right. disconnected

Re: process isolation

2013-10-21 Thread Vinod Kone
Yes. @vinodkone Sent from my mobile On Oct 21, 2013, at 5:05 AM, Paul Mackles pa...@loopr.com wrote: Hi - I just wanted to confirm my understanding of something... with process isolation, Mesos will not do anything if a given executor exceeds its resource allocation. In other words, if

Re: JobServer scheduling and processing jobs on Mesos

2013-10-21 Thread Vinod Kone
Sweet. Great to see a new framework on Mesos! On Mon, Oct 21, 2013 at 2:36 PM, Sam Taha taha...@gmail.com wrote: Greetings All, Wanted to announce that JobServer, a job scheduling/processing/management application, is now integrated with Mesos. I put out a brief post on my blog:

Re: Ignoring registration!!

2013-10-28 Thread Vinod Kone
That is strange because the values look the same master@127.0.0.1:5050. Can you paste the steps to reproduce this? Also, what version of Mesos are you running? On Mon, Oct 28, 2013 at 1:47 PM, Mohamad Rezaei m...@pdc.kth.se wrote: I am getting this error now that I am trying to run the

Re: [VOTE] Release Apache Mesos 0.14.2 (rc2)

2013-11-06 Thread Vinod Kone
+1 make check on OSX 10.8.5 On Wed, Nov 6, 2013 at 12:45 PM, Benjamin Hindman benjamin.hind...@gmail.com wrote: +1 Thanks Ben! On Sun, Nov 3, 2013 at 5:12 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
Hey Whitney, What version of mesos are you using (both in the cluster and the plugin)? The slave should print stuff to console when it is launching executor (e.g., Fetching resources...). I don't see that in the gist you pasted. Are you capturing stdout/stderr of the slave? On Thu, Nov 7, 2013

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
if the window for trying to connect to one of the mesos launched slaves is long enough to try before it is terminated due to failures. Interestingly, when I try to connect to one of the existing slaves I get a 403. -Whitney On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone vinodk...@gmail.com wrote: Hey

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
inbetween): https://gist.github.com/wsorenson/8bf64e44fd42da354fa0 On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone vinodk...@gmail.com wrote: What does mesos-slave.err say? On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson wsoren...@hubspot.com wrote: Hi Vinod, It's 0.14.0-rc4 in both

Re: Jenkins mesos plugin failing

2013-11-07 Thread Vinod Kone
the Slave/Connect permission Going to look into what this means. On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone vinodk...@gmail.com wrote: I looked at the code and it looks like there are a few places the executor might fail before it fetches the URI. Most of them have to do with incorrect permissions

Re: mesos in a docker container

2013-11-08 Thread Vinod Kone
Not sure I understand your question. From the last sentence it looks like you already know about the --ip flag that could be passed to the master. Are you looking for something else? On Fri, Nov 8, 2013 at 1:28 PM, Khalid Goudeaux khalid.goude...@imc-chicago.com wrote: Is it possible to

Re: Jenkins mesos plugin failing

2013-11-08 Thread Vinod Kone
/pass in the Jenkins plugin configuration screen or if it should be fetched another way. On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone vinodk...@gmail.com wrote: Great. Let us know once you figure it out. Maybe I can add a FAQ to the plugin's README to help others (or you can contribute too

Re: mesos in a docker container

2013-11-08 Thread Vinod Kone
is useful). Essentially the master is registering itself in Zookeeper but doesn't authoritatively know where it is. On Fri, Nov 8, 2013 at 3:36 PM, Vinod Kone vinodk...@gmail.com wrote: Not sure I understand your question. From the last sentence it looks like you already know about

Re: mesos in a docker container

2013-11-08 Thread Vinod Kone
ip/port to a private ip/port. On Fri, Nov 8, 2013 at 5:36 PM, Vinod Kone vinodk...@gmail.com wrote: I see. So does docker handle the routing if the master binds to a container private port but frameworks/slaves try to connect with its public ip? On Fri, Nov 8, 2013 at 2:25 PM, Khalid

Re: resource reservation through a long running service

2013-11-25 Thread Vinod Kone
Your understanding is correct. Currently, if a framework holds on to an offer then it might impact other frameworks. If you want to guarantee that each of your frameworks can receive a certain amount of resources, you could reserve resources to frameworks (roles). See the --resources flag on the

Re: Running Hadoop MRv2 on Mesos

2013-12-06 Thread Vinod Kone
I am unaware of any ongoing efforts related to this though this request came up a few times before. Maybe we could do this in our next (whenever it happens to be) Mesos hackathon! On Fri, Dec 6, 2013 at 11:46 PM, Aaron Gottlieb aagottl...@valueclick.com wrote: Hi, I was wondering if there

Re: Mesos slave GC clarification

2013-12-26 Thread Vinod Kone
Hi Thomas, The GC in mesos slave works as follows: -- Whenever an executor terminates, its sandbox directory is scheduled for gc for --gc_delay seconds into the future by the slave. -- However the slave also periodically (--disk_watch_interval) monitors the disk utilization and expedites the gc

Re: Mesos slave GC clarification

2013-12-26 Thread Vinod Kone
to diagnose this further? Thanks! Tom On Thu, Dec 26, 2013 at 2:26 PM, Vinod Kone vinodk...@gmail.com wrote: Hi Thomas, The GC in mesos slave works as follows: -- Whenever an executor terminates, its sandbox directory is scheduled for gc for --gc_delay seconds into the future by the slave

Re: Porting an app

2013-12-27 Thread Vinod Kone
I can't really find an example that is an end-to-end use case. By that I mean, I would like to know how to put the scheduler and the executor in the correct places. Right now I have a single jar which can be run from the command line: java -jar target/collector.jar and that would take care of

Re: Mesos slave GC clarification

2013-12-27 Thread Vinod Kone
I'm still not sure what exactly is the issue here but we have had a couple of gc-related fixes included in 0.15.0-rc5. Are you willing to try that out? On Thu, Dec 26, 2013 at 10:56 AM, Thomas Petr tp...@hubspot.com wrote: Hi, We're running Mesos 0.14.0-rc4 on CentOS from the mesosphere

unsubscribe

2013-12-31 Thread Vinod Kone
@vinodkone

Re: What happens after registering framework

2014-01-17 Thread Vinod Kone
Yes that is correct, assuming there are slave(s) registered with the master. @vinodkone On Thu, Jan 16, 2014 at 11:05 PM, Sai Sagar jsaisa...@gmail.com wrote: Hi all, If a framework is able to register successfully, what happens in the next step? Will the master send resource offers

Re: Mesos logging configuration questions

2014-01-21 Thread Vinod Kone
If --log_dir is not specified nothing is written to disk. ➜ build git:(master) ✗ ./bin/mesos-master.sh --help ... ... --log_dir=VALUE Location to put log files (no default, nothing is written to disk unless specified;

Re: How Mesos limits resources used by the executors

2014-01-21 Thread Vinod Kone
:02 PM, Vinod Kone vi...@twitter.com wrote: Mesos uses cgroups (https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt) to limit cpu and memory. It is indeed surprising that your executor is not OOMing when using more memory than requested. Can you tell us what the following values look

Re: How Mesos limits resources used by the executors

2014-01-22 Thread Vinod Kone
. On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone vi...@twitter.com wrote: The way you set task resources looks correct. Can you paste what the slave logs say regarding the task/executor, esp. the lines that are from the cgroups isolator? Also, what is the command line of the slave? @vinodkone

Re: How Mesos limits resources used by the executors (OSX)

2014-01-23 Thread Vinod Kone
Hey David. Mesos doesn't enforce resource limits when run on OSX. @vinodkone On Thu, Jan 23, 2014 at 11:57 AM, David Richardson pudnik...@gmail.com wrote: Hello, Mesos is supported on OSX. However, OSX doesn't have cgroups. How does Mesos enforce resource limits on executors in OSX?

Re: Please Help me about hadoop on Mesos

2014-01-27 Thread Vinod Kone
I have some questions about running hadoop on top of Mesos, please help me. 1. when a tasktracker is launched, if n cpu core are allocated to it, it can only launch n-1 map tasks. Could someone tell me why? And, if I want to run map-only job, what should I do to run n map tasks on a n cpu

Re: Re: Please Help me about hadoop on Mesos

2014-01-27 Thread Vinod Kone
On Mon, Jan 27, 2014 at 10:07 AM, HUO Jing huoj...@ihep.ac.cn wrote: So, at the very beginning, if all the resources are assigned to hadoop, and after that, there are always enough jobs in jobtracker, does that mean that the other framework will never get resources? Is it fair to do so ?

[RESULT][VOTE] Release Apache Mesos 0.16.0 (rc5)

2014-02-06 Thread Vinod Kone
Hi all, The vote for Mesos 0.16.0 (rc5) has passed with the following votes. +1 (Binding) -- Niklas Nielsen Benjamin Hindman Benjamin Mahler There were no 0 or -1 votes. Please find the release at: https://dist.apache.org/repos/dist/release/mesos/0.16.0 It is

Fwd: [VOTE] Release Apache Mesos 0.16.0 (rc5)

2014-02-06 Thread Vinod Kone
, Feb 2, 2014 at 1:23 PM Subject: Re: [VOTE] Release Apache Mesos 0.16.0 (rc5) To: Vinod Kone vinodk...@gmail.com, user@mesos.apache.org user@mesos.apache.org, d...@mesos.apache.org +1, Tested on Ubuntu 13.10 GCC 4.8.1 and Mac OS X Mavericks GCC 4.8.1 On January 31, 2014 at 11:27:28 AM, Vinod Kone

Re: Using mesos with storm

2014-02-26 Thread Vinod Kone
On Wed, Feb 26, 2014 at 11:37 AM, Andrew Milkowski amgm2...@gmail.com wrote: I0226 14:30:12.982986 45829 slave.cpp:536] Successfully attached file

[RESULT] [VOTE] Release Apache Mesos 0.18.0 (rc6)

2014-04-09 Thread Vinod Kone
-940f-0cdd6148d66b' sh: line 0: cd: spark-0.9.tar.gz: Not a directory sh: ./sbin/spark-executor: No such file or directory -- Cheers, Tim Freedom, Features, Friends, First - Fedora https://fedoraproject.org/wiki/SIGs/bigdata -- *From: *Vinod Kone

Re: Marathon does not register with mesos

2014-04-13 Thread Vinod Kone
Hey Mukesh, Mind pasting the master and marathon logs? That would help us diagnose. Vinod On Sun, Apr 13, 2014 at 11:56 AM, Mukesh G muk...@gmail.com wrote: Using marathon 0.4.1 and mesos 0.18 on Centos 6.4 platform, I am able to successfully bring up mesos master, zookeeper and mesos

0.18.1

2014-04-14 Thread Vinod Kone
Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any other important fix that belongs to 0.18.* release but didn't make it into 0.18.0 please reply to this thread and I'll

Re: 0.18.1

2014-04-15 Thread Vinod Kone
On Mon, Apr 14, 2014 at 10:10 PM, Vinod Kone vi...@twitter.com wrote: Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any other important fix that belongs to 0.18

Re: Mesos slaves disconnecting because of Zookeeper?

2014-04-15 Thread Vinod Kone
Mesos 0.17.0 had a major refactor around interaction with ZooKeeper. So I would definitely recommend giving it a try and see if the problem persists. On Tue, Apr 15, 2014 at 11:59 AM, Ted Young tyo...@guidewire.com wrote: Anyone have any suggestions? I'm still seeing these problems and it's

Re: 0.18.1

2014-04-15 Thread Vinod Kone
, Vinod Kone vinodk...@gmail.com wrote: On Mon, Apr 14, 2014 at 10:10 PM, Vinod Kone vi...@twitter.com wrote: Looks like I missed cherry-picking the fix for https://issues.apache.org/jira/browse/MESOS-1045 into 0.18.0. So I would like to cut 0.18.1 with the cherry-pick. If there is any

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-17 Thread Vinod Kone
On Thu, Apr 17, 2014 at 2:56 PM, David Greenberg dsg123456...@gmail.com wrote: My follow-up question is this--is there a way to tell whether I'm outside of the timeout window? I'd like to have my framework check ZK and determine whether it's w/in the framework timeout or not, so that it can

Re: What happens if a scheduler registers with a framework ID that hasn't been used in 48 hours?

2014-04-21 Thread Vinod Kone
On Mon, Apr 21, 2014 at 3:10 PM, Sharma Podila spod...@netflix.com wrote: On a related note, what if framework scheduler is up while Mesos master goes down. Then, if Mesos master restarts after a time interval greater than framework failover timeout, what is the expected behavior? Would the

Re: [VOTE] Release Apache Mesos 0.18.1 (rc2)

2014-05-02 Thread Vinod Kone
+1 make check passes on OSX 10.9 w/ gcc-4.8 On Wed, Apr 30, 2014 at 11:18 PM, Niklas Nielsen n...@qni.dk wrote: Hi all, Please vote on releasing the following candidate as Apache Mesos 0.18.1. 0.18.1 includes the following:

Re: protecting mesos from fat fingers

2014-05-06 Thread Vinod Kone
On Tue, May 6, 2014 at 2:01 PM, David Greenberg dsg123456...@gmail.com wrote: We are actually working on solving #2, by adding mutual authentication between masters and slaves, and ensure that each group knows in advance what the valid masters/slaves are. This allows us to ensure that no

Re: Where did 0.18.1 go? Suggesting 0.18.2

2014-05-13 Thread Vinod Kone
+1 On Tue, May 13, 2014 at 10:54 AM, Benjamin Hindman b...@eecs.berkeley.edu wrote: +1! On Tue, May 13, 2014 at 9:51 AM, Niklas Nielsen n...@qni.dk wrote: Hey everyone, First and foremost, I apologize for the radio silence on my part with regards to the 0.18.1 release. We didn't

Re: How can I ask mesos cluster to reload configuration?

2014-05-13 Thread Vinod Kone
Hey Chengwei, Mesos doesn't allow online update of its configuration. The only exception, so far, has been the VLOG level. To update resources, you should roll the slave with new flags. On Sun, May 11, 2014 at 12:02 AM, Chengwei Yang chengwei.yang...@gmail.com wrote: Hi List, Generally I

Re: [VOTE] Release Apache Mesos 0.18.2 (rc1)

2014-05-16 Thread Vinod Kone
+1 make check passed. CentOS 6 w/ gcc 4.8 On Wed, May 14, 2014 at 8:33 PM, Iven Hsu ive...@gmail.com wrote: +1 make check succeeded in Arch Linux + clang 3.4.1 2014-05-15 3:06 GMT+08:00 Niklas Nielsen n...@qni.dk: Hi all, Please vote on releasing the following candidate as Apache

Re: callback port

2014-05-19 Thread Vinod Kone
Probably. How are you setting the LIBPROCESS_PORT in Marathon? It has to be set via CommandInfo.Environment() of the task/executor for this to take effect. On Fri, May 16, 2014 at 9:41 AM, Scott Clasen sc...@heroku.com wrote: Aha, thanks! I am still having an issue. I am executing the process

Re: callback port

2014-05-19 Thread Vinod Kone
the var set. On Mon, May 19, 2014 at 10:19 AM, Vinod Kone vinodk...@gmail.com wrote: Probably. How are you setting the LIBPROCESS_PORT in Marathon? It has to be set via CommandInfo.Environment() of the task/executor for this to take effect. On Fri, May 16, 2014 at 9:41 AM, Scott Clasen sc

Re: Mesos / Libprocess ENETUNREACH

2014-05-21 Thread Vinod Kone
-mesos-user@incubator (this mailing list is deprecated) Tom, Both the framework (and slaves) and master need to be able to talk to each other. IOW, if one of the end points uses a private IP (presumably that's the case with framework behind a VPN) then it wouldn't work. If you want the

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
You can use --hostname to tell master to publish a different address in zk. @vinodkone Sent from my mobile On May 23, 2014, at 12:40 AM, Tomas Barton barton.to...@gmail.com wrote: Hi, is it possible to run a Mesos master behind NAT? With the --ip flag I can set IP address of an actual

Re: Mesos master behind NAT

2014-05-23 Thread Vinod Kone
directly IP address, right? On 23 May 2014 17:38, Vinod Kone vinodk...@gmail.com wrote: 0.18.0 https://issues.apache.org/jira/browse/MESOS-672 On Fri, May 23, 2014 at 8:11 AM, Tomas Barton barton.to...@gmail.com wrote: Hey Vinod, thanks! That's exactly what I was looking for. I haven't

Re: ExecutorDriver

2014-05-27 Thread Vinod Kone
On Fri, May 16, 2014 at 12:30 PM, Diptanu Choudhury dipta...@gmail.com wrote: Is the ExecutorDriver that one gets in a launchTask callback in a Mesos Executor singleton? I am currently caching the instance of the ExecutorDriver when a launchTask is called in an Akka Actor which monitors a

Re: How to kill stuck frameworks in mesos

2014-05-28 Thread Vinod Kone
On Tue, May 27, 2014 at 8:56 PM, Manivannan citizenm...@gmail.com wrote: *What is the default fail over timeout ? * The default failover timeout is 0s. You can confirm this by grepping master log for lines that look like Giving framework framework-id time to failover. I'm surprised that master

Re: Framework Starvation

2014-05-30 Thread Vinod Kone
Hey Claudiu, Mind posting some master logs with the simple setup that you described (3 shark cli instances)? That would help us better diagnose the problem. On Fri, May 30, 2014 at 1:59 AM, Claudiu Barbura claudiu.barb...@atigeo.com wrote: This is a critical issue for us as we have to shut

Re: Mesos with non clustered environment.

2014-05-30 Thread Vinod Kone
Hey Raymond, Glad to hear that you are interested in Mesos. Please see my answers inline. It specifically is talking about resource requirements at the framework level. What if some tasks in the one framework require a GPU and others do not ? The kind of resources that tasks from Beaker

Re: Mesos master behind NAT

2014-05-30 Thread Vinod Kone
process::schedule() @ 0x7fdb5f394b50 start_thread @ 0x7fdb5f0df0ed (unknown) I guess I have to use directly IP address, right? On 23 May 2014 17:38, Vinod Kone vinodk...@gmail.com wrote: 0.18.0 https://issues.apache.org/jira/browse/MESOS-672 On Fri

Re: SLAVE LOST messages

2014-06-03 Thread Vinod Kone
The framework should receive a slave lost message though it is not reliably retried by the master in case it doesn't make it to the framework (master failover, framework failover etc). On Tue, Jun 3, 2014 at 3:09 PM, Diptanu Choudhury dipta...@gmail.com wrote: Hi, When a mesos slave process

Re: Framework Starvation

2014-06-03 Thread Vinod Kone
offers and are able to run queries again (see attached log_after_starvation file). Let me know if you need the slave logs. Thank you! Claudiu From: Vinod Kone vinodk...@gmail.com Reply-To: user@mesos.apache.org user@mesos.apache.org Date: Friday, May 30, 2014 at 10:13 AM To: user

Re: Framework Starvation

2014-06-03 Thread Vinod Kone
Either should be fine. I don't think there are any changes in allocator since 0.18.0-rc1. On Tue, Jun 3, 2014 at 4:08 PM, Claudiu Barbura claudiu.barb...@atigeo.com wrote: Hi Vinod, Should we use the same 0-18.1-rc1 branch or trunk code? Thanks, Claudiu From: Vinod Kone vinodk

Re: Error while running Mesos slave on Mac OSX 10.9.3

2014-06-09 Thread Vinod Kone
as an identifier for the slave. Thanks! prakhar On Mon, Jun 9, 2014 at 1:56 PM, Vinod Kone vinodk...@gmail.com wrote: Looks like gethostbyname2 call is returning an error. I've seen this before on my mac when I have vpn software running (or incorrectly stopped). I'm surprised though that master

Re: Framework Starvation

2014-06-13 Thread Vinod Kone
In case you didn't receive my email from @twitter domain. On Thu, Jun 12, 2014 at 8:20 AM, Claudiu Barbura claudiu.barb...@atigeo.com wrote: We had to change the drf_sorter.cpp/hpp and hierarchical_allocator_process.cpp files. Hey Claudiu. Can you share the patch? @vinodkone

Re: Failed to perform recovery: Incompatible slave info detected

2014-06-18 Thread Vinod Kone
the case until cfs was enabled. On 18 June 2014 18:34, Vinod Kone vinodk...@gmail.com wrote: Hey Dick, Regarding slave recovery, any changes in the SlaveInfo (see mesos.proto) are considered as a new slave and hence recovery doesn't proceed forward. This is because Master caches SlaveInfo

Re: Failed to perform recovery: Incompatible slave info detected

2014-06-19 Thread Vinod Kone
to the metadata feature though - do you know why just the 'id' of the slaves isn't used? As it stands adding disk storage, cores or RAM to a slave will cause it to drop out of cluster - does checking the whole metadata provide any benefit vs. checking the id? On 18 June 2014 19:46, Vinod Kone vinodk

Re: Framework Starvation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 10:46 AM, Vinod Kone vi...@twitter.com wrote: Waiting to see your blog post :) That said, what baffles me is that in the very beginning when only two frameworks are present and no tasks have been launched, one framework is getting more allocations than other (see

Re: cgroups memory isolation

2014-06-19 Thread Vinod Kone
On Thu, Jun 19, 2014 at 11:33 AM, Sharma Podila spod...@netflix.com wrote: Yeah, having soft-limit for memory seems like the right thing to do immediately. The only problem left to solve being that it would be nicer to throttle I/O instead of OOM for high rate I/O jobs. Hopefully the soft

Re: HDFS on Mesos

2014-06-25 Thread Vinod Kone
Thanks for listing this out Adam. Data Residency: - Should we destroy the sandbox/hdfs-data when shutting down a DN? - If starting DN on node that was previously running a DN, can/should we try to revive the existing data? I think this is one of the key challenges for a production quality

Re: Framework unregistered

2014-06-27 Thread Vinod Kone
Perhaps we should call this out explicitly when we back port and do bug fix releases (0.18.0 and 0.19.0) and urge people to upgrade lest this gets drowned out in the noise. On Fri, Jun 27, 2014 at 11:40 AM, Benjamin Hindman benjamin.hind...@gmail.com wrote: Thanks for the bug report Whitney,

Re: cgroups OOM handler causing lockups?

2014-07-01 Thread Vinod Kone
Hey Whitney, I'll let Ian Downes comment on the specific patches you linked, but at a high level the bug in MESOS-662 was due to Mesos trying to handle OOM situations in user space instead of letting kernel handle it. We have since then changed the behavior to let Kernel handle the OOM. You can

Re: Task serialization per machine?

2014-07-01 Thread Vinod Kone
What Sharma said. Both the scheduler and executor drivers are single threaded i.e., you will only get one call back at a time. IOW, unless you return from one callback you won't get the next callback. On Tue, Jul 1, 2014 at 10:03 AM, Sharma Podila spod...@netflix.com wrote: Hi Asim, I am
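The serialization described in this reply can be modelled with a small sketch. This is not the Mesos driver implementation, only an illustration under the assumption that callbacks are drained from a queue by a single thread; `SerialDispatcher` and `on_status_update` are hypothetical names invented for the example.

```python
import queue
import threading


class SerialDispatcher:
    """Hypothetical model of a driver that invokes callbacks one at a time."""

    def __init__(self):
        self._events = queue.Queue()
        # A single worker thread means the next callback cannot start
        # until the current one has returned.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, callback, *args):
        """Enqueue a callback invocation; it runs after all earlier ones."""
        self._events.put((callback, args))

    def stop(self):
        """Drain the queue, then stop the worker thread."""
        self._events.put(None)
        self._thread.join()

    def _run(self):
        while True:
            item = self._events.get()
            if item is None:
                return
            callback, args = item
            callback(*args)


log = []


def on_status_update(task_id):
    # Hypothetical callback: records when it starts and when it returns.
    log.append(("start", task_id))
    log.append(("end", task_id))


dispatcher = SerialDispatcher()
for task_id in ("t1", "t2", "t3"):
    dispatcher.submit(on_status_update, task_id)
dispatcher.stop()
```

In this model every "start" entry is immediately followed by its own "end" entry, which is the practical consequence of the point above: a callback that blocks holds up every callback queued behind it.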

Re: Running test-executor

2014-07-03 Thread Vinod Kone
Sammy, You need to run a framework to be able to run an executor. See http://mesos.apache.org/gettingstarted/ to see how to run the example python framework. On Thu, Jul 3, 2014 at 11:29 AM, Sammy Steele sammy_ste...@stanford.edu wrote: I am trying to figure out how to run the python

Re: 0.19.1

2014-07-03 Thread Vinod Kone
correct url: https://issues.apache.org/jira/issues/?jql=project%20%3D%20MESOS%20AND%20%22Target%20Version%2Fs%22%20%3D%200.19.1 On Thu, Jul 3, 2014 at 1:40 PM, Vinod Kone vinodk...@gmail.com wrote: Hi, We are planning to release 0.19.1 (likely next week) which will be a bug fix release

Re: Framework capable of launching multiple tasks on same offer?

2014-07-14 Thread Vinod Kone
Yes. You can definitely launch multiple tasks within the same offer (launchTasks() takes multiple TaskInfos) as long as the sum total of resources required by the tasks (and their executors) can fit in the offered resources. In fact, if you are hoarding offers (not recommended if you are running
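The fit condition stated in this reply can be sketched as a plain check a scheduler might run before calling launchTasks(). This is a minimal illustration, not Mesos API code: the `Resources` type and `tasks_fit` helper are hypothetical, only scalar cpus and mem are modelled, and each task entry is assumed to already include its executor's share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical scalar resources; real offers also carry disk, ports, roles."""
    cpus: float
    mem: float  # megabytes

    def __add__(self, other):
        return Resources(self.cpus + other.cpus, self.mem + other.mem)

    def fits_in(self, other):
        """True if every dimension is within the other's amount."""
        return self.cpus <= other.cpus and self.mem <= other.mem


def tasks_fit(offer, tasks):
    """Return True if the summed task resources fit in the offered resources."""
    total = Resources(0.0, 0.0)
    for task in tasks:
        total = total + task
    return total.fits_in(offer)


offer = Resources(cpus=4.0, mem=8192.0)
small = [Resources(1.0, 2048.0), Resources(2.0, 4096.0)]
too_big = small + [Resources(2.0, 1024.0)]  # 5 cpus requested against 4 offered
```

Here `tasks_fit(offer, small)` holds while `tasks_fit(offer, too_big)` does not, so only the first list would be a candidate for a single launch against that offer.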

Re: Framework capable of launching multiple tasks on same offer?

2014-07-14 Thread Vinod Kone
You can ignore that warning message. It was logged by mistake due to a regression. It's fixed on HEAD and will be included in 0.20.0. commit dd94a1fe9aff281f49d61bd8c214f41fcb340b04 Author: Vinod Kone vi...@twitter.com Date: Thu May 29 15:32:03 2014 -0700 Fixed a bug in scheduler driver

Re: Controlling Resources Allocated to a Given Task

2014-07-14 Thread Vinod Kone
How are you launching the slaves? By default the slave doesn't do any resource isolation. You should enable cgroups (only available on linux) for this to work. ./bin/mesos-slave.sh --isolation='cgroups/cpu,cgroups/mem' Note that 'cpu' isolation by default is a lower bound. To set it as an upper

Re: [VOTE] Release Apache Mesos 0.19.1 (rc1)

2014-07-14 Thread Vinod Kone
+1 (binding) Tested on OSX Mavericks w/ gcc-4.8 On Mon, Jul 14, 2014 at 2:35 PM, Timothy Chen tnac...@gmail.com wrote: +1 (non-binding). Tim On Mon, Jul 14, 2014 at 2:32 PM, Benjamin Mahler benjamin.mah...@gmail.com wrote: Hi all, Please vote on releasing the following candidate as

Re: spark and mesos issue

2014-07-16 Thread Vinod Kone
On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh gurvinder.si...@uninett.no wrote: ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0 Googling about it seems that mesos is starting slaves at the same time and giving

Re: spark and mesos issue

2014-07-16 Thread Vinod Kone
On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone vi...@twitter.com wrote: On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh gurvinder.si...@uninett.no wrote: ERROR storage.BlockManagerMasterActor: Got two different block manager registrations on 201407031041-1227224054-5050-24004-0 Googling

Disallowing completed frameworks from re-registering with the same framework id

2014-08-04 Thread Vinod Kone
Hi, Currently, there is a bug in Mesos, which allows a completed framework (e.g., removed by master due to being disconnected for longer than failover timeout) to re-register with the same framework id. This causes issues in the WebUI because the same framework id exists in active and terminated

Re: stale framework registrations

2014-08-05 Thread Vinod Kone
On Tue, Aug 5, 2014 at 4:58 PM, David Palaitis david.palai...@twosigma.com wrote: It’s still registered after a few hours… How did you stop marathon? Also, any log messages on the master pertaining to this event would be useful to diagnose. I don’t see a shutdown in the list of endpoints

Re: Exposing executor container

2014-08-12 Thread Vinod Kone
Hi Whitney, While we could conceivably set the container id in the environment of the executor, I would like to understand the problem you are facing. The fetching and extracting of the executor is done by mesos-fetcher, a process forked by slave and run under slave's cgroup. AFAICT, this

Re: Exposing executor container

2014-08-12 Thread Vinod Kone
?). Thanks, Tom On Tue, Aug 12, 2014 at 1:09 PM, Vinod Kone vinodk...@gmail.com wrote: Hi Whitney, While we could conceivably set the container id in the environment of the executor, I would like to understand the problem you are facing. The fetching and extracting of the executor is done

Re: Exposing executor container

2014-08-13 Thread Vinod Kone
On Tue, Aug 12, 2014 at 1:17 PM, Thomas Petr tp...@hubspot.com wrote: That solution would likely cause us more pain -- we'd still need to figure out an appropriate amount of resources to request for artifact downloads / extractions, our scheduler would need to be sophisticated enough to only

Re: Slave disconnecting after I run the task

2014-08-15 Thread Vinod Kone
it is likely a networking issue. http://stackoverflow.com/questions/24559616/mesos-scheduler-slave-continuously-gets-disconnected On Thu, Aug 14, 2014 at 12:13 AM, Sai Sagar jsaisa...@gmail.com wrote: Hi all, When I am running an example in src/example, the slave is disconnecting from the

Re: Mesos + storm on top of Docker

2014-08-18 Thread Vinod Kone
Can you paste the slave/executor log related to the executor failure? @vinodkone On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum ya...@whatson-social.com wrote: Hi I have created a Docker based Mesos setup, including chronos, marathon, and storm. Following advice I saw previously on this

Re: Struggling with task controller Permissions on Hadoop Mesos

2014-08-18 Thread Vinod Kone
On Sat, Aug 16, 2014 at 4:26 AM, John Omernik j...@omernik.com wrote: I've confirmed on the package I am using that when I untar it using tar zxf as root, that the task-controller does NOT lose the setuid bit. But on the lost tasks in Mesos I get the error below. What's interesting is that

Re: URI of Executor is not recognized in mesos-0.18.1

2014-08-19 Thread Vinod Kone
what is the error? On Mon, Aug 18, 2014 at 11:54 PM, Sai Sagar jsaisa...@gmail.com wrote: Hi, I compiled my executor with the following command g++ executor.cpp -Lmesos-0.18.1/src/.libs/ -lmesos -I/usr/local/include -Imesos-0.18.1/src/

Re: error in make check

2014-08-19 Thread Vinod Kone
Is this repeatable? If yes, mind filing a ticket at https://issues.apache.org/jira/browse/MESOS? On Mon, Aug 18, 2014 at 11:47 PM, Giovanni Colapinto gcolapi...@innovazionedigitale.it wrote: Hello. I've compiled mesos from source. All fine with make, but make check gives me this error:
