Re: Mesos Python Daemon Launch

2017-07-20 Thread Timothy Chen
Are you using the Docker containerizer, or something else?
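
If it's the Mesos containerizer with the posix launcher, a common sketch for
detaching a daemon from the executor's session and process group is (paths
illustrative):

  nohup setsid python /opt/myapp/daemon.py </dev/null >/var/log/mydaemon.log 2>&1 &

Note this is only a sketch: with cgroup-based isolation the daemon still lives
in the task's cgroup, so the agent can kill it anyway when the framework
terminates.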

Tim

On Thu, Jul 20, 2017 at 10:50 PM, Chawla,Sumit  wrote:
> Any clue on this one?
>
> The python daemon is getting launched in a different session and process
> group.  Not sure why it's getting killed when the mesos slave is terminating
> the framework.
>
> Regards
> Sumit Chawla
>
>
> On Wed, Jul 19, 2017 at 4:24 PM, Chawla,Sumit 
> wrote:
>
>> I am using Mesos 0.27.  I am launching a Python daemon from a Spark task.
>> The idea is that this daemon should keep running even when the Mesos framework
>> shuts down. However, I am facing issues keeping this Python daemon
>> process alive. The process is getting killed as soon as the Mesos framework
>> dies.
>>
>>
>>
>> Regards
>> Sumit Chawla
>>
>>


Re: Welcome Gilbert Song as a new committer and PMC member!

2017-05-24 Thread Timothy Chen
Congrats! Rocking the containerizer world!

Tim

On Wed, May 24, 2017 at 11:23 AM, Zhitao Li  wrote:
> Congrats Gilbert!
>
> On Wed, May 24, 2017 at 11:08 AM, Yan Xu  wrote:
>
>> Congrats! Well deserved!
>>
>> ---
>> Jiang Yan Xu  | @xujyan 
>>
>> On Wed, May 24, 2017 at 10:54 AM, Vinod Kone  wrote:
>>
>>> Congrats Gilbert!
>>>
>>> On Wed, May 24, 2017 at 1:32 PM, Neil Conway 
>>> wrote:
>>>
>>> > Congratulations Gilbert! Well-deserved!
>>> >
>>> > Neil
>>> >
>>> > On Wed, May 24, 2017 at 10:32 AM, Jie Yu  wrote:
>>> > > Hi folks,
>>> > >
>>> > > I'm happy to announce that the PMC has voted Gilbert Song as a new
>>> > committer
>>> > > and member of the PMC for the Apache Mesos project. Please join me in
>>> > > congratulating him!
>>> > >
>>> > > Gilbert has been working on the Mesos project for 1.5 years now. His main
>>> > > contribution is his work on the unified containerizer and nested container
>>> > > (aka Pod) support. He has also helped a lot of folks in the community with
>>> > > their patches, questions, etc. He also played an important role in
>>> > > organizing MesosCon Asia last year and this year!
>>> > >
>>> > > His formal committer checklist can be found here:
>>> > > https://docs.google.com/document/d/1iSiqmtdX_0CU-YgpViA6r6PU_aMCVuxuNUZ458FR7Qw/edit?usp=sharing
>>> > >
>>> > > Welcome, Gilbert!
>>> > >
>>> > > - Jie
>>> >
>>>
>>
>>
>
>
> --
> Cheers,
>
> Zhitao Li


Re: Welcome Kevin Klues as a Mesos Committer and PMC member!

2017-03-01 Thread Timothy Chen
Congrats Kevin!

Tim

On Wed, Mar 1, 2017 at 3:20 PM, Neil Conway  wrote:
> Congratulations Kevin! Very well-deserved.
>
> Neil
>
> On Wed, Mar 1, 2017 at 2:05 PM, Benjamin Mahler  wrote:
>> Hi all,
>>
>> Please welcome Kevin Klues as the newest committer and PMC member of the
>> Apache Mesos project.
>>
>> Kevin has been an active contributor in the project for over a year, and in
>> this time he made a number of contributions to the project: Nvidia GPU
>> support [1], the containerization side of POD support (new container init
>> process), and support for "attach" and "exec" of commands within running
>> containers [2].
>>
>> Also, Kevin took on an effort with Haris Choudhary to revive the CLI [3]
>> via a better structured python implementation (to be more accessible to
>> contributors) and a more extensible architecture to better support adding
>> new or custom subcommands. The work also adds a unit test framework for the
>> CLI functionality (we had no tests previously!). I think it's great that
>> Kevin took on this much needed improvement with Haris, and I'm very much
>> looking forward to seeing this land in the project.
>>
>> Here is his committer eligibility document for perusal:
>> https://docs.google.com/document/d/1mlO1yyLCoCSd85XeDKIxTYyboK_uiOJ4Uwr6ruKTlFM/edit
>>
>> Thanks!
>> Ben
>>
>> [1] http://mesos.apache.org/documentation/latest/gpu-support/
>> [2]
>> https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU
>> [3]
>> https://docs.google.com/document/d/1r6Iv4Efu8v8IBrcUTjgYkvZ32WVscgYqrD07OyIglsA/


Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Timothy Chen
Dynamic allocation works with coarse-grained mode only; we weren't aware
of a need for fine-grained mode after we enabled dynamic allocation support
in coarse-grained mode.

What's the reason you're running fine-grained mode instead of
coarse-grained + dynamic allocation?
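
For reference, a minimal sketch of coarse-grained + dynamic allocation
(Spark 1.6-era property names; the master URL and app are placeholders, and it
assumes the external shuffle service is already running on each agent):

  spark-submit \
    --master mesos://zk://<host>:2181/mesos \
    --conf spark.mesos.coarse=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.dynamicAllocation.executorIdleTimeout=60s \
    <app>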

Tim

On Mon, Dec 19, 2016 at 2:45 PM, Mehdi Meziane
 wrote:
> We will be interested by the results if you give a try to Dynamic allocation
> with mesos !
>
>
> - Mail Original -
> De: "Michael Gummelt" 
> À: "Sumit Chawla" 
> Cc: user@mesos.apache.org, d...@mesos.apache.org, "User"
> , d...@spark.apache.org
> Envoyé: Lundi 19 Décembre 2016 22h42:55 GMT +01:00 Amsterdam / Berlin /
> Berne / Rome / Stockholm / Vienne
> Objet: Re: Mesos Spark Fine Grained Execution - CPU count
>
>
>> Is this problem of idle executors sticking around solved in Dynamic
>> Resource Allocation?  Is there some timeout after which idle executors can
>> just shut down and clean up their resources?
>
> Yes, that's exactly what dynamic allocation does.  But again I have no idea
> what the state of dynamic allocation + mesos is.
>
> On Mon, Dec 19, 2016 at 1:32 PM, Chawla,Sumit 
> wrote:
>>
>> Great.  Makes much better sense now.  What would be the reason to have
>> spark.mesos.mesosExecutor.cores more than 1, as this number doesn't include
>> the number of cores for tasks?
>>
>> So in my case it seems like 30 CPUs are allocated to executors.  And there
>> are 48 tasks, so 48 + 30 = 78 CPUs.  And I am noticing this gap of 30 is
>> maintained till the last task exits.  This explains the gap.  Thanks
>> everyone.  I am still not sure how this number 30 is calculated.  (Is it
>> dynamic based on current resources, or is it some configuration?  I have 32
>> nodes in my cluster.)
>>
>> Is this problem of idle executors sticking around solved in Dynamic
>> Resource Allocation?  Is there some timeout after which idle executors can
>> just shut down and clean up their resources?
>>
>>
>> Regards
>> Sumit Chawla
>>
>>
>> On Mon, Dec 19, 2016 at 12:45 PM, Michael Gummelt 
>> wrote:
>>>
>>> > I should presume that the number of executors should be less than the
>>> > number of tasks.
>>>
>>> No.  Each executor runs 0 or more tasks.
>>>
>>> Each executor consumes 1 CPU, and each task running on that executor
>>> consumes another CPU.  You can customize this via
>>> spark.mesos.mesosExecutor.cores
>>> (https://github.com/apache/spark/blob/v1.6.3/docs/running-on-mesos.md) and
>>> spark.task.cpus
>>> (https://github.com/apache/spark/blob/v1.6.3/docs/configuration.md)
>>>
>>> On Mon, Dec 19, 2016 at 12:09 PM, Chawla,Sumit 
>>> wrote:
>>>>
>>>> Ah, thanks. Looks like I skipped reading this: "Neither will executors
>>>> terminate when they're idle."
>>>>
>>>> So in my job scenario, I should presume that the number of executors should
>>>> be less than the number of tasks. Ideally one executor should execute 1 or
>>>> more tasks.  But I am observing something strange instead.  I start my job
>>>> with 48 partitions for a spark job. In the Mesos UI I see that the number of
>>>> tasks is 48, but the number of CPUs is 78, which is way more than 48.  Here I
>>>> am assuming that 1 CPU is 1 executor.  I am not specifying any configuration
>>>> to set the number of cores per executor.
>>>>
>>>> Regards
>>>> Sumit Chawla
>>>>
>>>>
>>>> On Mon, Dec 19, 2016 at 11:35 AM, Joris Van Remoortere
>>>>  wrote:
>>>>>
>>>>> That makes sense. From the documentation it looks like the executors
>>>>> are not supposed to terminate:
>>>>>
>>>>> http://spark.apache.org/docs/latest/running-on-mesos.html#fine-grained-deprecated
>>>>>>
>>>>>> Note that while Spark tasks in fine-grained will relinquish cores as
>>>>>> they terminate, they will not relinquish memory, as the JVM does not give
>>>>>> memory back to the Operating System. Neither will executors terminate 
>>>>>> when
>>>>>> they’re idle.
>>>>>
>>>>>
>>>>> I suppose your task-to-executor CPU ratio is low enough that it looks
>>>>> like most of the resources are not being reclaimed. If your tasks were using
>>>>> significantly more CPU, the amortized cost of the idle executors would be
>>>>> less noticeable.

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-19 Thread Timothy Chen
Hi Chawla,

One possible reason is that Mesos fine-grained mode also takes up a core
to run the executor on each host, so if you have 20 agents running the
fine-grained executor, it will take up 20 cores while they are still running.
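
If that per-host executor core is a concern, it can be shrunk; a sketch (the
value is illustrative):

  spark-submit --conf spark.mesos.mesosExecutor.cores=0.1 <app>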

Tim

On Fri, Dec 16, 2016 at 8:41 AM, Chawla,Sumit  wrote:
> Hi
>
> I am using Spark 1.6. I have one query about the fine-grained model in Spark.
> I have a simple Spark application which transforms A -> B.  It's a single
> stage application.  To begin, the program starts with 48 partitions.
> When the program starts running, the Mesos UI shows 48 tasks and 48 CPUs
> allocated to the job.  Now as the tasks get done, the number of active tasks
> starts decreasing.  However, the number of CPUs does not decrease
> proportionally.  When the job was about to finish, there was a single
> remaining task, but the CPU count was still 20.
>
> My question is: why is there no one-to-one mapping between tasks and CPUs
> in fine-grained mode?  How can these CPUs be released when the job is done, so
> that other jobs can start?
>
>
> Regards
> Sumit Chawla


Re: [Proposal] Remove the default value for agent work_dir

2016-04-12 Thread Timothy Chen
+1
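
For anyone currently relying on the default, the migration is a one-flag
change (path and master address illustrative):

  mesos-slave --master=zk://<host>:2181/mesos --work_dir=/var/lib/mesos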

Tim

On Wed, Apr 13, 2016 at 5:31 AM, Jie Yu  wrote:
> +1
>
> On Tue, Apr 12, 2016 at 9:29 PM, James Peach  wrote:
>
>>
>> > On Apr 12, 2016, at 3:58 PM, Greg Mann  wrote:
>> >
>> > Hey folks!
>> > A number of situations have arisen in which the default value of the
>> Mesos agent `--work_dir` flag (/tmp/mesos) has caused problems on systems
>> in which the automatic cleanup of '/tmp' deletes agent metadata. To resolve
>> this, we would like to eliminate the default value of the agent
>> `--work_dir` flag. You can find the relevant JIRA here.
>> >
>> > We considered simply changing the default value to a more appropriate
>> location, but decided against this because the expected filesystem
>> structure varies from platform to platform, and because it isn't guaranteed
>> that the Mesos agent would have access to the default path on a particular
>> platform.
>> >
>> > Eliminating the default `--work_dir` value means that the agent would
>> exit immediately if the flag is not provided, whereas currently it launches
>> successfully in this case. This will break existing infrastructure which
>> relies on launching the Mesos agent without specifying the work directory.
>> I believe this is an acceptable change because '/tmp/mesos' is not a
>> suitable location for the agent work directory except for short-term local
>> testing, and any production scenario that is currently using this location
>> should be altered immediately.
>>
>> +1 from me too. Defaulting to /tmp just helps people shoot themselves in
>> the foot.
>>
>> J


Re: Apache Spark Over Mesos

2016-03-15 Thread Timothy Chen
You can launch the driver and executor in docker containers as well by setting 
spark.mesos.executor.docker.image to the image you want to use to launch them.
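
A sketch of what that looks like (image name and volume are illustrative; the
volumes property is optional):

  spark-submit \
    --conf spark.mesos.executor.docker.image=<registry>/spark:1.6.0 \
    --conf spark.mesos.executor.docker.volumes=/etc/spark:/etc/spark:ro \
    <app>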

Tim

> On Mar 15, 2016, at 8:49 AM, Radoslaw Gruchalski  wrote:
> 
> Pradeep,
> 
> You can mount a Spark directory as a volume. This means you have to have 
> Spark deployed on every agent.
> 
> Another thing you can do is place Spark in HDFS, assuming that you have HDFS 
> available, but that too will download a copy to the sandbox.
> 
> I'd prefer the former.
> 
> Sent from Outlook Mobile
> 
> _
> From: Pradeep Chhetri 
> Sent: Tuesday, March 15, 2016 4:41 pm
> Subject: Apache Spark Over Mesos
> To: 
> 
> 
> Hello,
> 
> I am able to run Apache Spark over Mesos. It's quite simple to run the Spark 
> Dispatcher over Marathon and ask it to run the Spark executor (I guess it can 
> also be called the Spark driver) as a docker container.
> 
> I have a query regarding this:
> 
> All Spark tasks are spawned directly by first downloading the Spark 
> artifacts. I was wondering if there is some way I can start them as docker 
> containers too. This would save the time of downloading the Spark artifacts. 
> I am running Spark in fine-grained mode.
> 
> I have attached a screenshot of a sample job
> 
> Thanks,
> 
> -- 
> Pradeep Chhetri
> 
> 


Re: [VOTE] Release Apache Mesos 0.28.0 (rc1)

2016-03-09 Thread Timothy Chen
I'd also like to include MESOS-4370, as it fixes the IP address lookup logic
and unblocks users using custom Docker networks.

Tim

On Wed, Mar 9, 2016 at 9:55 AM, Gilbert Song  wrote:
> Hi Kevin,
>
> Please remove the patch below from the list:
> Implemented runtime isolator default cmd test (still under review).
> https://reviews.apache.org/r/44469/
>
> Because the bug was fixed by patch #44468, the test should not be
> considered a blocker. I am updating MESOS-4888 and moving the test to a
> separate JIRA.
>
> Thanks,
> Gilbert
>
> On Tue, Mar 8, 2016 at 2:43 PM, Kevin Klues  wrote:
>
>> Here are the list of reviews/patches that have been called out in this
>> thread for inclusion in 0.28.0-rc2.  Some of them are still under
>> review and will need to land by Thursday to be included.
>>
>> Are there others?
>>
>> Jie's container image documentation (submitted):
>> commit 7de8cdd4d8ed1d222fa03ea0d8fa6740c4a9f84b
>> https://reviews.apache.org/r/44414
>>
>> Restore Mesos' ability to extract Docker assigned IPs (still under review):
>> https://reviews.apache.org/r/43093/
>>
>> Fixed the logic for default docker cmd case (submitted).
>> commit e42f740ccb655c0478a3002c0b6fa90c1144f41c
>> https://reviews.apache.org/r/44468/
>>
>> Implemented runtime isolator default cmd test (still under review).
>> https://reviews.apache.org/r/44469/
>>
>> Fixed a bug that causes the task stuck in staging state (still under
>> review).
>> https://reviews.apache.org/r/44435/
>>
>> On Tue, Mar 8, 2016 at 10:30 AM, Kevin Klues  wrote:
>> > Yes, will do.
>> >
>> > On Tue, Mar 8, 2016 at 10:26 AM, Vinod Kone 
>> wrote:
>> >> +kevin klues
>> >>
>> >> OK. I'm cancelling this vote since there are some show stopper issues
>> that
>> >> we need to cherry-pick. I'll cut another RC on Thursday.
>> >>
>> >> @shepherds: can you please make sure the blocker tickets are marked with
>> >> fix version and that they land today or tomorrow?
>> >>
>> >> @kevin: since you have volunteered to help with the release, can you
>> make
>> >> sure we have a list of commits to cherry pick for rc2?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >> On Tue, Mar 8, 2016 at 12:05 AM, Shuai Lin 
>> wrote:
>> >>
>> >>> Maybe also https://issues.apache.org/jira/browse/MESOS-4877 and
>> >>> https://issues.apache.org/jira/browse/MESOS-4878 ?
>> >>>
>> >>>
>> >>> On Tue, Mar 8, 2016 at 9:13 AM, Jie Yu  wrote:
>> >>>
>>  I'd like to fix https://issues.apache.org/jira/browse/MESOS-4888 as
>> well
>>  if you guys plan to cut another RC
>> 
>>  On Mon, Mar 7, 2016 at 10:16 AM, Daniel Osborne <
>>  daniel.osbo...@metaswitch.com> wrote:
>> 
>> > -1
>> >
>> > If it doesn’t cause too much pain, I'm hoping we can squeeze a
>> > relatively small patch which restores Mesos' ability to extract
>> Docker
>> > assigned IPs. This has been broken with Docker 1.10's release over
>> a month
>> > ago, and prevents service discovery and DNS from working.
>> >
>> > Mesos-4370: https://issues.apache.org/jira/browse/MESOS-4370
>> > RB# 43093: https://reviews.apache.org/r/43093/
>> >
>> > I've built 0.28.0-rc1 with this patch and can confirm that it fixes
>> it
>> > as expected.
>> >
>> > Apologies for not bringing this to attention earlier.
>> >
>> > Thanks all,
>> > Dan
>> >
>> > -Original Message-
>> > From: Vinod Kone [mailto:vinodk...@apache.org]
>> > Sent: Thursday, March 3, 2016 5:44 PM
>> > To: dev ; user 
>> > Subject: [VOTE] Release Apache Mesos 0.28.0 (rc1)
>> >
>> > Hi all,
>> >
>> >
>> > Please vote on releasing the following candidate as Apache Mesos
>> 0.28.0.
>> >
>> >
>> > 0.28.0 includes the following:
>> >
>> >
>> >
>> 
>> >
>> >   * [MESOS-4343] - A new cgroups isolator for enabling the net_cls
>> > subsystem in
>> >
>> > Linux. The cgroups/net_cls isolator allows operators to provide
>> > network
>> >
>> >
>> > performance isolation and network segmentation for containers
>> within
>> > a Mesos
>> >
>> > cluster. To enable the cgroups/net_cls isolator, append
>> > `cgroups/net_cls` to
>> >
>> > the `--isolation` flag when starting the slave. Please refer to
>> >
>> >
>> > docs/mesos-containerizer.md for more details.
>> >
>> >
>> >
>> >
>> >
>> >   * [MESOS-4687] - The implementation of scalar resource values
>> (e.g.,
>> > "2.5
>> >
>> >
>> > CPUs") has changed. Mesos now reliably supports resources with
>> up to
>> > three
>> >
>> > decimal digits of precision (e.g., "2.501 CPUs"); resources with
>> > more than
>> >
>> > three decimal digits of precision will be rounded. Internally,
>> > resource math
>> >
>> > is now done using a fixed-point format that s

Re: 0.28.0 release

2016-03-03 Thread Timothy Chen
Sorry, I pushed a quick typo fix before seeing this email.

Tim

On Thu, Mar 3, 2016 at 4:15 PM, Vinod Kone  wrote:
> Alright, all the blockers are resolved. I'll be cutting the RC shortly.
>
> I'm also taking a soft lock on the 'master' branch. *Committers:* *Please
> do not push any commits upstream until I release the lock.*
>
> Thanks,
>
> On Mon, Feb 29, 2016 at 1:36 PM, Vinod Kone  wrote:
>
>> Hi folks,
>>
>> I'm volunteering to be the Release Manager for 0.28.0. Joris and Kevin
>> Klues have kindly agreed to help me out. The plan is cut an RC tomorrow
>> 03/01.
>>
>> The dashboard for the release is here:
>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12327751
>>
>> *If you have a ticket marked with "Fix Version 0.28.0" that is not in the
>> "Resolved" state, verify whether it's a blocker for 0.28.0. If not, please unset
>> the Fix Version.*
>>
>>
>> Thanks,
>> Vinod
>>
>>


Re: Mesos fetcher in dockerized slave

2016-01-08 Thread Timothy Chen
I can shepherd it, no problem.

Tim

> On Dec 25, 2015, at 4:32 PM, Shuai Lin  wrote:
> 
> I'll work on it. @Tim could you shepherd it?
> 
>> On Sat, Dec 26, 2015 at 2:49 AM, Marica Antonacci 
>>  wrote:
>> Hi Tim and Shuai,
>> 
>> thank you very much for your reply. I have opened a JIRA issue on this: 
>> https://issues.apache.org/jira/browse/MESOS-4249
>> I hope it will be patched soon :) 
>> 
>> Best regards,
>> Marica
>> 
>> 
>>> Il giorno 24/dic/2015, alle ore 17:54, Tim Chen  ha 
>>> scritto:
>>> 
>>> Hi Marica/Shuai,
>>> 
>>> Sorry haven't been able to spend the time to repro, but looks like Shuai 
>>> confirmed it.
>>> 
>>> Can one of you file a JIRA?
>>> 
>>> Thanks!
>>> 
>>> Tim
>>> 
 On Thu, Dec 24, 2015 at 6:16 AM, Shuai Lin  wrote:
 Hi Marica,
 
 I can reproduce the problem exactly as you described in the first email of 
 this thread. Without the `MESOS_DOCKER_MESOS_IMAGE` environment variable set, 
 the fetcher works just fine; with it, the fetcher step seems to be skipped. 
 This looks like a bug to me.
 
 Regards,
 Shuai
 
> On Tue, Dec 22, 2015 at 7:41 PM, Marica Antonacci 
>  wrote:
> Dear all,
> 
> I have not solved this issue yet. Please, can anyone run the same test 
> and let me know if the fetcher is correctly invoked? 
> The test is really simple: just try to start a dockerized app (see the JSON 
> definition file below) through Marathon on a Mesos slave running in a 
> docker container started with the option --docker_mesos_image=<your image>.
> I would appreciate very much any feedback. 
> 
> Sample Marathon app:
> { 
>  "id": "test-app",
>  "container": {
>"type": "DOCKER",
>"docker": {
>  "image": "libmesos/ubuntu"
>}
>  },
>  "cpus": 1,
>  "mem": 512,
>  "uris": [ 
> "http://www.stat.cmu.edu/~cshalizi/402/lectures/16-glm-practicals/snoqualmie.csv";
>  ],
>  "cmd": "cd $MESOS_SANDBOX; ls -latr; while sleep 10; do date -u +%T; 
> done" 
> }
> 
> Docker run command to start dockerized mesos slave:
> 
> # docker run -d MESOS_HOSTNAME= -e MESOS_IP= -e 
> MESOS_MASTER=zk://:2181,:2181,:2181/mesos -e 
> MESOS_CONTAINERIZERS=docker,mesos -e 
> MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins -e MESOS_LOG_DIR=/var/log -e 
> MESOS_docker_mesos_image=mesos-slave -v /sys/fs/cgroup:/sys/fs/cgroup -v 
> /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos --name 
> slave --net host --privileged --pid host mesos-slave
> 
> Thank you very much in advance!
> Best regards,
> Marica
> 
>> Il giorno 19/dic/2015, alle ore 19:32, Marica Antonacci 
>>  ha scritto:
>> 
>> Dear Tim,
>> 
>> I have collected some information from my test environment, starting the 
>> slave container with and without the —docker_mesos_image startup flag. 
>> Please let me know if you need further input. Thank you very much for 
>> your support!
>> 
>> Using the flag —docker_mesos_image:
>> 
>> root@mesos-slave:~# docker ps
>> CONTAINER IDIMAGE   COMMAND  CREATED 
>> STATUS  PORTS   NAMES
>> b30cea22a07clibmesos/ubuntu "/bin/sh -c 'cd $MESO"   2 
>> minutes ago   Up 2 minutes
>> mesos-db70e09f-f39d-491c-8480-73d9858c140b-S0.d965f59b-cc1a-4081-95d2-f3370214c84d
>> da9c78ec5727mesos-slave "/bin/sh -c '/usr/lib"   2 
>> minutes ago   Up 2 minutes
>> mesos-db70e09f-f39d-491c-8480-73d9858c140b-S0.d965f59b-cc1a-4081-95d2-f3370214c84d.executor
>> 150f78fbf327mesos-slave "/entrypoint.sh /usr/"   3 
>> minutes ago   Up 3 minutesslave
>> 
>> root@mesos-slave:~# docker logs slave
>> I1219 18:03:38.308544 19476 slave.cpp:1294] Got assigned task 
>> test-app.d4398af9-a67a-11e5-b1cf-fa163e920cd0 for framework 
>> 246b272b-d649-47c0-88ca-6b1ff35f437a-
>> I1219 18:03:38.314268 19476 slave.cpp:1410] Launching task 
>> test-app.d4398af9-a67a-11e5-b1cf-fa163e920cd0 for framework 
>> 246b272b-d649-47c0-88ca-6b1ff35f437a-
>> I1219 18:03:38.316261 19476 paths.cpp:436] Trying to chown 
>> '/tmp/mesos/slaves/db70e09f-f39d-491c-8480-73d9858c140b-S0/frameworks/246b272b-d649-47c0-88ca-6b1ff35f437a-/executors/test-app.d4398af9-a67a-11e5-b1cf-fa163e920cd0/runs/d965f59b-cc1a-4081-95d2-f3370214c84d'
>>  to user 'root'
>> I1219 18:03:38.327221 19476 slave.cpp:4999] Launching executor 
>> test-app.d4398af9-a67a-11e5-b1cf-fa163e920cd0 of framework 
>> 246b272b-d649-47c0-88ca-6b1ff35f437a- with resources cpus(*):0.1; 
>> mem(*):32 in work directory 
>> '/tmp/mesos/slaves/db70e09f-f39d-491c-8480-73d9858c140b-S0/frameworks/246b272b-d649-47c0-88ca-6b1ff35f437a-/ex

Re: Mesos .26 failing on centos7

2015-11-09 Thread Timothy Chen
My commits that caused the trouble have been reverted now.

Also, 0.26 will not be based on master; releases are typically cherry-picked 
commits onto a specific tag.
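
If your checkout predates the revert, updating past the fix commit mentioned
below should clear the build error:

  git checkout master && git pull   # picks up cee4958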

Tim

> On Nov 9, 2015, at 6:37 AM, Plotka, Bartlomiej  
> wrote:
> 
> I had the same issue (broken build) on Ubuntu 14.04.. Commit “cee4958” helped.
> 
> Kind Regards,
> Bartek Plotka
> 
> From: Jan Schlicht [mailto:j...@mesosphere.io]
> Sent: Monday, November 9, 2015 3:27 PM
> To: user@mesos.apache.org
> Cc: dev 
> Subject: Re: Mesos .26 failing on centos7
> 
> There were some build errors due to some reverts in `registry_puller.cpp`. 
> Your error log hints that it may be related to this. They should be fixed 
> now (with `cee4958`).
> 
> On Mon, Nov 9, 2015 at 3:23 PM, haosdent 
> mailto:haosd...@gmail.com>> wrote:
> Could you show more details about error log? I could build current master 
> branch in CentOS 7.
> 
> On Mon, Nov 9, 2015 at 10:00 PM, Pradeep Kiruvale 
> mailto:pradeepkiruv...@gmail.com>> wrote:
> Hi All,
> 
> I am trying to compile mesos on Centos7, but its failing. Please let me know 
> what is the reason.
> 
> Find the logs below.
> 
> Regards,
> Pradeep
> 
> make[2]: *** 
> [slave/containerizer/mesos/provisioner/docker/libmesos_no_3rdparty_la-registry_puller.lo]
>  Error 1
> make[2]: *** Waiting for unfinished jobs
> mv -f 
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Tpo
>  
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-cpushare.Plo
> mv -f master/.deps/libmesos_no_3rdparty_la-master.Tpo 
> master/.deps/libmesos_no_3rdparty_la-master.Plo
> mv -f java/jni/.deps/libjava_la-convert.Tpo 
> java/jni/.deps/libjava_la-convert.Plo
> mv -f examples/.deps/libexamplemodule_la-example_module_impl.Tpo 
> examples/.deps/libexamplemodule_la-example_module_impl.Plo
> mv -f 
> slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Tpo
>  
> slave/containerizer/mesos/isolators/namespaces/.deps/libmesos_no_3rdparty_la-pid.Plo
> mv -f 
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Tpo
>  
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-perf_event.Plo
> mv -f 
> slave/containerizer/mesos/provisioner/backends/.deps/libmesos_no_3rdparty_la-bind.Tpo
>  
> slave/containerizer/mesos/provisioner/backends/.deps/libmesos_no_3rdparty_la-bind.Plo
> mv -f 
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Tpo
>  
> slave/containerizer/mesos/isolators/cgroups/.deps/libmesos_no_3rdparty_la-mem.Plo
> mv -f linux/.deps/libmesos_no_3rdparty_la-perf.Tpo 
> linux/.deps/libmesos_no_3rdparty_la-perf.Plo
> mv -f 
> slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Tpo 
> slave/containerizer/.deps/libmesos_no_3rdparty_la-external_containerizer.Plo
> mv -f log/.deps/liblog_la-replica.Tpo log/.deps/liblog_la-replica.Plo
> mv -f slave/.deps/libmesos_no_3rdparty_la-slave.Tpo 
> slave/.deps/libmesos_no_3rdparty_la-slave.Plo
> mv -f 
> slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Tpo 
> slave/containerizer/mesos/.deps/libmesos_no_3rdparty_la-containerizer.Plo
> mv -f 
> slave/resource_estimators/.deps/libfixed_resource_estimator_la-fixed.Tpo 
> slave/resource_estimators/.deps/libfixed_resource_estimator_la-fixed.Plo
> mv -f 
> slave/containerizer/mesos/isolators/filesystem/.deps/libmesos_no_3rdparty_la-linux.Tpo
>  
> slave/containerizer/mesos/isolators/filesystem/.deps/libmesos_no_3rdparty_la-linux.Plo
> mv -f log/.deps/liblog_la-coordinator.Tpo log/.deps/liblog_la-coordinator.Plo
> mv -f log/.deps/liblog_la-recover.Tpo log/.deps/liblog_la-recover.Plo
> 
> 
> 
> 
> --
> Best Regards,
> Haosdent Huang
> 
> 
> 
> --
> Jan Schlicht
> Distributed Systems Engineer, Mesosphere
> 
> 
> Intel Technology Poland sp. z o.o.
> ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII 
> Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 
> 957-07-52-316 | Kapital zakladowy 200.000 PLN.
> 
> Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i 
> moze zawierac informacje poufne. W razie przypadkowego otrzymania tej 
> wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; 
> jakiekolwiek
> przegladanie lub rozpowszechnianie jest zabronione.
> This e-mail and any attachments may contain confidential material for the 
> sole use of the intended recipient(s). If you are not the intended recipient, 
> please contact the sender and delete all copies; any review or distribution by
> others is strictly prohibited.


Re: spark mesos shuffle service failing under marathon

2015-11-07 Thread Timothy Chen
If you want to use Marathon to start the Mesos shuffle service, don't use the sbin 
script, since it runs it as a daemon in the background.

Instead, use the spark-class script and run the MesosExternalShuffleService class 
directly so it runs in the foreground.
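
A sketch of a Marathon-friendly command (the SPARK_HOME path is illustrative;
this is the class the sbin script wraps):

  /opt/spark/bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService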

Tim


> On Nov 7, 2015, at 7:02 AM, Klaus Ma  wrote:
> 
> Can you share more logs? I used to start the Spark shuffle service in a Mesos + 
> Marathon cluster; logs will be helpful to identify issues.
> 
> 
> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer 
> Platform Symphony/DCOS Development & Support, STG, IBM GCG 
> +86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me
> 
>> On Thu, Nov 5, 2015 at 4:29 AM, Dean Wampler  
>> wrote:
>> Can you find anything in the logs that would indicate a failure?
>> 
>>> On Wed, Nov 4, 2015 at 9:23 PM, Rodrick Brown  
>>> wrote:
>>> Starting the mesos shuffle service seems to background the process, so 
>>> whenever marathon tries to bring up this process it constantly keeps trying to 
>>> start and never registers as started. Is there a fix for this? 
>>> 
>>> 
>>> -- 
>>> 
>>> 
>>> Rodrick Brown / DevOPs Engineer 
>>> +1 917 445 6839 / rodr...@orchardplatform.com
>>> 
>>> Orchard Platform 
>>> 101 5th Avenue, 4th Floor, New York, NY 10003 
>>> http://www.orchardplatform.com
>>> 
>>> Orchard Blog | Marketplace Lending Meetup
>>> 
>>> 
>>> 
>>> NOTICE TO RECIPIENTS: This communication is confidential and intended for 
>>> the use of the addressee only. If you are not an intended recipient of this 
>>> communication, please delete it immediately and notify the sender by return 
>>> email. Unauthorized reading, dissemination, distribution or copying of this 
>>> communication is prohibited. This communication does not constitute an 
>>> offer to sell or a solicitation of an indication of interest to purchase 
>>> any loan, security or any other financial product or instrument, nor is it 
>>> an offer to sell or a solicitation of an indication of interest to purchase 
>>> any products or services to any persons who are prohibited from receiving 
>>> such information under applicable law. The contents of this communication 
>>> may not be accurate or complete and are subject to change without notice. 
>>> As such, Orchard App, Inc. (including its subsidiaries and affiliates, 
>>> "Orchard") makes no representation regarding the accuracy or completeness 
>>> of the information contained herein. The intended recipient is advised to 
>>> consult its own professional advisors, including those specializing in 
>>> legal, tax and accounting matters. Orchard does not provide legal, tax or 
>>> accounting advice.
>> 
>> 
>> 
>> -- 
>> Dean Wampler, Ph.D.
>> Typesafe
>> Author: Programming Scala, 2nd Edition (O'Reilly)
>> @deanwampler
> 


Re: Recover docker containers when mesos-slave is contained in docker as well

2015-10-22 Thread Timothy Chen
Hi Grzegorz,

Yes, it's possible, but it does require some configuration for the slave to recover 
the running containers. This is needed to run Mesos on CoreOS as well, so it was 
made possible, I believe, around 0.24.1 or later.

Basically, for the slave to recover the task containers, the executors that 
watch the tasks need to be launched in containers as well. This is made 
possible with the docker_mesos_image slave flag, where the docker containerizer 
will use this image to launch executors. This should be the same image used to 
launch the slave itself.

Also, when launching the slave in a docker container, it must have the following 
docker flags:

--pid=host (so all processes are visible to the slave)
-v /var/run/docker.sock:/var/run/docker.sock (the slave can then launch containers 
as peers)
-v /tmp/mesos:/tmp/mesos (the slave work directory needs to persist across slave 
restarts; you can also use a separate dir on the host if you want to run multiple 
slaves)
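
Putting those together, an illustrative slave launch (image name and master
address are placeholders; the MESOS_* env vars map to the corresponding slave
flags):

  docker run -d --name mesos-slave --net host --pid host --privileged \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp/mesos:/tmp/mesos \
    -e MESOS_MASTER=zk://<host>:2181/mesos \
    -e MESOS_CONTAINERIZERS=docker,mesos \
    -e MESOS_DOCKER_MESOS_IMAGE=<your slave image> \
    <your slave image>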

Tim

> On Oct 22, 2015, at 6:59 PM, Grzegorz Graczyk  wrote:
> 
> Docker is running when the slave exits - and so are the docker containers started 
> by the mesos slave. The problem starts when the slave is online again and cannot 
> see the already-started containers to recover them...
> Isn't this supposed to fix that problem? 
> https://issues.apache.org/jira/browse/MESOS-2115
> 
>> On 22 October 2015 at 12:54, Klaus Ma  wrote:
>> It seems we can NOT keep docker containers running when the slave exits. 
>> 
>>> On Thu, Oct 22, 2015 at 6:18 AM, Grzegorz Graczyk  
>>> wrote:
>>> Hi everyone,
>>> I was wondering if it's possible to recover running docker containers after 
>>> restart of mesos-slave?
>>> If it is possible - what are the requirements to do so?
>>> 
>>> Regards,
>>> Grzegorz Graczyk
>> 
>> 
>> 
>> -- 
>> Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer 
>> Platform Symphony/DCOS Development & Support, STG, IBM GCG 
>> +86-10-8245 4084 | mad...@cn.ibm.com | http://www.cguru.net
> 


Re: Can health-checks be run by Mesos for docker tasks?

2015-10-06 Thread Timothy Chen
Hi Jay, 

We just added health check support for Docker tasks; it's in master but not 
yet released. It will run docker exec with the command you provided as the 
health check.

It should be in the next release.
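
Conceptually, once this lands, the check for a task like the one below amounts
to something like (container name illustrative):

  docker exec <mesos-container-id> sh -c "sleep 5"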

Thanks!

Tim


> On Oct 6, 2015, at 6:49 PM, Jay Taylor  wrote:
> 
> Does Mesos support health checks for docker image tasks?  Mesos seems to be 
> ignoring the TaskInfo.HealthCheck field for me.
> 
> Example TaskInfo JSON received back from Mesos:
> 
>>> {
>>>   "name":"hello-app.web.v3",
>>>   "task_id":{
>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>   },
>>>   "slave_id":{
>>> "value":"20150924-210922-1608624320-5050-1792-S1"
>>>   },
>>>   "resources":[
>>> {
>>>   "name":"cpus",
>>>   "type":0,
>>>   "scalar":{
>>> "value":0.1
>>>   }
>>> },
>>> {
>>>   "name":"mem",
>>>   "type":0,
>>>   "scalar":{
>>> "value":256
>>>   }
>>> },
>>> {
>>>   "name":"ports",
>>>   "type":1,
>>>   "ranges":{
>>> "range":[
>>>   {
>>> "begin":31002,
>>> "end":31002
>>>   }
>>> ]
>>>   }
>>> }
>>>   ],
>>>   "command":{
>>> "container":{
>>>   "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>> },
>>> "shell":false
>>>   },
>>>   "container":{
>>> "type":1,
>>> "docker":{
>>>   "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>   "network":2,
>>>   "port_mappings":[
>>> {
>>>   "host_port":31002,
>>>   "container_port":8000,
>>>   "protocol":"tcp"
>>> }
>>>   ],
>>>   "privileged":false,
>>>   "parameters":[],
>>>   "force_pull_image":false
>>> }
>>>   },
>>>   "health_check":{
>>> "delay_seconds":5,
>>> "interval_seconds":10,
>>> "timeout_seconds":10,
>>> "consecutive_failures":3,
>>> "grace_period_seconds":0,
>>> "command":{
>>>   "shell":true,
>>>   "value":"sleep 5",
>>>   "user":"root"
>>> }
>>>   }
>>> }
> 
> I have searched all machines and containers to see if they ever run the 
> command (in this case `sleep 5`), but have not found any indication that it 
> is being executed.
> 
> In the mesos src code the health-checks are invoked from 
> src/launcher/executor.cpp CommandExecutorProcess::launchTask.  Does this mean 
> that health-checks are only supported for custom executors and not for docker 
> tasks?
> 
> What I am trying to accomplish is to have the 0/non-zero exit-status of a 
> health-check command translate to task health.
> 
> Thanks!
> Jay


Re: Error: no resources available to schedule container

2015-09-25 Thread Timothy Chen
Hi Diana,

This is a known bug that's in swarm.

I'll be looking into this soon.

Tim

> On Sep 25, 2015, at 3:32 AM, Diana J Arroyo  wrote:
> 
> Hi,
> I'm getting "ERRO[0038] HTTP error: no resources available to schedule 
> container  status=500" when trying to deploy a container through the swarm 
> manager framework running on Mesos.
> 
> Here are some details:
> mesos version: 0.23.0
> swarm version: 0.4.0
> 
> command used to start mesos master...
> HOST1<'mesos' userid>: ./bin/mesos-master.sh --ip=x.x.x.1 
> --work_dir=/var/lib/mesos --log_dir=../log_dir --quiet
> 
> commands used to start mesos node agent...
> HOST2<'mesos' userid>: export DOCKER_HOST=0.0.0.0:2375
> HOST2<'mesos' userid>: ./bin/mesos-slave.sh  --containerizers=docker,mesos 
> --executor_registration_timeout=5mins --master=x.x.x.1:5050 
> --log_dir=../log_dir
> 
> command used to start swarm master...
> HOST1<'mesos' user>: /root/gocode/bin/swarm --debug manage -c 
> mesos-experimental --cluster-opt mesos.address=x.x.x.1 --cluster-opt 
> mesos.port=3375 --host x.x.x.1:4375  x.x.x.1:5050
> 
> command used to create a container...
> HOST1<'root' user>:docker -H tcp://x.x.x.1:4375 run -d -c 2 --name sleep 
> ubuntu /bin/sleep 10
> 
> error message displayed on the swarm console...
> HOST1: ERRO[0038] HTTP error: no resources available to 
> schedule container  status=500
> 
> error message from tmp directory...
> HOST2<'/tmp' directory>: 
> cat 
> mesos/slaves/.../frameworks/.../executors/sleep.c10c3f7af11e/runs/2d349f82-4927-4c63-82e0-c8eaf34362fd/stderr
> I0925 04:39:25.770843 17606 exec.cpp:132] Version: 0.23.0
> I0925 04:39:25.782850 17627 exec.cpp:206] Executor registered on slave 
> 20150925-034538-4226419721-5050-19580-S5
> Post 
> http:///var/run/docker.sock/v1.19/containers/create?name=mesos-20150925-034538-4226419721-5050-19580-S5.2d349f82-4927-4c63-82e0-c8eaf34362fd:
>  dial unix /var/run/docker.sock: no such file or directory. Are you trying to 
> connect to a TLS-enabled daemon without TLS?
> W0925 04:39:25.782850 17620 logging.cpp:81] RAW: Received signal SIGTERM from 
> process 17511 of user 1000; exiting
> 
> error message from mesos log...
> HOST2:
> cat ~/mesos-0.23.0/log_dir/lt-mesos-slave.ERROR  
> Log file created at: 2015/09/25 04:39:31
> Running on machine: mesosStagingCompute9
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> E0925 04:39:31.302273 17540 slave.cpp:2821] Failed to update resources for 
> container 2d349f82-4927-4c63-82e0-c8eaf34362fd of executor sleep.c10c3f7af11e 
> running task sleep.c10c3f7af11e on status update for terminal task, 
> destroying container: Failed to 'docker inspect 
> mesos-20150925-034538-4226419721-5050-19580-S5.2d349f82-4927-4c63-82e0-c8eaf34362fd':
>  exit status = exited with status 1 stderr = Error: No such image or 
> container: 
> mesos-20150925-034538-4226419721-5050-19580-S5.2d349f82-4927-4c63-82e0-c8eaf34362fd
> 
> 
> The error message from the tmp directory looks like the DOCKER_HOST env. var 
> is not being picked up since I get this similar error message on HOST2 when I 
> do a simple 'docker ps' instead of 'docker -H 0.0.0.0:2375 ps'.
> 
> Is there any other configuration I need to setup on the slave such that the 
> slave can create a container using the Mesos and Swarm environment?  Please 
> advise.
> 
> Best Regards,
> Diana Arroyo
> darr...@us.ibm.com
> 


Re: [VOTE] Release Apache Mesos 0.24.0 (rc1)

2015-08-27 Thread Timothy Chen
That test is failing because of a weird bug in CentOS 7 not naming the
cgroups correctly (or at least not following the pattern every other
OS uses).

I filed a CentOS bug but have had no response so far; if we want to fix it, we
will have to work around the problem by hardcoding another cgroup
name to test cpuacct,cpu.

Tim

On Thu, Aug 27, 2015 at 4:00 PM, Vinod Kone  wrote:
> Happy to cut another RC.
>
> IIUC, https://reviews.apache.org/r/37684 doesn't fix the below test.
>
> [  FAILED  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where
> TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess
>
> Is someone working on fixing that (MESOS-3294
> )? If yes, I would wait a
> day or two to get that in.
>
> Any other issues people have encountered with RC1?
>
>
>
> On Thu, Aug 27, 2015 at 3:45 PM, Niklas Nielsen 
> wrote:
>
>> If it is that easy to fix, why not get it in?
>>
>> How about https://issues.apache.org/jira/browse/MESOS-3053 (which
>> Haosdent ran into)?
>>
>> On 27 August 2015 at 15:36, Jie Yu  wrote:
>>
>>> Niklas,
>>>
>>> This is the known problem reported by Marco. I am OK with both because
>>> the linux filesystem isolator cannot be used in 0.24.0.
>>>
>>> If you guys prefer to cut another RC, here is the patch that needs to be
>>> cherry picked:
>>>
>>> commit 3ecd54320397c3a813d555f291b51778372e273b
>>> Author: Greg Mann 
>>> Date:   Fri Aug 21 13:21:10 2015 -0700
>>>
>>> Added symlink test for /bin, lib, and /lib64 when preparing test root
>>> filesystem.
>>>
>>> Review: https://reviews.apache.org/r/37684
>>>
>>>
>>>
>>> On Thu, Aug 27, 2015 at 3:30 PM, Niklas Nielsen 
>>> wrote:
>>>
 -1: sudo make check on centos 7

 [--] Global test environment tear-down

 [==] 793 tests from 121 test cases ran. (606946 ms total)

 [  PASSED  ] 786 tests.

 [  FAILED  ] 7 tests, listed below:

 [  FAILED  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where
 TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess

 [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_ChangeRootFilesystem

 [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromSandbox

 [  FAILED  ] LinuxFilesystemIsolatorTest.ROOT_VolumeFromHost

 [  FAILED  ]
 LinuxFilesystemIsolatorTest.ROOT_VolumeFromHostSandboxMountPoint

 [  FAILED  ]
 LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem

 [  FAILED  ] MesosContainerizerLaunchTest.ROOT_ChangeRootfs

 Configured with:

 ../mesos/configure --prefix=/home/vagrant/releases/0.24.0/
 --disable-python

 On 26 August 2015 at 17:00, Khanduja, Vaibhav 
 wrote:

> +1
>
> > On Aug 26, 2015, at 4:43 PM, Vinod Kone  wrote:
> >
> > Pinging the thread for more (binding) votes. Hopefully people have
> caught
> > up with emails after Mesos madness.
> >
> >> On Wed, Aug 19, 2015 at 1:28 AM, haosdent 
> wrote:
> >>
> >> +1
> >>
> >> OS: Ubutnu 14.04
> >> Verify command: sudo make -j8 check
> >> Compiler: Both gcc4.8 and clang3.5
> >> Configuration: default configuration
> >> Result: all tests(828 tests) pass
> >>
> >> MESOS-3053  is
> because
> >> need update add iptable first.
> >>
> >>> On Wed, Aug 19, 2015 at 2:39 PM, haosdent 
> wrote:
> >>>
> >>> Could not
> >>> pass DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged in
> Ubuntu
> >>> 14.04. Already have a issue for this
> >>> https://issues.apache.org/jira/browse/MESOS-3053, it is acceptable?
> >>>
> >>> On Wed, Aug 19, 2015 at 12:55 PM, Marco Massenzio <
> ma...@mesosphere.io>
> >>> wrote:
> >>>
>  +1 (non-binding)
> 
>  All tests (including ROOT) pass on:
>  Ubuntu 14.04 (physical box)
> 
>  All non-ROOT tests pass on:
>  CentOS 7 (VirtualBox VM)
> 
>  Known issue (MESOS-3050) for ROOT tests on CentOS 7, non-blocker.
> 
>  Thanks,
> 
>  *Marco Massenzio*
> 
>  *Distributed Systems Engineerhttp://codetrips.com <
> http://codetrips.com>*
> 
>  On Tue, Aug 18, 2015 at 3:26 PM, Vinod Kone 
>  wrote:
> 
> > 0.24.0 includes the following:
> >
> >
> >
> 
> >
> > Experimental support for v1 scheduler HTTP API!
> >
> > This release also wraps up support for fetcher.
> >
> >
> > The CHANGELOG for the release is available at:
> >
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.24.0-rc1
> >
> >
> >

Re: Change DOCKER_HOST for Mesos slave

2015-08-03 Thread Timothy Chen
Hi Andrii,

We never intended to pick up the local OS environment and pass it into the 
containerizer, as we want to make sure all environment variables are 
intentionally specified by the framework for the task.

Does that docker.conf setting generate a new docker.conf file, or something else?
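
If the goal is just getting DOCKER_HOST into the executor environment, one
avenue worth trying (assuming your Mesos version has this flag; the value is
illustrative) is to set it explicitly on the slave:

  mesos-slave --executor_environment_variables='{"DOCKER_HOST": "localhost:2377"}' ...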

Tim 



> On Aug 3, 2015, at 2:51 AM, Andrii Loshkovskyi  wrote:
> 
> Hello,
> 
> I was able to change DOCKER_HOST in Mesos 0.22.1, but after upgrading to Mesos 
> 0.23 I'm no longer able to do that.
> 
> I override the systemd unit file this way:
> 
> cat /etc/systemd/system/mesos-slave.service.d/docker.conf 
> [Service]
> Environment="DOCKER_HOST=localhost:2377"
> 
> cat /etc/mesos-slave/containerizers 
> docker,mesos
> 
> This doesn't work in the latest Mesos. I use CentOS 7, Docker 1.6.2.
> From the changelog I see that there were a lot of changes to container 
> network isolation in Mesos; maybe it's somehow related.
> I have checked the containerizer code, but was not able to identify the issue.
> 
> I would really appreciate if someone advises me on this issue.
> 
> Thank you.
> 
> --
> Kind regards,
> Andrii Loshkovskyi


Re: [VOTE] Release Apache Mesos 0.23.0 (rc4)

2015-07-22 Thread Timothy Chen
+1 

The docker bridge network test failed because of some iptables rules that were set 
in the environment. I will comment on the JIRA, but it's not a blocker.

Tim


> On Jul 22, 2015, at 1:07 PM, Benjamin Hindman  
> wrote:
> 
> +1 (binding)
> 
> On Ubuntu 14.04:
> 
> $ make check
> ... all tests pass ...
> $ sudo make check
> ... tests with known issues fail, but ignoring because these have all been
> resolved and are issues with the tests alone ...
> 
> Thanks Adam.
> 
>> On Fri, Jul 17, 2015 at 4:42 PM Adam Bordelon  wrote:
>> 
>> Hello Mesos community,
>> 
>> Please vote on releasing the following candidate as Apache Mesos 0.23.0.
>> 
>> 0.23.0 includes the following:
>> 
>> 
>> - Per-container network isolation
>> - Dockerized slaves will properly recover Docker containers upon failover.
>> - Upgraded minimum required compilers to GCC 4.8+ or clang 3.5+.
>> 
>> as well as experimental support for:
>> - Fetcher Caching
>> - Revocable Resources
>> - SSL encryption
>> - Persistent Volumes
>> - Dynamic Reservations
>> 
>> The CHANGELOG for the release is available at:
>> 
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.23.0-rc4
>> 
>> 
>> 
>> The candidate for Mesos 0.23.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc4/mesos-0.23.0.tar.gz
>> 
>> The tag to be voted on is 0.23.0-rc4:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.23.0-rc4
>> 
>> The MD5 checksum of the tarball can be found at:
>> 
>> https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc4/mesos-0.23.0.tar.gz.md5
>> 
>> The signature of the tarball can be found at:
>> 
>> https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc4/mesos-0.23.0.tar.gz.asc
>> 
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>> 
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1062
>> 
>> Please vote on releasing this package as Apache Mesos 0.23.0!
>> 
>> The vote is open until Wed July 22nd, 17:00 PDT 2015 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Mesos 0.23.0 (I've tested it!)
>> [ ] -1 Do not release this package because ...
>> 
>> Thanks,
>> -Adam-
>> 
>> 


Re: [VOTE] Release Apache Mesos 0.23.0 (rc3)

2015-07-16 Thread Timothy Chen
As Adam mentioned, I also think this is not a blocker, as it only affects
the way we test cgroups on CentOS 7.x due to a CentOS bug, and
doesn't actually impact normal Mesos operations.

My vote is +1 as well.

Tim

On Thu, Jul 16, 2015 at 12:10 PM, Vinod Kone  wrote:
> Found a bug in HTTP API related code: MESOS-3055
> 
>
> If we don't fix this in 0.23.0, we cannot expect the 0.24.0 scheduler
> driver (that will send Calls) to properly subscribe with a 0.23.0 master. I
> could add a work around in the driver to only send Calls if the master
> version is 0.24.0, but would prefer to not have to do that.
>
> Also, on the review  for that bug, we
> realized that we might want to make Subscribe.force 'optional' instead of
> 'required'. That's an API change, which would be nice to go into 0.23.0 as
> well.
>
> So, not a -1 per se, but if you are willing to cut another RC, I can land
> the fixes today. Sorry for the trouble.
>
> On Thu, Jul 16, 2015 at 11:48 AM, Adam Bordelon  wrote:
>
>> +1 (binding)
>> This vote has been silent for almost a week. I assume everybody's busy
>> testing. My testing results: basic integration tests passed for Mesos
>> 0.23.0 on CoreOS with DCOS GUI/CLI, Marathon, Chronos, Spark, HDFS,
>> Cassandra, and Kafka.
>>
>> `make check` passes on Ubuntu and CentOS, but `sudo make check` fails on
>> CentOS 7.1 due to errors in CentOS. See
>> https://issues.apache.org/jira/browse/MESOS-3050 for more details. I'm not
>> convinced this is serious enough to do another release candidate and voting
>> round, but I'll let Tim and others chime in with their thoughts.
>>
>> If we don't get enough deciding votes by 6pm Pacific today, I'll extend the
>> vote for another day.
>>
>> On Thu, Jul 9, 2015 at 6:09 PM, Khanduja, Vaibhav <
>> vaibhav.khand...@emc.com>
>> wrote:
>>
>> > +1
>> >
>> > Sent from my iPhone. Please excuse the typos and brevity of this message.
>> >
>> > > On Jul 9, 2015, at 6:07 PM, Adam Bordelon  wrote:
>> > >
>> > > Hello Mesos community,
>> > >
>> > > Please vote on releasing the following candidate as Apache Mesos
>> 0.23.0.
>> > >
>> > > 0.23.0 includes the following:
>> > >
>> >
>> 
>> > > - Per-container network isolation
>> > > - Dockerized slaves will properly recover Docker containers upon
>> > failover.
>> > > - Upgraded minimum required compilers to GCC 4.8+ or clang 3.5+.
>> > >
>> > > as well as experimental support for:
>> > > - Fetcher Caching
>> > > - Revocable Resources
>> > > - SSL encryption
>> > > - Persistent Volumes
>> > > - Dynamic Reservations
>> > >
>> > > The CHANGELOG for the release is available at:
>> > >
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.23.0-rc3
>> > >
>> >
>> 
>> > >
>> > > The candidate for Mesos 0.23.0 release is available at:
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc3/mesos-0.23.0.tar.gz
>> > >
>> > > The tag to be voted on is 0.23.0-rc3:
>> > >
>> >
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.23.0-rc3
>> > >
>> > > The MD5 checksum of the tarball can be found at:
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc3/mesos-0.23.0.tar.gz.md5
>> > >
>> > > The signature of the tarball can be found at:
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/0.23.0-rc3/mesos-0.23.0.tar.gz.asc
>> > >
>> > > The PGP key used to sign the release is here:
>> > > https://dist.apache.org/repos/dist/release/mesos/KEYS
>> > >
>> > > The JAR is up in Maven in a staging repository here:
>> > > https://repository.apache.org/content/repositories/orgapachemesos-1060
>> > >
>> > > Please vote on releasing this package as Apache Mesos 0.23.0!
>> > >
>> > > The vote is open until Thurs July 16th, 18:00 PDT 2015 and passes if a
>> > > majority of at least 3 +1 PMC votes are cast.
>> > >
>> > > [ ] +1 Release this package as Apache Mesos 0.23.0
>> > > [ ] -1 Do not release this package because ...
>> > >
>> > > Thanks,
>> > > -Adam-
>> >
>>


Re: mesos slave in docker container

2015-06-13 Thread Timothy Chen
Can you set the GLOG_v=1 env variable when launching the slave and post the 
slave log?
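
Since the slave itself runs in a container, the variable can go on the docker
run line (image name illustrative):

  docker run -e GLOG_v=1 ... <your slave image>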

Tim

> On Jun 13, 2015, at 12:44 AM, Tyson Norris  wrote:
> 
> Hi - 
> We are running mesos slave (0.22.0-1.0.ubuntu1404) in a docker container with 
> docker containerizer without problems on ubuntu 14.04 docker host (with 
> lxc-docker pkg etc added). 
> 
> Running the same slave container on RHEL 7.0 docker host, the container exits 
> almost immediately after starting with:
> I0613 07:18:15.161931  5303 slave.cpp:3808] Finished recovery
> I0613 07:18:15.162677  5303 slave.cpp:647] New master detected at 
> master@192.168.8.5:5050
> I0613 07:18:15.162753  5301 status_update_manager.cpp:171] Pausing sending 
> status updates
> I0613 07:18:15.163051  5303 slave.cpp:672] No credentials provided. 
> Attempting to register without authentication
> I0613 07:18:15.163734  5303 slave.cpp:683] Detecting new master
> W0613 07:18:15.163734  5293 logging.cpp:81] RAW: Received signal SIGTERM from 
> process 1166 of user 0; exiting
> 
> 
> If I do not enable the docker containerizer, the slave container runs fine. 
> 
> Other containers that bind mount /var/run/docker.sock also run fine. 
> 
> Debug docker logs are below. 
> 
> One difference between the ubuntu docker host and the RHEL docker host is that 
> the ubuntu host uses the aufs driver, while rhel uses devicemapper, and 
> selinux is enabled in RHEL but not ubuntu.
> 
> Thanks for any advice!
> Tyson
> 
> 
> 
> 
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=info 
> msg="POST 
> /v1.18/containers/9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd/start"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=info 
> msg="+job 
> start(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="activateDeviceIfNeeded(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm info 
> docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
>   OF   [16384] (*1)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm create 
> docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
>   OF   [16384] (*1)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): libdm-common.c:1348 (4) 
> docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd:
>  Stacking NODE_ADD (253,9) 0:0 0600 [verify_udev]"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm reload 
> docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
>   OF   [16384] (*1)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): ioctl/libdm-iface.c:1750 (4) dm resume 
> docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd
>   OF   [16384] (*1)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): libdm-common.c:1348 (4) 
> docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd:
>  Processing NODE_ADD (253,9) 0:0 0600 [verify_udev]"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="libdevmapper(6): libdm-common.c:983 (4) Created 
> /dev/mapper/docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd"
> Jun 13 07:28:26 phx-8 kernel: EXT4-fs (dm-9): mounted filesystem with ordered 
> data mode. Opts: discard
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=info 
> msg="+job log(start, 
> 9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd, 
> docker.corp.adobe.com/tnorris/mesosslave:0.22.1-1.0.ubuntu1404)"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=info 
> msg="-job log(start, 
> 9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd, 
> docker.corp.adobe.com/tnorris/mesosslave:0.22.1-1.0.ubuntu1404) = OK (0)"
> Jun 13 07:28:26 phx-8 systemd-udevd: conflicting device node 
> '/dev/mapper/docker-253:3-16818501-9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd'
>  found, link to '/dev/dm-9' will not be created
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=debug 
> msg="Calling GET /containers/{name:.*}/json"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=info msg="GET 
> /containers/9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd/json"
> Jun 13 07:28:26 phx-8 docker: time="2015-06-13T07:28:26Z" level=info 
> msg="+job 
> container_inspect(9e897d0fd156dab5ec59f8ded2a6cdf7dc5379664c872cd7da4875b6aab9dfcd)"
> Jun 13 07

Re: change docker rm command

2015-04-12 Thread Timothy Chen
Hi Mike,

Can you create a JIRA for this? Currently there isn't a way to change that, but 
IMO it makes sense to change it, since we already delay removing the container.
Tim 

> On Apr 12, 2015, at 12:10 PM, Mike Michel  wrote:
> 
> Hi,
>  
> Is there a way to change the options for how mesos removes docker containers? 
> Right now it seems it is "docker rm -f ID", so bind mounts are not deleted. 
> This means thousands of dirs in /var/lib/docker/vfs/dir. I would like to 
> change it to "docker rm -f -v ID". This deletes bind mounts but not persistent 
> volumes.
>  
> Best,
>  
> Mike
>   
>


Re: Spark on Mesos / Executor Memory

2015-04-11 Thread Timothy Chen
Hi James,

You are right that multiple frameworks become a different discussion: how to 
adjust and allow more dynamic resource negotiation to happen, while also 
factoring in fairness and other concerns.

There is more work happening in Mesos to try to address multiple frameworks, 
like optimistic offers and inverse offers, but I think in terms of dynamic 
memory needs for a framework it's still largely up to the scheduler to specify 
and scale accordingly when resources are needed or no longer needed.

One way this is being addressed in Spark is by integrating dynamic allocation 
into resource schedulers such as Mesos and YARN, but more work is still needed, 
as dynamic allocation only looks at certain metrics that might not address all 
kinds of needs. If you have any specific use cases or examples that you think 
the existing work doesn't fit and would like addressed, that would be a good 
way to start the conversation.
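
For concreteness, a hedged sketch of pinning the resource shape of a
coarse-grained job today (values illustrative):

  spark-submit \
    --conf spark.mesos.coarse=true \
    --conf spark.cores.max=32 \
    --conf spark.executor.memory=8g \
    <app>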

Tim

> On Apr 11, 2015, at 1:05 PM, CCAAT  wrote:
> 
> Hello Tim,
> 
> Your approach seems most reasonable, particularly from an overarching 
> viewpoint. However, it occurs to me that as folks have several to many 
> different frameworks (distributed applications) running on a given mesos 
> cluster, the optimization of resource allocation (utilization) may 
> ultimately need to be under some sort of tunable, dynamic scheme. Most 
> distributed applications, say one that runs for a few hours, will usually not 
> have a constant resource demand on memory, so how can any static configuration 
> suffice for a dynamic mix of frequently changing distributed applications? 
> This is particularly amplified as a problem where Apache Spark, with its 
> "in-memory" resource demands, is very different from other frameworks that 
> may be active on the same cluster.
> 
> I really think we are just experiencing the tip of the iceberg here
> as these mesos clusters grow, expand and take on a variety of problems,
> or did I miss some already existing robustness in the code?
> 
> 
> James
> 
> 
> 
>> On 04/11/2015 12:29 PM, Tim Chen wrote:
>> (Adding spark user list)
>> 
>> Hi Tom,
>> 
>> If I understand correctly, you're saying that you're running into memory
>> problems because the scheduler is allocating too many CPUs and not
>> enough memory to accommodate them, right?
>> 
>> In the case of fine-grained mode I don't think that's a problem since we
>> have a fixed amount of CPU and memory per task.
>> However, in coarse-grained mode you can run into that problem if you're within
>> the spark.cores.max limit, and memory is a fixed number.
>> 
>> I have a patch out to configure the max CPUs a coarse-grained executor
>> should use, and it also allows multiple executors in coarse-grained
>> mode. So you could, say, try to launch multiple executors of max 4 cores with
>> spark.executor.memory (+ overhead, etc.) on a slave.
>> (https://github.com/apache/spark/pull/4027)
>> 
>> It also might be interesting to include a cores-to-memory multiplier so
>> that with a larger number of cores we try to scale the memory by some
>> factor, but I'm not entirely sure that's intuitive to use or that people
>> would know what to set it to, as that can likely change with different
>> workloads.
>> 
>> Tim
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sat, Apr 11, 2015 at 9:51 AM, Tom Arnfeld > > wrote:
>> 
>>We're running Spark 1.3.0 (with a couple of patches over the top for
>>docker related bits).
>> 
>>I don't think SPARK-4158 is related to what we're seeing, things do
>>run fine on the cluster, given a ridiculously large executor memory
>>configuration. As for SPARK-3535 although that looks useful I think
>>we're seeing something else.
>> 
>>Put a different way, the amount of memory required at any given time
>>by the spark JVM process is directly proportional to the amount of
>>CPU it has, because more CPU means more tasks and more tasks means
>>more memory. Even if we're using coarse mode, the amount of executor
>>memory should be proportionate to the amount of CPUs in the offer.
>> 
>>On 11 April 2015 at 17:39, Brenden Matthews >> wrote:
>> 
>>I ran into some issues with it a while ago, and submitted a
>>couple PRs to fix it:
>> 
>>https://github.com/apache/spark/pull/2401
>>https://github.com/apache/spark/pull/3024
>> 
>>Do these look relevant? What version of Spark are you running?
>> 
>>On Sat, Apr 11, 2015 at 9:33 AM, Tom Arnfeld >> wrote:
>> 
>>Hey,
>> 
>>Not sure whether it's best to ask this on the spark mailing
>>list or the mesos one, so I'll try here first :-)
>> 
>>I'm having a bit of trouble with out of memory errors in my
>>spark jobs... it seems fairly odd to me that memory
>>resources can only be set at the executor level, and not
>>also at th
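
For reference, a sketch of how the coarse-grained knobs mentioned above are set 
per job; the master address and application name are hypothetical, while 
spark.mesos.coarse, spark.cores.max, and spark.executor.memory are standard 
Spark properties:

    spark-submit \
      --master mesos://zk://master:2181/mesos \
      --conf spark.mesos.coarse=true \
      --conf spark.cores.max=4 \
      --conf spark.executor.memory=8g \
      my_app.py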

Re: Custom python executor with Docker

2015-04-07 Thread Timothy Chen
Hi Tom(s),

Tom Arnfeld is right: if you want to launch your own Docker container
from your custom executor, you will have to handle all the issues
yourself and won't be able to use the Docker containerizer at all.

Alternatively, you can have Mesos launch your custom executor in a
Docker container, by specifying the ContainerInfo in the
ExecutorInfo.
What this means is that your custom executor is already running in a
Docker container, and you can do your custom logic afterwards. This
does mean you can't simply launch multiple containers from the
executor anymore.

If there is something you want to do that doesn't fit these options, let us
know what you're trying to achieve and we can see what we can do.

Tim

On Tue, Apr 7, 2015 at 4:15 PM, Tom Arnfeld  wrote:
> It's not possible to invoke the docker containerizer from outside of Mesos,
> as far as I know.
>
> If you persue this route, you can run into issues with orphaned containers
> as your executor may die for some unknown reason, and the container is still
> running. Recovering from this can be tricky business, so it's better if you
> can adapt your framework design to fit within the Mesos Task/Executor
> pattern.
>
> --
>
> Tom Arnfeld
> Developer // DueDil
>
> (+44) 7525940046
> 25 Christopher Street, London, EC2A 2BS
>
>
> On Mon, Apr 6, 2015 at 7:00 PM, Vinod Kone  wrote:
>>
>> Tim, do you want answer this?
>>
>> On Wed, Apr 1, 2015 at 7:27 AM, Tom Fordon  wrote:
>>>
>>> Hi.  I'm trying to understand using docker within a custom executor. For
>>> each of my tasks, I would like to perform some steps on the node before
>>> launching a docker container. I was planning on writing a custom python
>>> executor for this, but I wasn't sure how to launch docker from within this
>>> executor.
>>>
>>> Can I just call docker in a subprocess using the ContainerInfo from the
>>> Task? If I do this, how does the Containerizer fit in?
>>>
>>> Thank you,
>>> Tom Fordon
>>
>>
>
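
As a rough illustration of the second approach described above (letting Mesos 
launch the custom executor inside a Docker container by setting a ContainerInfo 
on the ExecutorInfo), a minimal sketch assuming the Python bindings' 
mesos.interface module; the image, command, and IDs are hypothetical:

    from mesos.interface import mesos_pb2

    executor = mesos_pb2.ExecutorInfo()
    executor.executor_id.value = "my-executor"               # hypothetical ID
    executor.name = "MyExecutor"
    # Command that starts the custom executor once the container is up.
    executor.command.value = "python /app/my_executor.py"    # hypothetical path

    # Ask Mesos to run this executor inside a Docker container.
    executor.container.type = mesos_pb2.ContainerInfo.DOCKER
    executor.container.docker.image = "example/my-executor"  # hypothetical image

    # The scheduler then sets task.executor.MergeFrom(executor) on each
    # TaskInfo it launches, instead of setting task.command.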


Re: Spark on Mesos Submitted from multiple users

2015-02-21 Thread Timothy Chen
Hi John,

Having drivers launched on the cluster, where you can query/kill them, is what 
I'm currently working on.

As for sharing drivers I will let others chime in if that ever makes sense.

Tim


> On Feb 21, 2015, at 11:29 AM, John Omernik  wrote:
> 
> So in my instance, instead of having a bunch of drivers on one machine,
> at least each of the drivers would be out in cluster land... That's a
> bit better, however I see your point on not sharing drivers between
> apps, going to have to think that one through. Are there no cases
> where having a single driver supporting requests for a group of apps
> makes sense, or am I missing something there? It seems like a logical
> way to put some limitations on groups of apps, but I may be missing
> something in how it's designed to be run.
> 
>> On Fri, Feb 20, 2015 at 10:22 AM, Tim Chen  wrote:
>> Hi John,
>> 
>> I'm currently working on a cluster mode design and a PoC, but it is also not
>> sharing drivers as Spark AFAIK is designed to not share drivers between
>> apps.
>> 
>> The cluster mode for Mesos is going to be a way to submit apps to your
>> cluster, and each app will be running in the cluster as a new driver that is
>> managed by a cluster dispatcher, and you don't need to wait for the client
>> to finish to get all the results.
>> 
>> I'll be updating the JIRA and PR once I have this ready, which is aimed for
>> this next release.
>> 
>> Tim
>> 
>>> On Fri, Feb 20, 2015 at 8:09 AM, John Omernik  wrote:
>>> 
>>> Tim - on the Spark list your name was brought up in relation to
>>> https://issues.apache.org/jira/browse/SPARK-5338 I asked this question
>>> there but I'll ask it here too, what can I do to help on this. I am
>>> not a coder unfortunately, but I am user willing to try things :) This
>>> looks really cool for what we would like to do with Spark and Mesos
>>> and I'd love to be able to contribute and/or get an understanding of a
>>> (even tentative) timeline.  I am not trying to be pushy, I understand
>>> lots of things are likely on your agenda :)
>>> 
>>> John
>>> 
>>> 
>>> 
 On Tue, Feb 17, 2015 at 6:33 AM, John Omernik  wrote:
 Tim, thanks, that makes sense; the checking for ports and incrementing
 was new to me, so hearing about that helps. Next question: is it
 possible for a driver to be shared by the same user somehow? This
 would be desirable from the standpoint of running an iPython notebook
 server (Jupyter Hub). I have it set up so that every time a notebook is
 opened, the imports for Spark are run (the idea is that the
 environment is ready to go for analysis); however, if each user has 5
 notebooks open at any time, that would be a lot of Spark drivers! But,
 I suppose before asking that, I should ask about the sequence of
 drivers... are they serial? I.e. can one driver serve only one query
 at a time? What is the optimal size for a driver (in memory), and what
 does the memory affect in the driver? I.e. is a driver with smaller
 amounts of memory limited in the number of results, etc.?
 
 Lots of questions here, if these are more spark related questions, let
 me know, I can hop over to spark users, but since I am curious on
 spark on mesos, I figured I'd try here first.
 
 Thanks for your help!
 
 
 
> On Mon, Feb 16, 2015 at 10:30 AM, Tim Chen  wrote:
> Hi John,
> 
> With Spark on Mesos, each client (spark-submit) starts a SparkContext
> which
> initializes its own SparkUI and framework. There is a default 4040 for
> the
> Spark UI port, but if it's occupied Spark automatically tries ports
> incrementally for you, so your next could be 4041 if it's available.
> 
> Driver is not shared between user, each user creates its own driver.
> 
> About slowness, it's hard to say without any information; you need to
> tell us
> your cluster setup, what mode you're running Mesos with, and if there is
> anything
> else running in the cluster, the job, etc.
> 
> Tim
> 
>> On Sat, Feb 14, 2015 at 5:06 PM, John Omernik  wrote:
>> 
>> Hello all, I am running Spark on Mesos and I think I am love, but I
>> have some questions. I am running the python shell via iPython
>> Notebooks (Jupyter) and it works great, but I am trying to figure out
>> how things are actually submitted... like for example, when I submit
>> the spark app from the iPython notebook server, I am opening a new
>> kernel and I see a new spark submit (similar to the below) for each
>> kernel... but, how is that actually working on the cluster, I can
>> connect to the spark server UI on 4040, but shouldn't there be a
>> different one for each driver? Is that causing conflicts? After a
>> while things seem to run slow; is this due to some weird conflicts?
>> Should I be specifying unique ports for each server? Is the driver
>> shared between users? what about between kerne's for t
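
As an aside, if relying on the automatic increment is ever a problem, the UI 
port can also be pinned per driver; a sketch with a hypothetical master address 
and app (spark.ui.port is a standard Spark property):

    spark-submit \
      --master mesos://zk://master:2181/mesos \
      --conf spark.ui.port=4041 \
      my_app.py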

Re: Help us review #MesosCon 2015 proposals

2015-02-21 Thread Timothy Chen
Hi Dave,

How many submissions are we looking to accept this year?

Tim 

> On Feb 21, 2015, at 1:43 PM, Dave Lester  wrote:
> 
> Thanks Steve and Pablo for the feedback.
>  
> Regarding the system of rating proposals: this is similar to what we used 
> last year, however last year we also had "strongly accept" and "strongly 
> reject" options which in my personal opinion were helpful for determining the 
> best-of-the-best. As a speaker subcommittee, we decided to simplify the 
> review process this time around by offering only three options. We may 
> reconsider this for next year.
>  
> Regarding the selection process: the MesosCon program committee will 
> ultimately determine the event programs -- a list of those who are members is 
> available here: 
> http://events.linuxfoundation.org/events/mesoscon/program/programcommittee. 
> Program committee members will be completing the same review process as 
> everyone else in the community, and we plan to use the feedback we receive as 
> signals for what to select for the program, and inform tracks for the event.
>  
> So far we have over 40 reviews for each of the themes we've shared, and while 
> the system for feedback may not be perfect there are already some clear 
> signals in the data about what folks want to hear and what they'd like to 
> pass on. We'll do our best to be transparent about the process moving 
> forward, and have a strong feedback loop with the community to ensure that 
> your voice and opinions are given weight in the decisions of the program 
> committee.
>  
> If you have additional questions, feel free to ask on-list. I think having 
> these discussions in the open is helpful, and also keeps us (the program 
> committee) accountable to the community.
>  
> Dave
>  
>> On Sat, Feb 21, 2015, at 04:41 AM, Steve Domin wrote:
>> All of them look great indeed. Didn't reject any as well. 
>>  
>> I find the wording a bit annoying as I didn't really find any talk 
>> "average", I just selected "accept" for the one I really want to hear about 
>> more than anything else.
>>  
>> On Saturday, February 21, 2015, Pablo Delgado  wrote:
>> Maybe a scale of 1 to 5 makes more sense.
>>  
>> I did "Accept" on ALL talks but 3 that consider "Average" but only on the 
>> basis of repeated subject. Did not reject Anything.
>>  
>> I wonder how the selection is going to be made, since all of them are great 
>> talks!
>>  
>>  
>>  
>> On Fri, Feb 20, 2015 at 9:59 PM, Jie Yu  wrote:
>> +1
>>  
>> It does not allow me to continue to see the second page if I don't make 
>> choices on the first one.
>>  
>>  
>> - Jie
>>  
>>  
>> On Fri, Feb 20, 2015 at 4:54 PM, Benjamin Mahler  
>> wrote:
>> Great to see so many proposals!
>>  
>> Is it intentional that we have to review them in small subsets? It's hard to 
>> tell what to consider as an "Average" proposal when you can only see a small 
>> subset at a time. Just curious on the reasoning behind that.
>>  
>> On Wed, Feb 18, 2015 at 2:44 PM, Dave Lester  wrote:
>> 
>> A total of 63 proposals were submitted for #MesosCon[1], up
>> significantly from 24 submitted for last year’s conference. Similar to
>> last year, the MesosCon program committee is opening these proposals up
>> for community review/feedback to better-inform our decisions about what
>> should be included in the program.
>> 
>> In order to make it easier to review a subset of the proposals, we’ve
>> segmented them based upon three loose themes: Frameworks, Users / Ops,
>> and Mesos Internals and Extensions. We encourage you to review proposals
>> based upon one theme, or all three!
>> 
>> *Frameworks (18 Proposals):* bit.ly/MesosCon2015Frameworks Talks on how
>> frameworks can be used, developed, and integrate with Mesos.
>> 
>> *Users / Ops (28 Proposals):* bit.ly/MesosCon2015UsersOps A combination
>> of talks that are use cases (how company x uses Mesos), and
>> operations-focused (how we deploy x, use Docker, etc).
>> 
>> *Mesos Internals and Extensions (17 Proposals):*
>> bit.ly/MesosCon2015InternalsExt Features of the Mesos core, or software
>> integrations with the internals of Mesos. Some proposals have overlap
>> with frameworks and ops, but most are focused on the foundational
>> aspects of how Mesos works.
>> 
>> The forms above also include an opportunity to indicate which sessions
>> you didn't see proposed but would like to attend.
>> 
>> Thanks in advance for your participation! The forms will close on March
>> 4th 2015, two weeks from today.
>> 
>> Dave
>> 
>> 
>> Links:
>> 
>>   1. http://mesoscon.org
>  


Re: Compiling Slave on Windows

2015-02-20 Thread Timothy Chen
Hi Alexandre,

That seems reasonable; from a Mesos requirements standpoint, a slave minimally 
needs to be able to receive and send HTTP messages to the executor.

There are other needs from a containerizer standpoint, such as logging, 
isolation, updating limits, etc., but not everything is a hard requirement.

Tim

> On Feb 20, 2015, at 12:30 PM, Alexandre Mclean  
> wrote:
> 
> Hi Tim,
> if we're operating in a cloud-based environment (OpenStack), would that mean 
> we could run this containerizer on the hypervisor host, or anywhere else, and 
> remotely manage the executors inside Windows cloud instances?
> 
>> On Fri, Feb 20, 2015 at 3:14 PM, Tim Chen  wrote:
>> Hi Alexandre,
>> 
>> Porting the slave will not be straightforward since it was written with the 
>> assumption of a Unix-based system throughout. 
>> 
>> What is much easier to accomplish is to provide a containerizer in Mesos 
>> that can manage VMs, which can then spawn Windows-based VMs. This does have a 
>> higher perf hit than containers, of course.
>> 
>> Tim
>> 
>>> On Fri, Feb 20, 2015 at 11:43 AM, Alexandre Mclean 
>>>  wrote:
>>> Hi James,
>>> I agree this is a compelling feature, but it might not be an absolute 
>>> requirement for many use cases.
>>> 
>>> We're evaluating Mesos to build custom frameworks for distributed 
>>> computation that needs to be cross-platform (Windows mainly), where some 
>>> tasks would be oriented for video rendering (e.g render farm, like many 
>>> existing commercial solutions). This is pretty popular in the Entertainment 
>>> industry, like video games.
>>> 
>>> We'd be willing to sacrifice that isolation and just be able to build a job 
>>> distribution framework on top of Mesos.
>>> 
>>> Also, correct me if I'm wrong but Slaves can already run on OSX which has 
>>> no support for cgroups. 
>>> 
>>> I'm curious to know if anyone tried this before and what are the blocking 
>>> issues to port the Slave component.
>>> 
>>> Otherwise, maybe someone can propose an alternative approach to accomplish 
>>> this.
>>> 
>>> Many thanks
>>> 
>>> 
>>> 
>>> 
>>> 
 On Fri, Feb 20, 2015 at 2:26 PM, James DeFelice  
 wrote:
 One of the major, compelling cases for using mesos is the resource 
 partitioning and isolation between process groups that slave 
 containerizers manage. And that, of course, OS containers are lightweight 
 and low-overhead.
 
 Windows has a ways to go here. You can read about Drawbridge, or even the 
 latest speculation re: Docker+Windows Server integration and what that 
 might look like.
 
> On Fri, Feb 20, 2015 at 12:17 AM, Alexandre Mclean 
>  wrote:
> Hi everyone,
> what are the current limitations to make the Slave work on a Windows 
> platform?
> 
> Would it be possible to extract the slave component from the main Mesos 
> codebase and compile it on Windows?
> 
> Also, could we have a pure implementation of the Slave that wouldn't 
> depend on libmesos, like we do for the newest bindings like mesos-go? 
> Does it even make sense to want this?
> 
> -- 
> Alexandre
 
 
 
 -- 
 James DeFelice
 585.241.9488 (voice)
 650.649.6071 (fax)
>>> 
>>> 
>>> 
>>> -- 
>>> Alexandre
> 
> 
> 
> -- 
> Alexandre


Re: [VOTE] Release Apache Mesos 0.21.1 (rc2)

2015-02-17 Thread Timothy Chen
Hi Vinod,

Sorry, I planned to do it but forgot to mention it in the email.

Tim

On Tue, Feb 17, 2015 at 9:42 AM, Vinod Kone  wrote:
> Tim, mind updating the release guide too?
>
>
> @vinodkone
>
>> On Feb 17, 2015, at 8:19 AM, Timothy Chen  wrote:
>>
>> Hi Ben,
>>
>> Didn't realize I needed to update this. I've added the date now.
>>
>> Tim
>>
>>> On Feb 16, 2015, at 10:30 PM, Benjamin Mahler  
>>> wrote:
>>>
>>> Hey Tim,
>>>
>>> Can you release 0.21.1 on JIRA with the correct date?
>>> https://issues.apache.org/jira/plugins/servlet/project-config/MESOS/versions
>>>
>>> Thanks!
>>> Ben
>>>
>>>> On Fri, Jan 2, 2015 at 12:30 PM, Dave Lester  wrote:
>>>> Any of the recent blog posts are good templates, located in the 
>>>> /site/source/blog/ folder on SVN. For example:
>>>>
>>>> https://svn.apache.org/repos/asf/mesos/site/source/blog/2014-11-17-mesos-0-21-0-released.md
>>>>
>>>> or
>>>>
>>>> https://svn.apache.org/repos/asf/mesos/site/source/blog/2014-07-21-mesos-0-19-1-released.md
>>>>
>>>> I'll try to abstract these examples out into a release post template in 
>>>> the coming days. In the meantime, feel free to ping me on IRC if you have 
>>>> any questions.
>>>>
>>>>> On Fri, Jan 2, 2015 at 11:41 AM, Tim Chen  wrote:
>>>>> Hi Dave,
>>>>>
>>>>> I'd definitely like your help on this; do you have the previous 
>>>>> release's Google Doc as a template?
>>>>>
>>>>> I can write one based on that for this release and post it on the list.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Tim
>>>>>
>>>>>> On Fri, Jan 2, 2015 at 11:36 AM, Dave Lester  
>>>>>> wrote:
>>>>>> Hi Tim,
>>>>>>
>>>>>> Will you also be releasing a blog post to announce the release? Let me 
>>>>>> know if you'd like help with the boilerplate text, we usually circulate 
>>>>>> a Google Doc to the list prior to publishing on the site.
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>>> On Fri, Jan 2, 2015 at 11:30 AM, Tim Chen  wrote:
>>>>>>> With 5 +1 (including me and till) and no -1, I'll be releasing the 
>>>>>>> tagged version as 0.21.1.
>>>>>>>
>>>>>>> Thanks all!
>>>>>>>
>>>>>>> Tim
>>>>>>>
>>>>>>>> On Tue, Dec 30, 2014 at 4:19 PM, Tom Arnfeld  wrote:
>>>>>>>> +1
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Tom Arnfeld
>>>>>>>> Developer // DueDil
>>>>>>>>
>>>>>>>> (+44) 7525940046
>>>>>>>> 25 Christopher Street, London, EC2A 2BS
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Wed, Dec 31, 2014 at 12:16 AM, Ankur Chauhan  
>>>>>>>>> wrote:
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>> On Dec 30, 2014, at 16:01, Tim Chen  wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Just a reminder the vote is up for another 2 hours, let me know if 
>>>>>>>>>> any of you have any objections.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Tim
>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 29, 2014 at 5:32 AM, Niklas Nielsen 
>>>>>>>>>>>  wrote:
>>>>>>>>>>> +1, Compiled and tested on Ubuntu Trusty, CentOS Linux 7 and Mac OS 
>>>>>>>>>>> X
>>>>>>>>>>>
>>>>>>>>>>> Thanks guys!
>>>>>>>>>>> Niklas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>&g

Re: [VOTE] Release Apache Mesos 0.21.1 (rc2)

2015-02-17 Thread Timothy Chen
Hi Ben,

Didn't realize I needed to update this. I've added the date now.

Tim

> On Feb 16, 2015, at 10:30 PM, Benjamin Mahler  
> wrote:
> 
> Hey Tim,
> 
> Can you release 0.21.1 on JIRA with the correct date?
> https://issues.apache.org/jira/plugins/servlet/project-config/MESOS/versions
> 
> Thanks!
> Ben
> 
>> On Fri, Jan 2, 2015 at 12:30 PM, Dave Lester  wrote:
>> Any of the recent blog posts are good templates, located in the 
>> /site/source/blog/ folder on SVN. For example:
>> 
>> https://svn.apache.org/repos/asf/mesos/site/source/blog/2014-11-17-mesos-0-21-0-released.md
>> 
>> or
>> 
>> https://svn.apache.org/repos/asf/mesos/site/source/blog/2014-07-21-mesos-0-19-1-released.md
>> 
>> I'll try to abstract these examples out into a release post template in the 
>> coming days. In the meantime, feel free to ping me on IRC if you have any 
>> questions.
>> 
>>> On Fri, Jan 2, 2015 at 11:41 AM, Tim Chen  wrote:
>>> Hi Dave,
>>> 
>>> I'd definitely like your help on this; do you have the previous release's 
>>> Google Doc as a template?
>>> 
>>> I can write one based on that for this release and post it on the list.
>>> 
>>> Thanks,
>>> 
>>> Tim
>>> 
 On Fri, Jan 2, 2015 at 11:36 AM, Dave Lester  wrote:
 Hi Tim,
 
 Will you also be releasing a blog post to announce the release? Let me 
 know if you'd like help with the boilerplate text, we usually circulate a 
 Google Doc to the list prior to publishing on the site.
 
 Dave
 
> On Fri, Jan 2, 2015 at 11:30 AM, Tim Chen  wrote:
> With 5 +1 (including me and till) and no -1, I'll be releasing the tagged 
> version as 0.21.1.
> 
> Thanks all!
> 
> Tim
> 
>> On Tue, Dec 30, 2014 at 4:19 PM, Tom Arnfeld  wrote:
>> +1
>> 
>> --
>> 
>> Tom Arnfeld
>> Developer // DueDil
>> 
>> (+44) 7525940046
>> 25 Christopher Street, London, EC2A 2BS
>> 
>> 
>>> On Wed, Dec 31, 2014 at 12:16 AM, Ankur Chauhan  
>>> wrote:
>>> +1
>>> 
>>> Sent from my iPhone
>>> 
 On Dec 30, 2014, at 16:01, Tim Chen  wrote:
 
 Hi all,
 
 Just a reminder the vote is up for another 2 hours, let me know if any 
 of you have any objections.
 
 Thanks,
 
 Tim
 
> On Mon, Dec 29, 2014 at 5:32 AM, Niklas Nielsen 
>  wrote:
> +1, Compiled and tested on Ubuntu Trusty, CentOS Linux 7 and Mac OS X
> 
> Thanks guys!
> Niklas
> 
> 
>> On 19 December 2014 at 22:02, Tim Chen  wrote:
>> Hi Ankur,
>> 
>> Since MESOS-1711 is just a minor improvement I'm inclined to include 
>> it for the next major release which shouldn't be too far away from 
>> this release.
>> 
>> If anyone else thinks otherwise please let me know.
>> 
>> Tim
>> 
>>> On Fri, Dec 19, 2014 at 12:44 PM, Ankur Chauhan 
>>>  wrote:
>>> Sorry for the late join-in; can we get 
>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/MESOS-1711
>>>  in too, or is it too late?
>>> -- ankur 
>>> Sent from my iPhone
>>> 
 On Dec 19, 2014, at 12:23, Tim Chen  wrote:
 
 Hi all,
 
 Please vote on releasing the following candidate as Apache Mesos 
 0.21.1.
 
 
 0.21.1 includes the following:
 
 * This is a bug fix release.
 
 ** Bug
   * [MESOS-2047] Isolator cleanup failures shouldn't cause 
 TASK_LOST.
   * [MESOS-2071] Libprocess generates invalid HTTP
   * [MESOS-2147] Large number of connections slows statistics.json 
 responses.
   * [MESOS-2182] Performance issue in libprocess SocketManager.
 
 ** Improvement
   * [MESOS-1925] Docker kill does not allow containers to exit 
 gracefully
   * [MESOS-2113] Improve configure to find apr and svn 
 libraries/headers in OSX
 
 The CHANGELOG for the release is available at:
 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.21.1-rc2
 
 
 The candidate for Mesos 0.21.1 release is available at:
 https://dist.apache.org/repos/dist/dev/mesos/0.21.1-rc2/mesos-0.21.1.tar.gz
 
 The tag to be voted on is 0.21.1-rc2:
 https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.21.1-rc2
 
 The MD5 checksum of the tarball can be found at:
>>>

Re: Optimal resource allocation

2015-02-05 Thread Timothy Chen
Hi Pradeep,

First of all, I think the notion of optimal is not just the single dimension of 
task duration; it also involves lots of other dimensions such as throughput, 
fairness, latency, SLAs and more.

Mesos is a two-level scheduler, which means it's not doing all the scheduling 
at a single point (the master), but instead cooperates with frameworks to make 
good scheduling decisions.

So Mesos can achieve this with multiple attributes or resources, as you 
mentioned, with the help of frameworks.

Tim

> On Feb 5, 2015, at 9:09 PM, Dario Rexin  wrote:
> 
> Hi Pradeep,
> 
> I am actually working on a patch for ARM support. I already have Mesos 
> running on ARMv7, just need to polish it a bit and I still have 1 failing 
> test. Expect news about this soon.
> 
> Cheers,
> Dario
> 
>> On Feb 5, 2015, at 1:46 PM, Pradeep Kiruvale  
>> wrote:
>> 
>> Hi Dario,
>> 
>> Thanks for the reply and clarification.
>> 
>>  How hard is it to port to ARM? Is there a lot of architecture-related code? 
>> Any idea?
>> 
>> Regards,
>> Pradeep
>> 
>>> On 5 February 2015 at 12:01, Dario Rexin  wrote:
>>> There is currently no support for ARM cpus. GPUs and FPGAs could be added 
>>> to the resources in the future but are also not supported yet. Scheduling 
>>> tasks on machines that have a specific configuration (powerful GPU or sth 
>>> like that) can be done with attributes. There's however no way to isolate 
>>> those resources like we do with CPU and RAM.
>>> 
>>> 
>>> 
>>> > On 05.02.2015, at 11:10, Chengwei Yang  wrote:
>>> >
>>> >> On Thu, Feb 05, 2015 at 12:00:28AM +0100, Pradeep Kiruvale wrote:
>>> >> Hi All,
>>> >>
>>> >> I am new to Mesos and I have heard and read a lot about it.
>>> >>
>>> >> I have a few doubts regarding resource allocation by Mesos; please 
>>> >> help
>>> >> me
>>> >> to clarify my doubts.
>>> >>
>>> >> In a data center, if there are thousands of heterogeneous nodes
>>> >> (x86, arm, gpu, fpgas), then can Mesos really allocate co-located
>>> >
>>> > First, can Mesos run on ARM, GPU, FPGA?
>>> >
>>> > Second, do your tasks run on all archs?
>>> >
>>> > --
>>> > Thanks,
>>> > Chengwei
>>> >
>>> >> resources for any incoming application to finish the task faster?
>>> >>
>>> >> How are these resource constraints solved? What kind of constraint 
>>> >> solver does it
>>> >> use?
>>> >>
>>> >> Is the policy maker configurable?
>>> >>
>>> >> Thanks & Regards,
>>> >> Pradeep
>>> >
> 


Re: Subscribe

2015-02-02 Thread Timothy Chen
Please email user-subscr...@mesos.apache.org


> On Feb 3, 2015, at 8:47 AM, Kartik Mehta  wrote:
> 


Re: *namespaces now in Docker for Mesos slave in a container parity.

2015-01-19 Thread Timothy Chen
This definitely sounds exciting.

Do you happen to have a link to more information about this?

Tim

Sent from my iPhone

> On Jan 19, 2015, at 11:18 AM, Tim St Clair  wrote:
> 
> Greetings folks - 
> 
> All of the namespace work "should" be in the next release of Docker (1.5?). 
> This would enable ~feature parity with bare metal on the slave, but it may 
> require some command-line magic to enable super privileged containers to 
> behave as expected.
> 
> This means you should be able to enable *namespace features when running 
> mesos-slave from a container. 
> 
> One open question I still haven't figured out is whether there is kernel 
> namespace API compatibility across major release versions. I know it's stable 
> going forwards, but I haven't compared the EL6 vs. EL7 kernel API to see if 
> there are changes in namespaces.
> 
> -- 
> Cheers,
> Timothy St. Clair
> Red Hat Inc.


Re: mesos and coreos?

2015-01-18 Thread Timothy Chen
I think CoreOS provides a good single-node OS for executing containers, Fleet 
provides very simple scheduling and placement, and etcd provides discovery 
primitives. 

I think Mesos, besides being more proven to scale and handle failure scenarios, 
also provides more primitives for users to write Mesos frameworks that can 
surface more information and events, so applications can be smarter about how 
they want to react to these.

Mesos also provides more isolation choices and more available statistics, plus 
a community and existing frameworks that all users can leverage 
already. 

Tim

> On Jan 18, 2015, at 2:27 PM, Victor L  wrote:
> 
> Does that mean Mesos is a framework to prepare my app to take advantage of a 
> clustering environment? 
> 
>> On Sun, Jan 18, 2015 at 1:43 PM, Tom Arnfeld  wrote:
>> The way I see it, Mesos is an API and framework for building and running 
>> distributed systems. CoreOS is an API and framework for running them.
>> 
>> --
>> 
>> Tom Arnfeld
>> Developer // DueDil
>> 
>> (+44) 7525940046
>> 25 Christopher Street, London, EC2A 2BS
>> 
>> 
>>> On Sun, Jan 18, 2015 at 3:01 PM, Jason Giedymin  
>>> wrote:
>>> The value of coreos that immediately comes to mind since I do much work 
>>> with these tools: 
>>> 
>>> - the small footprint: it is a minimal OS, meant to run containers, so it 
>>> throws out everything not needed for that. 
>>> - containers are the launch vehicle, thus deps are in container land. I can 
>>> run and test containers with ease, not having to worry about multiple OSes. 
>>> - with etcd and fleet, coordinating the launch and modification of both 
>>> machines and the cluster makes it a breeze, allowing you to scale Mesos 
>>> dynamically up or down. I add nodes at will, across multiple cloud platforms, 
>>> ready to launch a multitude of containers or just Mesos. 
>>> - security. There is a defined write strategy. You cannot write willy nilly 
>>> to any location. 
>>> - all the above further allow auto OS updates, which is supported today on 
>>> all platforms that deploy coreos. This means more frequent updates since 
>>> the os is minimal, which should increase the security effectiveness when 
>>> compared to big box superstore OSes like Redhat or Ubuntu. Some platforms 
>>> charge quite a bit for managed updates of this frequency and level of 
>>> testing. 
>>> 
>>> Coreos allows me to keep apps in a configured container that I trust, 
>>> tested, and works time and time again. 
>>> 
>>> I see coreos as a compliment. 
>>> 
>>> As a fyi I'm available for questions, debugging, and client work in this 
>>> area. 
>>> 
>>> Hope this helps some, from real world usage. 
>>> 
>>> Sent from my iPad 
>>> 
>>> > On Jan 18, 2015, at 9:16 AM, Victor L  wrote: 
>>> > 
>>> > I am confused: what's the value of Mesos on top of a CoreOS cluster? 
>>> > Mesos provides distributed resource management, fault tolerance, etc., 
>>> > but doesn't CoreOS provide the same things already? 
>>> > Thanks
> 


Re: CockroachDB

2015-01-07 Thread Timothy Chen
Hi Darren,

I've been looking into CockroachDB as well. 

Currently CockroachDB isn't a fully distributed database yet, and it only works 
in single-node mode as far as I know. I know that's going to change soon.

Due to this I don't think anyone is running it on Mesos. I plan to play with it 
on Mesos once it's more ready.

Tim

> On Jan 7, 2015, at 8:17 PM, Darren Haas  wrote:
> 
> Hi All,
> 
> Anyone playing with CockroachDB on Mesos? (Non Docker for now) 
> 
> I have been looking at geo-replicated datastore solutions for Mesos, and have 
> been digging into to CockroachDB.
> 
> Thanks,
> Darren


Re: Rocket

2014-12-03 Thread Timothy Chen
Hi Tim,

I definitely agree; I think what I am getting at is that it's clear from the 
conversation that open governance is what they want from day one. Apache is 
one of the options mentioned on the issue, and I believe something along that 
line is most probable.

As long as that's true, it won't be as difficult as other options to maintain 
as a containerizer option for us.

Tim

> On Dec 3, 2014, at 9:42 AM, Tim St Clair  wrote:
> 
> inline below
> 
> From: "Tim Chen" 
> To: user@mesos.apache.org
> Cc: "dev" 
> Sent: Wednesday, December 3, 2014 11:20:47 AM
> Subject: Re: Rocket
> 
> Hi Tim,
> 
> I see you've already commented on the rocket repo about this, and from their 
> messaging it aims to be independent which should be the whole point of the 
> open container spec.
> I'm all over this like white on rice. 
> 
> I think the best way is just to be involved in the spec early on and continue 
> to do so while we move forward, and we have relationships with the rocket 
> people which should help also being in the loop as well.
> Relationships alone won't cut it.  
> Friends one day, enemies the next, isn't that the way it worked with 
> Docker...?
> 
> Governance, such as Apaches model, is of critical importance.
> 
> Tim
> 
>> On Wed, Dec 3, 2014 at 8:26 AM, Tim St Clair  wrote:
>> 
>> Not to put too fine a point on it, but how are folks planning on 
>> establishing governance around the App Container spec?
>> 
>> https://github.com/coreos/rocket/issues/193
>> 
>> If the mesos community decides to leverage our own, how do we ensure that we 
>> have say in the spec going forwards?
>> 
>> Cheers,
>> Tim
>> 
>> - Original Message -
>> > From: "Tobias Knaup" 
>> > To: user@mesos.apache.org
>> > Cc: "dev" 
>> > Sent: Monday, December 1, 2014 11:39:58 PM
>> > Subject: Re: Rocket
>> >
>> > An important point to clarify is that two things were announced: a spec
>> > (App Container) and an implementation (Rocket).
>> > Here is the spec:
>> > https://github.com/coreos/rocket/blob/master/app-container/SPEC.md
>> > This separation of spec and implementation is important. It makes it much
>> > easier to integrate in Mesos. systemd is also just the implementation of
>> > the runtime part of the spec that CoreOS chose for Rocket. Mesos can use
>> > something else or come with its own.
>> >
>> >
>> > On Mon, Dec 1, 2014 at 1:29 PM, Dominic Hamon 
>> > wrote:
>> >
>> > > Instead of considering the Rocket runtime as implemented, we should
>> > > instead consider how we can implement their specification. A community is
>> > > always healthier when there are multiple implementations of a
>> > > specification, and through implementing it we may find ways to improve 
>> > > it.
>> > >
>> > > Also, this allows us to be a strong voice in the community and provide
>> > > value through a C++ implementation.
>> > >
>> > > I've created a JIRA ticket
>> > > https://issues.apache.org/jira/browse/MESOS-2162 to track any thoughts on
>> > > this.
>> > >
>> > > On Mon, Dec 1, 2014 at 11:10 AM, Tim Chen  wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> Per the announcement from CoreOS about Rocket (
>> > >> https://coreos.com/blog/rocket/) , it seems to be an exciting
>> > >> containerizer runtime that has composable isolation/components, better
>> > >> security and image specification/distribution.
>> > >>
>> > >> All of these design goals also fits very well into Mesos, where in Mesos
>> > >> we also have a pluggable isolators model and have been experiencing some
>> > >> pain points with our existing containerizers around image distribution 
>> > >> and
>> > >> security as well.
>> > >>
>> > >> I'd like to propose to integrate Rocket into Mesos with a new Rocket
>> > >> containerizer, where I can see we can potentially integrate our existing
>> > >> isolators into Rocket runtime.
>> > >>
>> > >> Like to learn what you all think,
>> > >>
>> > >> Thanks!
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Dominic Hamon | @mrdo | Twitter
>> > > *There are no bad ideas; only good ideas that go horribly wrong.*
>> > >
>> >
>> 
>> --
>> Cheers,
>> Timothy St. Clair
>> Red Hat Inc.
> 
> 
> 
> 
> -- 
> Cheers,
> Timothy St. Clair
> Red Hat Inc.


Re: Rocket

2014-12-01 Thread Timothy Chen
Thanks Tobias for clarifying this; we can consider implementing and helping
shape the spec so that it is easy for Mesos to integrate.
Tim

On Mon, Dec 1, 2014 at 9:39 PM, Tobias Knaup  wrote:
> An important point to clarify is that two things were announced: a spec
> (App Container) and an implementation (Rocket).
> Here is the spec:
> https://github.com/coreos/rocket/blob/master/app-container/SPEC.md
> This separation of spec and implementation is important. It makes it much
> easier to integrate in Mesos. systemd is also just the implementation of
> the runtime part of the spec that CoreOS chose for Rocket. Mesos can use
> something else or come with its own.
>
>
> On Mon, Dec 1, 2014 at 1:29 PM, Dominic Hamon 
> wrote:
>
>> Instead of considering the Rocket runtime as implemented, we should
>> instead consider how we can implement their specification. A community is
>> always healthier when there are multiple implementations of a
>> specification, and through implementing it we may find ways to improve it.
>>
>> Also, this allows us to be a strong voice in the community and provide
>> value through a C++ implementation.
>>
>> I've created a JIRA ticket
>> https://issues.apache.org/jira/browse/MESOS-2162 to track any thoughts on
>> this.
>>
>> On Mon, Dec 1, 2014 at 11:10 AM, Tim Chen  wrote:
>>
>>> Hi all,
>>>
>>> Per the announcement from CoreOS about Rocket (
>>> https://coreos.com/blog/rocket/) , it seems to be an exciting
>>> containerizer runtime that has composable isolation/components, better
>>> security and image specification/distribution.
>>>
>>> All of these design goals also fits very well into Mesos, where in Mesos
>>> we also have a pluggable isolators model and have been experiencing some
>>> pain points with our existing containerizers around image distribution and
>>> security as well.
>>>
>>> I'd like to propose to integrate Rocket into Mesos with a new Rocket
>>> containerizer, where I can see we can potentially integrate our existing
>>> isolators into Rocket runtime.
>>>
>>> Like to learn what you all think,
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Dominic Hamon | @mrdo | Twitter
>> *There are no bad ideas; only good ideas that go horribly wrong.*
>>


Re: Help needed

2014-11-22 Thread Timothy Chen
Hi Qiang,

It depends on how your NFS is set up, but if you have it mounted at the same 
location on each slave you can simply map that volume into your Docker 
container with Mesos.

Tim

Sent from my iPhone

> On Nov 22, 2014, at 9:40 AM, Qiang  wrote:
> 
> I have been working with Docker and Mesos recently, and one of the apps I am 
> going to dockerize relies on file storage. I thought about using NFS and a 
> Docker data volume container, but I don't know how I can use these to address 
> my problem. As far as I know, Mesos has service discovery, but in my case I 
> don't think a file storage can be made into a service somehow.
> 
> Any idea to save my day?
> 
> Thanks,
> 
> -- 
> Qiang Han
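
To make that concrete, a minimal sketch of mapping a slave-local NFS mount into 
the container through ContainerInfo volumes, assuming the Python bindings' 
mesos.interface module; the image and paths are hypothetical, and the NFS 
export is assumed to be mounted at the same place on every slave:

    from mesos.interface import mesos_pb2

    container = mesos_pb2.ContainerInfo()
    container.type = mesos_pb2.ContainerInfo.DOCKER
    container.docker.image = "example/my-app"    # hypothetical image

    # Map the slave's NFS mount point into the container.
    volume = container.volumes.add()
    volume.host_path = "/mnt/nfs/shared"         # where NFS is mounted on the slave
    volume.container_path = "/data"              # where the app expects its files
    volume.mode = mesos_pb2.Volume.RW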


Re: A problem with resource offers

2014-11-06 Thread Timothy Chen
Hi Sharma,

Can you try out the latest master and see if you can repro it?

Tim

Sent from my iPhone

> On Nov 6, 2014, at 7:41 PM, Sharma Podila  wrote:
> 
>
> I am on 0.18 still.
> 
> I think I found a bug. I wrote a simple program to repeat this and there's a 
> new twist as well.
> 
> Again, although I have fixed this for now in my framework by removing all 
> previous leases after re-registration, this can show up when mesos starts 
> rescinding offers in the future.
> 
> Here's what I do:
> 
> 1. register with mesos that has just one slave in the cluster and only one 
> master
> 2. get an offer, O1
> 3. kill and restart mesos master
> 4. get new offer for the only slave, O2
> 5. launch a task with both offers O1 and O2
> 6. receive TASK_LOST 
> 7. wait for new offer, that never comes.
> Here's the new twist:
> 8. kill my framework and restart
> 9. get no offers from mesos at all.
> 
> Here's the relevant mesos master logs:
> 
> I1106 19:31:55.734485 10423 master.cpp:770] Elected as the leading master!
> I1106 19:31:55.737759 10423 master.cpp:1936] Attempting to re-register slave 
> 20141029-125131-16842879-5050-18827-1 at slave(1)@127.0.1.1:5051 
> (lgud-spodila2)
> I1106 19:31:55.737788 10423 master.cpp:2818] Adding slave 
> 20141029-125131-16842879-5050-18827-1 at lgud-spodila2 with cpus(*):8; 
> mem(*):39209; disk(*):219127; ports(*):[31000-32000]
> I1106 19:31:55.738088 10422 hierarchical_allocator_process.hpp:445] Added 
> slave 20141029-125131-16842879-5050-18827-1 (lgud-spodila2) with cpus(*):8; 
> mem(*):39209; disk(*):219127; ports(*):[31000-32000] (and cpus(*):8; 
> mem(*):39209; disk(*):219127; ports(*):[31000-32000] available)
> I1106 19:31:56.733850 10423 master.cpp:906] Re-registering framework 
> 20141106-193136-16842879-5050-10308- at scheduler(1)@127.0.1.1:55515
> I1106 19:31:56.734544 10424 hierarchical_allocator_process.hpp:332] Added 
> framework 20141106-193136-16842879-5050-10308-
> I1106 19:31:56.735044 10424 master.cpp:2285] Sending 1 offers to framework 
> 20141106-193136-16842879-5050-10308-
> I1106 19:31:59.627913 10423 http.cpp:391] HTTP request for 
> '/master/state.json'
> I1106 19:32:09.634088 10421 http.cpp:391] HTTP request for 
> '/master/state.json'
> W1106 19:32:10.377226 10425 master.cpp:1556] Failed to validate offer  : 
> Offer 20141106-193136-16842879-5050-10308-0 is no longer valid
> I1106 19:32:10.378697 10425 master.cpp:1567] Sending status update TASK_LOST 
> (UUID: afadf504-f606-47f2-82cc-5af2e532afcd) for task Job123 of framework 
> 20141106-193136-16842879-5050-10308- for launch task attempt on invalid 
> offers: [ 20141106-193147-16842879-5050-10406-0, 
> 20141106-193136-16842879-5050-10308-0 ]
> 
> 
> Master thinks both offers are invalid and basically leaks them. 
> 
> I1106 19:32:19.640913 10422 http.cpp:391] HTTP request for 
> '/master/state.json'
> I1106 19:32:22.667037 10424 master.cpp:595] Framework 
> 20141106-193136-16842879-5050-10308- disconnected
> I1106 19:32:22.667280 10424 master.cpp:1079] Deactivating framework 
> 20141106-193136-16842879-5050-10308-
> I1106 19:32:22.668009 10424 master.cpp:617] Giving framework 
> 20141106-193136-16842879-5050-10308- 0ns to failover
> I1106 19:32:22.668124 10427 hierarchical_allocator_process.hpp:408] 
> Deactivated framework 20141106-193136-16842879-5050-10308-
> I1106 19:32:22.668252 10425 master.cpp:2201] Framework failover timeout, 
> removing framework 20141106-193136-16842879-5050-10308-
> I1106 19:32:22.668443 10425 master.cpp:2688] Removing framework 
> 20141106-193136-16842879-5050-10308-
> I1106 19:32:22.668829 10425 hierarchical_allocator_process.hpp:363] Removed 
> framework 20141106-193136-16842879-5050-10308-
> I1106 19:32:24.739157 10426 master.cpp:818] Received registration request 
> from scheduler(1)@127.0.1.1:37122
> I1106 19:32:24.739328 10426 master.cpp:836] Registering framework 
> 20141106-193147-16842879-5050-10406- at scheduler(1)@127.0.1.1:37122
> I1106 19:32:24.739753 10426 hierarchical_allocator_process.hpp:332] Added 
> framework 20141106-193147-16842879-5050-10406-
> I1106 19:32:29.647886 10423 http.cpp:391] HTTP request for 
> '/master/state.json'
> 
> 
>> On Thu, Nov 6, 2014 at 6:53 PM, Benjamin Mahler  
>> wrote:
>> Which version of the master are you using and do you have the logs? The fact 
>> that no offers were coming back sounds like a bug!
>> 
>> As for using O1 after a disconnection, all offers are invalid once a 
>> disconnection occurs. The scheduler driver does not automatically rescind 
>> offers upon disconnection, so I'd recommend clearing all cached offers when 
>> your scheduler gets disconnected, to avoid the unnecessary TASK_LOST updates.
>> 
>>> On Thu, Nov 6, 2014 at 6:25 PM, Sharma Podila  wrote:
>>> We had an interesting problem with resource offers today and I would like 
>>> to confirm this problem and request an enhancement. Here's
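
A minimal sketch of the workaround Ben describes (dropping all cached offers on 
disconnection), assuming a Python scheduler built on the mesos.interface 
bindings that caches offers between callbacks; the class and field names are 
hypothetical:

    from mesos.interface import Scheduler

    class MyScheduler(Scheduler):
        def __init__(self):
            self.cached_offers = {}   # offer id -> offer, held between callbacks

        def resourceOffers(self, driver, offers):
            for offer in offers:
                self.cached_offers[offer.id.value] = offer

        def disconnected(self, driver):
            # Every outstanding offer is invalid after a master disconnection;
            # drop them all so tasks are never launched on stale offers.
            self.cached_offers.clear()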

Re: Why rely on url scheme for fetching?

2014-11-03 Thread Timothy Chen
I think it's too late to be included, since it's going to take some
rounds of review, and Ian is cutting the release today.

We'll have to tag this for the next release.

Tim

On Mon, Nov 3, 2014 at 10:22 AM, Ankur Chauhan  wrote:
> Hi Tim/others,
>
> Is this to be included in the 0.21.0 release? If so, I don't know how to tag 
> it, etc. I would really (shamelessly) love it to be included, as it would 
> really simplify my intended use case of using snackfs (a Cassandra-backed 
> filesystem).
>
> -- Ankur
>
>> On 3 Nov 2014, at 09:28, Ankur Chauhan  wrote:
>>
>> Yeah, I saw those this morning. I'll hold off a little; MESOS-336 changes a 
>> lot of stuff.
>>
>> Sent from my iPhone
>>
>> On Nov 3, 2014, at 9:18 AM, Adam Bordelon wrote:
>>
>>> + Bernd, who has done some fetcher work, including additional testing, for 
>>> MESOS-1316, MESOS-1945, and MESOS-336
>>>
>>> On Mon, Nov 3, 2014 at 9:04 AM, Dominic Hamon wrote:
>>> Hi Ankur
>>>
>>> I think this is a great approach. It makes the code much simpler, more 
>>> extensible, and more testable. Anyone that's heard me rant knows I am a big 
>>> fan of unit tests over integration tests, so this shouldn't surprise anyone 
>>> :)
>>>
>>> If you haven't already, please read the documentation on contributing to 
>>> Mesos and the style guide to ensure all the naming is as expected, then you 
>>> can push the patch to reviewboard to get it reviewed and committed.
>>>
>>> On Mon, Nov 3, 2014 at 12:49 AM, Ankur Chauhan wrote:
>>> Hi,
>>>
>>> I did some learning today! This is pretty much a very rough draft of the 
>>> tests/refactor of mesos-fetcher that I have come up with. Again, if there 
>>> are some obvious mistakes, please let me know. (this is my first pass after 
>>> all).
>>> https://github.com/ankurcha/mesos/compare/prefer_2
>>>
>>> My main intention is to break the logic of the fetcher into some very 
>>> discrete components that I can write tests against. I am still re-learning 
>>> cpp/mesos code styles etc so I may be a little slow to catch up but I would 
>>> really appreciate any comments and/or suggestions.
>>>
>>> -- Ankur
>>> @ankurcha
>>>
>>>> On 2 Nov 2014, at 18:17, Ankur Chauhan wrote:
>>>>
>>>> Hi,
>>>>
>>>> I noticed that the current set of tests in `src/tests/fetcher_tests.cpp` 
>>>> is pretty coarse-grained and more along the lines of a functional test. I 
>>>> was going to add some tests but it seems like if I am to do that I would 
>>>> need to add a test dependency on hadoop.
>>>>
>>>> As an alternative, I propose adding a good set of unit tests around the 
>>>> methods used by `src/launcher/fetcher.cpp` and `src/hdfs/hdfs.cpp`. This 
>>>> should be able to catch a good portion of cases at the same time keeping 
>>>> the dependencies and runtime of tests low. What do you guys think about 
>>>> this?
>>>>
>>>> PS: I am pretty green in terms of gtest and the overall c++ testing 
>>>> methodology. Can someone give me pointers to good examples of tests in the 
>>>> codebase.
>>>>
>>>> -- Ankur
>>>>
>>>>> On 1 Nov 2014, at 22:54, Adam Bordelon wrote:
>>>>>
>>>>> Thank you Ankur. At first glance, it looks great. We'll do a more 
>>>>> thorough review of it very soon.
>>>>> I know Tim St. Clair had some ideas for fixing MESOS-1711; he may want to 
>>>>> review too.
>>>>>
>>>>> On Sat, Nov 1, 2014 at 8:49 PM, Ankur Chauhan wrote:
>>>>> Hi Tim,
>>>>>
>>>>> I just created a review: https://reviews.apache.org/r/27483/ It's my first 
>>>>> stab at it and I will 
>>>>> try to add more tests as I figure out how to do the hadoop mocking and 
>>>>> stuff. Have a look and let me know what you think about it 

Re: Why rely on url scheme for fetching?

2014-11-01 Thread Timothy Chen
Hi Ankur,

Can you post on reviewboard? We can discuss more about the code there.

Tim

Sent from my iPhone

> On Nov 1, 2014, at 6:29 PM, Ankur Chauhan  wrote:
> 
> Hi Tim,
> 
> I don't think there is an issue which is directly in line with what I wanted, 
> but the closest one that I could find in JIRA is 
> https://issues.apache.org/jira/browse/MESOS-1711
> 
> I have a branch ( 
> https://github.com/ankurcha/mesos/compare/prefer_hadoop_fetcher ) that has a 
> change that would enable users to pass arbitrary HDFS-compatible URIs to 
> the mesos-fetcher, but maybe you can weigh in on it. Do you think this is the 
> right track? If so, I would like to pick up this issue and submit a patch for 
> review.
> 
> -- Ankur
> 
> 
>> On 1 Nov 2014, at 04:32, Tom Arnfeld  wrote:
>> 
>> Completely +1 to this. There are now quite a lot of hadoop compatible 
>> filesystem wrappers out in the wild and this would certainly be very useful.
>> 
>> I'm happy to contribute a patch. Here's a few related issues that might be 
>> of interest;
>> 
>> - https://issues.apache.org/jira/browse/MESOS-1887
>> - https://issues.apache.org/jira/browse/MESOS-1316
>> - https://issues.apache.org/jira/browse/MESOS-336
>> - https://issues.apache.org/jira/browse/MESOS-1248
>> 
>>> On 31 October 2014 22:39, Tim Chen  wrote:
>>> I believe there is already a JIRA ticket for this, if you search for 
>>> fetcher in Mesos JIRA I think you can find it.
>>> 
>>> Tim
>>> 
 On Fri, Oct 31, 2014 at 3:27 PM, Ankur Chauhan  wrote:
 Hi,
 
 I have been looking at some of the stuff around the fetcher and saw 
 something interesting. The code for the fetcher::fetch method depends on 
 a hard-coded list of URL schemes. No doubt this works, but it is very 
 restrictive.
 Hadoop/HDFS in general is pretty flexible when it comes to fetching 
 from URLs: it can handle a large number of URL types, and it can be 
 extended by adding configuration to the 
 conf/hdfs-site.xml and core-site.xml.
 
 What I am proposing is that we refactor the fetcher.cpp to prefer to use 
 the hdfs (using hdfs/hdfs.hpp) to do all the fetching if HADOOP_HOME is 
 set and $HADOOP_HOME/bin/hadoop is available. This logic already exists 
 and we can just use it. The fallback logic for using net::download or 
 local file copy may be left in place for installations that do not have 
 hadoop configured. This means that if hadoop is present we can directly 
 fetch urls such as tachyon://... snackfs:// ... cfs://  ftp://... 
 s3://... http:// ... file:// with no extra effort. This makes for a 
 much better experience when it comes to debugging and extensibility.
 
 What do others think about this?
 
 - Ankur
> 
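
Roughly what the proposed delegation looks like from the shell, assuming 
HADOOP_HOME is set and the relevant filesystems are configured in 
core-site.xml / hdfs-site.xml (the paths are hypothetical):

    # The Hadoop client resolves the scheme through its own filesystem
    # plugins, so one code path covers hdfs://, s3://, ftp://, and friends.
    $HADOOP_HOME/bin/hadoop fs -copyToLocal hdfs://namenode:9000/apps/app.tar.gz .
    $HADOOP_HOME/bin/hadoop fs -copyToLocal s3://bucket/apps/app.tar.gz .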


Re: build failure on mac os x

2014-11-01 Thread Timothy Chen
Hi Ankur,

Where do your libsubversion headers live? Configure must have found them, but 
since we include the header without the path, the path name may not match.
Tim

Sent from my iPhone

> On Nov 1, 2014, at 8:54 AM, Dominic Hamon  wrote:
> 
> libsubversion was added as a dependency recently. However, its absence should 
> have been caught by the configure step.
> 
> I don't know enough about osx and libsubversion to know if there's something 
> special you need to do.
> 
>> On Nov 1, 2014 2:05 AM, "Ankur Chauhan"  wrote:
>> I am trying to build the latest (master) mesos source and keep getting this 
>> error:
>> 
>> In file included from ../../src/state/log.cpp:25:0:
>> ../../3rdparty/libprocess/3rdparty/stout/include/stout/svn.hpp:21:23: fatal 
>> error: svn_delta.h: No such file or directory
>>  #include 
>>^
>> compilation terminated.
>> make[2]: *** [state/libstate_la-log.lo] Error 1
>> make[2]: *** Waiting for unfinished jobs
>> mv -f log/.deps/liblog_la-recover.Tpo log/.deps/liblog_la-recover.Plo
>> mv -f state/.deps/libstate_la-in_memory.Tpo 
>> state/.deps/libstate_la-in_memory.Plo
>> libtool: compile:  g++-4.9 -DPACKAGE_NAME=\"mesos\" 
>> -DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"0.21.0\" 
>> "-DPACKAGE_STRING=\"mesos 0.21.0\"" -DPACKAGE_BUGREPORT=\"\" 
>> -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"0.21.0\" -DSTDC_HEADERS=1 
>> -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
>> -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
>> -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 
>> -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBSVN_SUBR_1=1 
>> -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -DMESOS_HAS_JAVA=1 -I. -I../../src 
>> -Wall -Werror -DLIBDIR=\"/usr/local/lib\" 
>> -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" 
>> -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include 
>> -I../../3rdparty/libprocess/include 
>> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
>> -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
>> -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
>> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
>> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
>> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
>> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
>> -I../3rdparty/zookeeper-3.4.5/src/c/generated 
>> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
>> -I/usr/include/subversion-1 -I/usr/include/apr-1 -I/usr/include/apr-1.0 
>> -D_THREAD_SAFE -g1 -O0 -Wno-unused-local-typedefs -std=c++11 
>> -DGTEST_USE_OWN_TR1_TUPLE=1 -MT state/libstate_la-leveldb.lo -MD -MP -MF 
>> state/.deps/libstate_la-leveldb.Tpo -c ../../src/state/leveldb.cpp -o 
>> state/libstate_la-leveldb.o >/dev/null 2>&1
>> mv -f state/.deps/libstate_la-leveldb.Tpo state/.deps/libstate_la-leveldb.Plo
>> make[1]: *** [all] Error 2
>> make: *** [all-recursive] Error 1
>> 
>> 
>> Any idea why this is happening?
>> 
>> 
>> -- Ankur
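
One plausible way out of this class of failure on OS X (an assumption about 
this particular setup, not a verified fix) is to point the build at wherever 
the Subversion headers actually live, e.g. a Homebrew install:

    # Locate svn_delta.h first (the path varies by install method):
    find /usr/local -name svn_delta.h 2>/dev/null

    # Then re-run configure with matching include/lib paths:
    ../configure \
      CPPFLAGS="-I/usr/local/opt/subversion/include/subversion-1" \
      LDFLAGS="-L/usr/local/opt/subversion/lib"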


Re: args for Docker run surrounded by quotes

2014-10-29 Thread Timothy Chen
Hi Andrew,

By default shell is enabled, which wraps your command in /bin/sh -c and single 
quotes.

Try setting shell to false in Marathon.

Tim

Sent from my iPhone

> On Oct 29, 2014, at 4:44 AM, Andrew Jones  
> wrote:
> 
> Hi,
> 
> I'm trying to run a Docker image which has a defined entrypoint and pass
> args to it. It looks like when the args are passed to docker run, they
> are surrounded by single quotes.
> 
> The image I am trying to run is tomaskral/chronos, and this is the
> configuration I am giving to Marathon:
> 
> {
>  "id": "chronos-test-2", 
>  "container": {
>"docker": {
>  "image": "tomaskral/chronos",
>  "network": "BRIDGE",
>  "portMappings": [
>{
>  "containerPort": 8080,
>  "hostPort": 0,
>  "servicePort": 31000,
>  "protocol": "tcp"
>}
>  ]
>},
>"type": "DOCKER",
>"volumes": []
>  },
>  "ports":[31000],
>  "args": ["--master zk://...:2181/mesos --zk_hosts zk:/...:2181"],
>  "cpus": 0.2,
>  "mem": 256.0,
>  "instances": 1
> }
> 
> And this is an extract from the log from Mesos when the image is ran:
> 
> + logged chronos run_jar '--master zk://...:2181/mesos --zk_hosts
> zk://...:2181'
> 
> The argument has single quotes around it. run_jar is calling java, which
> cannot handle the quotes, and the process isn't starting.
> 
> If I run the image locally with docker run like this, it works:
> 
> docker run -p 8080:8080 tomaskral/chronos --master zk://...:2181/mesos
> --zk_hosts zk://...:2181
> 
> But adding quotes, like this, and I get the same output as I did from
> Mesos:
> 
> docker run -p 8080:8080 tomaskral/chronos '--master zk://...:2181/mesos
> --zk_hosts zk://...:2181'
> 
> So I think these quotes are being added by either Marathon or Mesos when
> calling docker run, which the java command inside the container can't
> handle.
> 
> Is it Mesos or Marathon adding the quotes? Is this something that should
> be fixed, or should the docker images expect this and cope?
> 
> This is Mesos 0.21.1 and Marathon 0.7.3. I have also asked the author of
> the image for help (https://github.com/kadel/Dockerfiles/issues/3).
> 
> Thanks,
> Andrew
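
One more detail that sidesteps the quoting entirely: Marathon's "args" field is 
an argv list, so each argument should be its own element instead of one 
space-separated string. A sketch of the relevant fragment of the config above, 
with the zk addresses left elided as in the original:

      "args": [
        "--master", "zk://...:2181/mesos",
        "--zk_hosts", "zk://...:2181"
      ],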


Re: 0.21.0-pre & Spark latest

2014-10-28 Thread Timothy Chen
Hi RJ,

I see. Are you or the team already working on this problem? If not,
I'd like to take a look as well.
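
For anyone else trying to repro, my reading of RJ's description is that the
whole test is just opening the shell in coarse-grained mode, something like
this (assuming defaults and a standalone master on port 5050):

  ./bin/spark-shell --master mesos://<master>:5050 --conf spark.mesos.coarse=true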

Tim

On Tue, Oct 28, 2014 at 8:47 AM, RJ Nowling  wrote:
> Hi Tim,
>
> The integration test is simply to open the Spark shell (Spark 1.1.0) using
> Mesos 0.21 in coarse-grained mode.  We didn't even have to run any commands.
>
> RJ
>
> - Original Message -
>> From: "Timothy Chen" 
>> To: d...@mesos.apache.org
>> Cc: user@mesos.apache.org, "RJ Nowling" , "Erik 
>> Erlandson" 
>> Sent: Tuesday, October 28, 2014 11:40:19 AM
>> Subject: Re: 0.21.0-pre & Spark latest
>>
>> Hi Tim,
>>
>> Thanks for doing the integration tests; that's something I've wanted to do
>> but never got to.
>>
>> I have a great interest in ensuring Spark and Mesos work well together, and
>> I know Brenden does as well.
>>
>> I have been tracking these Spark/Mesos problems in the Spark JIRA, labeling
>> them "mesos". Can you create these bugs in JIRA so we can dig more into
>> each one?
>>
>> Also, is this integration test automated?
>>
>> Thanks!
>>
>> Tim
>>
>> > On Oct 28, 2014, at 8:03 AM, Tim St Clair  wrote:
>> >
>> > inline
>> >
>> > - Original Message -
>> >
>> >> From: "Brenden Matthews" 
>> >> To: user@mesos.apache.org
>> >> Cc: "mesos-devel" , "RJ Nowling"
>> >> ,
>> >> "Erik Erlandson" 
>> >> Sent: Tuesday, October 28, 2014 9:51:58 AM
>> >> Subject: Re: 0.21.0-pre & Spark latest
>> >
>> >> Since we've recently adopted Spark, I'll second Tim's comment. We had an
>> >> issue with 0.20.1 that was possibly related to Spark[1], so it's important
>> >> for us to get this stuff fixed in 0.21.0.
>> >
>> >> Tim, can you elaborate on the issues you saw? Have you tested with my
>> >> recent
>> >> Spark patches[2][3]?
>> >
>> > We are building against Spark 1.1.0 unpatched:
>> > - Fine-grained mode appears broken.
>> >
>> > - Coarse-grained mode appears to work via normal runs, but crashes in the
>> > REPL.
>> > http://fpaste.org/145782/14506564/
>> >
>> >> [1]: https://issues.apache.org/jira/browse/MESOS-1973
>> >> [2]: https://github.com/apache/spark/pull/2401
>> >> [3]: https://github.com/apache/spark/pull/2453
>> >
>> >> On Tue, Oct 28, 2014 at 7:46 AM, Tim St Clair < tstcl...@redhat.com >
>> >> wrote:
>> >
>> >>> Folks -
>> >
>> >>> We have some automated tests that run the latest Mesos against the latest
>> >>> Spark, and we've run across a series of issues in both fine- and
>> >>> coarse-grained mode that I believe stem from a series of changes in the 0.21
>> >>> cycle.
>> >
>> >>> I'm not certain if anyone "owns" this integration, but we should probably
>> >>> ensure it's fixed before we push out 0.21.
>> >
>> >>> --
>> >>
>> >>> Cheers,
>> >>
>> >>> Timothy St. Clair
>> >>
>> >>> Red Hat Inc.
>> >
>> > --
>> > Cheers,
>> > Timothy St. Clair
>> > Red Hat Inc.
>>


Re: 0.21.0-pre & Spark latest

2014-10-28 Thread Timothy Chen
Hi Tim,

Thanks for doing the integration tests; that's something I've wanted to do
but never got to.

I have a great interest in ensuring Spark and Mesos work well together, and I
know Brenden does as well.

I have been tracking these Spark/Mesos problems in the Spark JIRA, labeling
them "mesos". Can you create these bugs in JIRA so we can dig more into each
one?

Also, is this integration test automated?

Thanks!

Tim

> On Oct 28, 2014, at 8:03 AM, Tim St Clair  wrote:
> 
> inline 
> 
> - Original Message -
> 
>> From: "Brenden Matthews" 
>> To: user@mesos.apache.org
>> Cc: "mesos-devel" , "RJ Nowling" 
>> ,
>> "Erik Erlandson" 
>> Sent: Tuesday, October 28, 2014 9:51:58 AM
>> Subject: Re: 0.21.0-pre & Spark latest
> 
>> Since we've recently adopted Spark, I'll second Tim's comment. We had an
>> issue with 0.20.1 that was possibly related to Spark[1], so it's important
>> for us to get this stuff fixed in 0.21.0.
> 
>> Tim, can you elaborate on the issues you saw? Have you tested with my recent
>> Spark patches[2][3]?
> 
> We are building against Spark 1.1.0 unpatched: 
> - Fine-grained mode appears broken.
> 
> - Coarse-grained mode appears to work via normal runs, but crashes in the 
> REPL.
> http://fpaste.org/145782/14506564/
> 
>> [1]: https://issues.apache.org/jira/browse/MESOS-1973
>> [2]: https://github.com/apache/spark/pull/2401
>> [3]: https://github.com/apache/spark/pull/2453
> 
>> On Tue, Oct 28, 2014 at 7:46 AM, Tim St Clair < tstcl...@redhat.com > wrote:
> 
>>> Folks -
> 
>>> We have some automated tests that run the latest Mesos against the latest
>>> Spark, and we've run across a series of issues in both fine- and
>>> coarse-grained mode that I believe stem from a series of changes in the 0.21
>>> cycle.
> 
>>> I'm not certain if anyone "owns" this integration, but we should probably
>>> ensure it's fixed before we push out 0.21.
> 
>>> --
>> 
>>> Cheers,
>> 
>>> Timothy St. Clair
>> 
>>> Red Hat Inc.
> 
> -- 
> Cheers, 
> Timothy St. Clair 
> Red Hat Inc. 


Re: Mesos 0.20.1 still using -net=host when launching Docker containers

2014-10-01 Thread Timothy Chen
Sorry about the documentation; I didn't update the docs as part of 0.20.1,
which is where we added the network modes.
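
For reference, the relevant part of the task's ContainerInfo looks roughly
like this (a sketch from memory in the JSON rendering of the protos; please
double-check the field names against include/mesos/mesos.proto):

  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "fedora/apache",
      "network": "BRIDGE",
      "port_mappings": [
        { "host_port": 31001, "container_port": 80, "protocol": "tcp" }
      ]
    }
  }

With BRIDGE plus a distinct host_port per instance, multiple containers can
each bind container port 80 without colliding.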

If you like, please submit a patch and I can help get it in.

Tim

> On Oct 1, 2014, at 2:25 PM, Tim St Clair  wrote:
> 
> inline below - 
> From: "Andy Grove" 
> To: user@mesos.apache.org
> Sent: Wednesday, October 1, 2014 2:47:54 PM
> Subject: Mesos 0.20.1 still using -net=host when launching Docker containers
> 
> Hi,
> 
> I'm making better progress but have run into another issue that I need help 
> tracking down.
> 
> I've actually packaged up my code in a github repo now and will be writing up 
> a tutorial on this once I have everything working. 
> 
> https://github.com/codefutures/mesos-docker-tutorial
> 
> The README.md contains instructions on running the framework.
> 
> I can use this to start a single instance of the fedora/apache container,
> but when I try to run multiple instances, the first one works while the
> other containers start and then fail pretty quickly.
> 
> I tracked down the error information in the sandbox: the other containers
> are failing with "cannot bind to port 80", so it looks like the containers
> are being launched with host networking (-net=host). I thought this was one
> of the issues fixed in 0.20.1. Do I have to do something to enable
> containerized networking?
> It's still HOST by default; you'll need to specify network=BRIDGE in the
> DockerInfo, IIRC.
> 
> Cheers,
> Tim
> 
> Thanks,
> 
> Andy.
> 
> --
> Andy Grove
> VP Engineering
> CodeFutures Corporation
> 
> 
> 
> 
> 
> -- 
> Cheers,
> Timothy St. Clair
> Red Hat Inc.


Re: Limit on number of simultaneous Spark frameworks on Mesos?

2014-08-20 Thread Timothy Chen
Can you share your Spark/Mesos configurations and the Spark job? I'd like to
repro it.
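
In the meantime, one workaround that might be worth trying (a hunch based on
your description, not a confirmed diagnosis): cap how many cores each job may
hold so a few jobs can't starve the rest, e.g.

  spark-submit --master mesos://<master>:5050 --conf spark.cores.max=4 ...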

Tim

> On Aug 20, 2014, at 12:39 PM, Cody Koeninger  wrote:
> 
> I'm seeing situations where starting e.g. a 4th Spark job on Mesos results in
> none of the jobs making progress.  This happens even with --executor-memory
> set to values that should not come close to exceeding the availability per
> node, and even if the 4th job is doing something completely trivial (e.g.
> parallelize 1 to 1 and sum).  Killing one of the jobs typically allows
> the others to start proceeding.
> 
> While jobs are hung, I see the following in mesos master logs:
> 
> I0820 19:28:02.651296 24666 master.cpp:2282] Sending 7 offers to framework 
> 20140820-170154-1315739402-5050-24660-0020
> I0820 19:28:02.654502 24668 master.cpp:1578] Processing reply for offers: [ 
> 20140820-170154-1315739402-5050-24660-96624 ] on slave 
> 20140724-150750-1315739402-5050-25405-6 (dn-04) for framework 
> 20140820-170154-1315739402-5050-24660-0020
> I0820 19:28:02.654722 24668 hierarchical_allocator_process.hpp:590] Framework 
> 20140820-170154-1315739402-5050-24660-0020 filtered slave 
> 20140724-150750-1315739402-5050-25405-6 for 1secs
> 
> Am I correctly interpreting that to mean that Spark is being offered
> resources, but is rejecting them?  Is there a way (short of patching Spark to
> add more logging) to figure out why resources are being rejected?
> 
> This is on the default fine-grained mode.
> 


Re: [VOTE] Release Apache Mesos 0.20.0 (rc2)

2014-08-19 Thread Timothy Chen
Make check passes, with the Docker tests passing as well, on Ubuntu 14.04.

+1

Tim

On Tue, Aug 19, 2014 at 11:28 AM, Ian Downes
 wrote:
> +1
> make check passes on CentOS 5.5 with 3.4 kernel. Docker tests were ignored.
>
>
> On Aug 19, 2014, at 10:54 AM, Vinod Kone  wrote:
>
>> +1
>>
>> make check passes on OS X Mavericks and CentOS 5.5
>>
>>
>> On Mon, Aug 18, 2014 at 11:26 PM, Jie Yu  wrote:
>>
>>> Hi all,
>>>
>>> Please vote on releasing the following candidate as Apache Mesos 0.20.0.
>>>
>>> NOTE: 0.20.0-rc1 has a bug on Mac (MESOS-1713) which is fixed in
>>> 0.20.0-rc2.
>>>
>>>
>>> 0.20.0 includes the following:
>>>
>>> 
>>> This release includes a lot of new cool features. The major new features
>>> are
>>> listed below:
>>>
>>> * Docker support in Mesos:
>>>  * Users now can launch executors/tasks within Docker containers.
>>>  * Mesos now supports running multiple containerizers simultaneously. The
>>> slave
>>>can dynamically choose a containerizer to launch containers based on
>>> the
>>>configuration of executors/tasks.
>>>
>>> * Container level network monitoring for mesos containerizer:
>>>  * Network statistics for each active container can be retrieved through
>>> the
>>>/monitor/statistics.json endpoint on the slave.
>>>  * Completely transparent to the tasks running on the slave. No need to
>>> change
>>>the service discovery mechanism for tasks.
>>>
>>> * Framework authorization:
>>>  * Allows frameworks to (re-)register with authorized roles.
>>>  * Allows frameworks to launch tasks/executors as authorized users.
>>>  * Allows authorized principals to shutdown framework(s) through HTTP
>>> endpoint.
>>>
>>> * Framework rate limiting:
>>>  * In a multi-framework environment, this feature aims to protect the
>>>throughput of high-SLA (e.g., production, service) frameworks by having
>>> the
>>>master throttle messages from other (e.g., development, batch)
>>> frameworks.
>>>
>>> * Enable building against installed third-party dependencies.
>>>
>>> * API Changes:
>>>  * [MESOS-857] - The Python API now uses different namespacing. This will
>>> break
>>>existing schedulers, please refer to the upgrades document.
>>>  * [MESOS-1409] - Status update acknowledgements are sent through the
>>> Master
>>>now. This only affects you if you're using a non-Mesos binding (e.g.
>>> pure
>>>language binding), in which case refer to the upgrades document.
>>>
>>> * HTTP endpoint changes:
>>>  * [MESOS-1188] - "deactivated_slaves" represents inactive slaves in
>>> "/stats.json" and "/state.json".
>>>  * [MESOS-1390] - "/shutdown" authenticated endpoint has been added to
>>> master to shutdown a framework.
>>>
>>> * Deprecations:
>>>  * [MESOS-1219] - Master should disallow completed frameworks from
>>> re-registering with same framework id.
>>>  * [MESOS-1695] - "/stats.json" on the slave exposes "registered" value as
>>> string instead of integer.
>>>
>>>
>>> This release also includes several bug fixes and stability improvements.
>>>
>>> 
>>>
>>> The candidate for Mesos 0.20.0 release is available at:
>>> https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc2/mesos-0.20.0.tar.gz
>>>
>>> The tag to be voted on is 0.20.0-rc2:
>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.20.0-rc2
>>>
>>> The MD5 checksum of the tarball can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc2/mesos-0.20.0.tar.gz.md5
>>>
>>> The signature of the tarball can be found at:
>>>
>>> https://dist.apache.org/repos/dist/dev/mesos/0.20.0-rc2/mesos-0.20.0.tar.gz.asc
>>>
>>> The PGP key used to sign the release is here:
>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>>
>>> The JAR is up in Maven in a staging repository here:
>>> https://repository.apache.org/content/repositories/orgapachemesos-1030
>>>
>>> Please vote on releasing this package as Apache Mesos 0.20.0!
>>>
>>> The vote is open until Thu Aug 21 23:23:19 PDT 2014 and passes if a
>>> majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Mesos 0.20.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> Thanks,
>>> - Jie
>>>
>
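
PS: for anyone who wants to poke at the container network statistics
mentioned in the notes above, the endpoint on each slave looks like this
(a sketch, assuming the default slave port 5051):

  curl http://<slave>:5051/monitor/statistics.json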


Re: [VOTE] Release Apache Mesos 0.19.1 (rc1)

2014-07-14 Thread Timothy Chen
+1 (non-binding).

Tim

On Mon, Jul 14, 2014 at 2:32 PM, Benjamin Mahler
 wrote:
> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 0.19.1.
>
>
> 0.19.1 includes the following:
> 
> Fixes a long-standing critical bug in the JNI bindings that can lead to
> framework unregistration.
> Allows the mesos fetcher to handle 30X redirects.
> Fixes a CHECK failure during container destruction.
> Fixes a regression that prevented local runs from working correctly.
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.19.1-rc1
> 
>
> The candidate for Mesos 0.19.1 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/0.19.1-rc1/mesos-0.19.1.tar.gz
>
> The tag to be voted on is 0.19.1-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=0.19.1-rc1
>
> The MD5 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/0.19.1-rc1/mesos-0.19.1.tar.gz.md5
>
> The signature of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/0.19.1-rc1/mesos-0.19.1.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is up in Maven in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1025
>
> Please vote on releasing this package as Apache Mesos 0.19.1!
>
> The vote is open until Thu Jul 17 14:28:59 PDT 2014 and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 0.19.1
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Ben