[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2293:
---
Sprint: Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere 
Sprint 11, Mesosphere Sprint 13  (was: Mesosphere Q1 Sprint 9 - 5/15, 
Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12)

 Implement the Call endpoint on master
 -

 Key: MESOS-2293
 URL: https://issues.apache.org/jira/browse/MESOS-2293
 Project: Mesos
  Issue Type: Story
Reporter: Vinod Kone
Assignee: Isabel Jimenez
  Labels: mesosphere





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2545) Developer guide for libprocess

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2545:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 Developer guide for libprocess
 --

 Key: MESOS-2545
 URL: https://issues.apache.org/jira/browse/MESOS-2545
 Project: Mesos
  Issue Type: Documentation
  Components: libprocess
Reporter: Bernd Mathiske
Assignee: Joerg Schad
  Labels: documentation, mesosphere

 Create a developer guide for libprocess that explains the philosophy behind 
 it, the most important features, and the prevalent usage patterns in Mesos, 
 with examples. 
 This could be similar to stout/README.md.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2497) Create synchronous validations for Calls

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2497:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 Create synchronous validations for Calls
 

 Key: MESOS-2497
 URL: https://issues.apache.org/jira/browse/MESOS-2497
 Project: Mesos
  Issue Type: Bug
Reporter: Isabel Jimenez
Assignee: Isabel Jimenez
  Labels: HTTP, mesosphere

 The /call endpoint will return a 202 Accepted code, but it has to perform some 
 basic validations first. If the request is invalid, it will return a 4xx code. 
 We have to create a mechanism that validates the 'request' and sends back the 
 appropriate code.
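 As a rough illustration of the kind of synchronous validation this describes (the names and checks below are hypothetical, not the actual Mesos implementation), the handler could run a cheap check first and map its result to the response code:
 {code}
#include <optional>
#include <string>

// Hypothetical sketch of a synchronous validation step for POST /call.
// Neither the types nor the checks are taken from the Mesos code base.
struct CallRequest {
  std::string contentType;  // e.g. "application/json"
  std::string body;         // serialized Call message
};

// Returns an error string if the request is invalid, nothing otherwise.
std::optional<std::string> validateCall(const CallRequest& request) {
  if (request.contentType != "application/json" &&
      request.contentType != "application/x-protobuf") {
    return "Unsupported Content-Type";
  }
  if (request.body.empty()) {
    return "Empty request body";
  }
  return std::nullopt;
}

// 4xx on validation failure, 202 Accepted otherwise.
int httpStatus(const CallRequest& request) {
  return validateCall(request) ? 400 : 202;
}

int main() {
  return httpStatus({"application/json", "{}"}) == 202 ? 0 : 1;
}
 {code}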



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2888) Add SSL socket tests

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2888:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 Add SSL socket tests
 

 Key: MESOS-2888
 URL: https://issues.apache.org/jira/browse/MESOS-2888
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
  Labels: libprocess, ssl, tests





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2719) Removing '.json' extension in master endpoints url

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2719:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 Removing '.json' extension in master endpoints url
 --

 Key: MESOS-2719
 URL: https://issues.apache.org/jira/browse/MESOS-2719
 Project: Mesos
  Issue Type: Improvement
Reporter: Isabel Jimenez
Assignee: Isabel Jimenez
  Labels: HTTP, mesosphere

 Remove the '.json' extension on endpoints such as `/master/stats.json` so it 
 becomes `/master/stats`.
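 A minimal sketch of the alias idea, assuming the old path is kept around for a deprecation period (the routing table below is a stand-in for illustration, not the libprocess API the master actually uses):
 {code}
#include <functional>
#include <iostream>
#include <map>
#include <string>

int main() {
  // Stand-in routing table; the real master registers its endpoints through
  // libprocess rather than a std::map.
  std::map<std::string, std::function<std::string()>> routes;

  auto stats = []() { return std::string(R"({"uptime": 42})"); };

  // Serve the new extension-less path, and keep '/master/stats.json' as a
  // deprecated alias so existing clients don't break.
  routes["/master/stats"] = stats;
  routes["/master/stats.json"] = stats;

  std::cout << routes["/master/stats"]() << std::endl;
  return 0;
}
 {code}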



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2119) Add Socket tests

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2119:
---
Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, 
Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 
Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Sprint 
10, Mesosphere Sprint 11, Mesosphere Sprint 13  (was: Mesosphere Q4 Sprint 3 - 
12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere 
Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 
3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere 
Q2 Sprint 8 - 5/1, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere 
Sprint 12)

 Add Socket tests
 

 Key: MESOS-2119
 URL: https://issues.apache.org/jira/browse/MESOS-2119
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Niklas Quarfot Nielsen
Assignee: Joris Van Remoortere
  Labels: mesosphere

 Add more Socket-specific tests to get coverage while doing the libev to 
 libevent move (with and without SSL).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2873) style hook prevents valid markdown files from getting committed

2015-06-22 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596041#comment-14596041
 ] 

Alexander Rojas edited comment on MESOS-2873 at 6/22/15 3:36 PM:
-

Hey [~marco-mesos], I didn't set it as reviewable, since it wasn't accepted, 
and AFAIK we have to wait for it to be Accepted before setting it to reviewable. 
However, I already had a fix. 

I wonder what the standard procedure is then? Hold the patch until it is 
accepted? Publish it and set it to reviewable even if it was never accepted?


was (Author: arojas):
With all due respect [~marco-mesos], I didn't set it as reviewable, since it 
wasn't accepted. But I already had a fix. Should I then sit down and cry until 
someone decides to accept it before I either open the ticket or publish the 
patch?

 style hook prevents valid markdown files from getting committed
 

 Key: MESOS-2873
 URL: https://issues.apache.org/jira/browse/MESOS-2873
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Alexander Rojas
Priority: Trivial
  Labels: mesosphere
 Fix For: 0.23.0


 According to the original [markdown 
 specification|http://daringfireball.net/projects/markdown/syntax#p] and to 
 the most [recent 
 standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two 
 spaces at the end of a line create a hard line break (it breaks the line 
 without starting a new paragraph), similar to the HTML tag {{<br/>}}. 
 However, there's a hook in Mesos which prevents files with trailing whitespace 
 from being committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2600) Add /reserve and /unreserve endpoints on the master for dynamic reservation

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2600:
---
Sprint: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13  
(was: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12)

 Add /reserve and /unreserve endpoints on the master for dynamic reservation
 ---

 Key: MESOS-2600
 URL: https://issues.apache.org/jira/browse/MESOS-2600
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Michael Park
Assignee: Michael Park
Priority: Critical
  Labels: mesosphere

 Enable operators to manage dynamic reservations by introducing the 
 {{/reserve}} and {{/unreserve}} HTTP endpoints on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2394) Create styleguide for documentation

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2394:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 Create styleguide for documentation
 ---

 Key: MESOS-2394
 URL: https://issues.apache.org/jira/browse/MESOS-2394
 Project: Mesos
  Issue Type: Documentation
Reporter: Joerg Schad
Assignee: Joerg Schad
Priority: Minor
  Labels: mesosphere

 As of right now, different pages in our documentation use quite different 
 styles. Consider, for example, the different emphasis used for NOTE:
 * {noformat} NOTE: 
 http://mesos.apache.org/documentation/latest/slave-recovery/{noformat}
 *  {noformat}*NOTE*: http://mesos.apache.org/documentation/latest/upgrades/ 
 {noformat} 
 It would be great to establish a common style for the documentation!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2073) Fetcher cache file verification, updating and invalidation

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2073:
---
Sprint: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13  
(was: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12)

 Fetcher cache file verification, updating and invalidation
 --

 Key: MESOS-2073
 URL: https://issues.apache.org/jira/browse/MESOS-2073
 Project: Mesos
  Issue Type: Improvement
  Components: fetcher, slave
Reporter: Bernd Mathiske
Assignee: Bernd Mathiske
Priority: Minor
  Labels: mesosphere
   Original Estimate: 96h
  Remaining Estimate: 96h

 The other tickets in the fetcher cache epic do not necessitate a checksum 
 (e.g. MD5, SHA*) for files cached by the fetcher. Whereas such a checksum 
 could be used to verify whether the file arrived without unintended 
 alterations, it can first and foremost be employed to detect and trigger 
 updates. 
 Scenario: if a URI is requested for fetching and the indicated download has 
 the same checksum as the cached file, then the cached file will be used and 
 the download forgone. If the checksum is different, then fetching proceeds 
 and the cached file gets replaced. 
 This capability will be indicated by an additional field in the URI protobuf. 
 Details TBD, i.e. to be discussed in comments below.
 In addition to the above, even if the checksum is the same, we can support 
 voluntary cache file invalidation: a fresh download can be requested, or the 
 caching behavior can be revoked entirely.
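 A minimal sketch of the reuse-or-refetch decision, assuming the URI carries an optional checksum field (the helpers here are placeholders for illustration, not the actual fetcher code):
 {code}
#include <iostream>
#include <optional>
#include <string>

// Placeholder helpers -- stand-ins for whatever the fetcher would really use.
std::string checksumOf(const std::string& cachedFile) { return "abc123"; }
std::string download(const std::string& uri) { return "/cache/new-file"; }

// Decide whether a cached file can be reused for a URI that carries an
// optional expected checksum, per the scenario described above.
std::string fetch(const std::string& uri,
                  const std::optional<std::string>& cachedFile,
                  const std::optional<std::string>& expectedChecksum) {
  if (cachedFile && expectedChecksum &&
      checksumOf(*cachedFile) == *expectedChecksum) {
    return *cachedFile;   // checksums match: reuse the cache, skip the download
  }
  return download(uri);   // no cache entry or checksum mismatch: (re)fetch
}

int main() {
  std::cout << fetch("http://example.com/pkg.tgz", "/cache/pkg.tgz", "abc123")
            << std::endl;  // prints the cached path
  return 0;
}
 {code}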



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2200) bogus docker images result in bad error message to scheduler

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2200:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 bogus docker images result in bad error message to scheduler
 

 Key: MESOS-2200
 URL: https://issues.apache.org/jira/browse/MESOS-2200
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker
Reporter: Jay Buffington
Assignee: Joerg Schad
  Labels: mesosphere

 When a scheduler specifies a bogus image in ContainerInfo, Mesos doesn't tell 
 the scheduler that the docker pull failed or why.
 This error is logged in the mesos-slave log, but it isn't given to the 
 scheduler (as far as I can tell):
 {noformat}
 E1218 23:50:55.406230  8123 slave.cpp:2730] Container 
 '8f70784c-3e40-4072-9ca2-9daed23f15ff' for executor 
 'thermos-1418946354013-xxx-xxx-curl-0-f500cc41-dd0a-4338-8cbc-d631cb588bb1' 
 of framework '20140522-213145-1749004561-5050-29512-' failed to start: 
 Failed to 'docker pull 
 docker-registry.example.com/doesntexist/hello1.1:latest': exit status = 
 exited with status 1 stderr = 2014/12/18 23:50:55 Error: image 
 doesntexist/hello1.1 not found
 {noformat}
 If the docker image is not in the registry, the scheduler should give the 
 user an error message.  If docker pull failed because of networking issues, 
 it should be retried.  Mesos should give the scheduler enough information to 
 be able to make that decision.
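 A rough sketch of what forwarding the error could look like; the StatusUpdate struct below is a simplification for illustration, not the actual Mesos TaskStatus protobuf:
 {code}
#include <iostream>
#include <string>

// Simplified stand-in for a TASK_FAILED status update; the real slave would
// populate the Mesos TaskStatus protobuf instead.
struct StatusUpdate {
  std::string state;
  std::string reason;
  std::string message;
};

// Forward the docker pull error verbatim so the scheduler can decide whether
// the failure is permanent (bad image) or worth retrying (networking issue).
StatusUpdate containerLaunchFailed(const std::string& dockerError) {
  return StatusUpdate{
      "TASK_FAILED",
      "REASON_CONTAINER_LAUNCH_FAILED",
      "Failed to launch container: " + dockerError};
}

int main() {
  StatusUpdate update = containerLaunchFailed(
      "Error: image doesntexist/hello1.1 not found");
  std::cout << update.state << ": " << update.message << std::endl;
  return 0;
}
 {code}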



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2166) PerfEventIsolatorTest.ROOT_CGROUPS_Sample requires 'perf' to be installed

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2166:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

  PerfEventIsolatorTest.ROOT_CGROUPS_Sample requires 'perf' to be installed
 --

 Key: MESOS-2166
 URL: https://issues.apache.org/jira/browse/MESOS-2166
 Project: Mesos
  Issue Type: Bug
Reporter: Cody Maloney
Assignee: Isabel Jimenez
  Labels: mesosphere

 perf::valid() relies on the 'perf' command being installed, which isn't 
 always the case. Configure should probably check whether the 'perf' command 
 exists.
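 A minimal sketch of such a presence check, written as a plain C++ probe (the real fix would live in configure or in perf::valid(); std::system here is only a stand-in):
 {code}
#include <cstdlib>

// Rough probe for an installed 'perf' binary: run a harmless subcommand and
// inspect the exit status. Output is redirected so the probe stays quiet;
// a non-zero status means 'perf' is missing or not runnable.
bool perfAvailable() {
  return std::system("perf --version > /dev/null 2>&1") == 0;
}

int main() {
  return perfAvailable() ? 0 : 1;
}
 {code}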



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596301#comment-14596301
 ] 

Marco Massenzio commented on MESOS-2157:


Is this still being worked on? In other words, should I move it to Sprint 13?

Or should we 'Stop progress'?

Thanks!

 Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
 

 Key: MESOS-2157
 URL: https://issues.apache.org/jira/browse/MESOS-2157
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Niklas Quarfot Nielsen
Assignee: Alexander Rojas
Priority: Trivial
  Labels: mesosphere, newbie

 master/state.json exports the entire state of the cluster and can, for large 
 clusters, become massive (tens of megabytes of JSON).
 Often, a client only needs information about subsets of the entire state, for 
 example all connected slaves, or information (registration info, tasks, etc.) 
 belonging to a particular framework.
 We can partition state.json into many smaller endpoints, but for starters, 
 being able to get slave information and task information per framework would 
 be useful.
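 As a toy illustration of the idea (the State struct below is a stand-in for the master's in-memory state, not real Mesos code), a dedicated /master/slaves handler would serialize only the slave list instead of the whole cluster state:
 {code}
#include <iostream>
#include <string>
#include <vector>

// Stand-in for a small slice of the master's state.
struct Slave {
  std::string id;
  std::string hostname;
};

struct State {
  std::vector<Slave> slaves;
  // ... frameworks, tasks, etc. omitted ...
};

// Hypothetical /master/slaves handler: only the slave list is serialized,
// which keeps the response small even on large clusters.
std::string slavesEndpoint(const State& state) {
  std::string json = "[";
  for (size_t i = 0; i < state.slaves.size(); ++i) {
    if (i > 0) json += ",";
    json += "{\"id\":\"" + state.slaves[i].id +
            "\",\"hostname\":\"" + state.slaves[i].hostname + "\"}";
  }
  return json + "]";
}

int main() {
  State state{{{"S0", "agent-1"}, {"S1", "agent-2"}}};
  std::cout << slavesEndpoint(state) << std::endl;
  return 0;
}
 {code}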



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1815) Create a guide to becoming a committer

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-1815:
---
Sprint: Mesosphere Sprint 13

 Create a guide to becoming a committer
 --

 Key: MESOS-1815
 URL: https://issues.apache.org/jira/browse/MESOS-1815
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Dominic Hamon
Assignee: Bernd Mathiske
  Labels: mesosphere

 We have a committer's guide, but the process by which one becomes a committer 
 is unclear. We should set some guidelines and a process by which we can grow 
 contributors into committers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596098#comment-14596098
 ] 

Marco Massenzio commented on MESOS-2873:


Hey Alex - no worries, if you do decide to work on something, then it's 
Accepted by definition :)
There is no special committee or super-power needed - we trust your judgement: 
if you think it's worth doing, then, by all means, mark it as Accepted (and 
then, when working on it, as In Progress and then Reviewable as appropriate).

Thanks!

 style hook prevents valid markdown files from getting committed
 

 Key: MESOS-2873
 URL: https://issues.apache.org/jira/browse/MESOS-2873
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Alexander Rojas
Priority: Trivial
  Labels: mesosphere
 Fix For: 0.23.0


 According to the original [markdown 
 specification|http://daringfireball.net/projects/markdown/syntax#p] and to 
 the most [recent 
 standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two 
 spaces at the end of a line create a hard line break (it breaks the line 
 without starting a new paragraph), similar to the HTML tag {{<br/>}}. 
 However, there's a hook in Mesos which prevents files with trailing whitespace 
 from being committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2295) Implement the Call endpoint on Slave

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2295:
---
Story Points: 8

 Implement the Call endpoint on Slave
 

 Key: MESOS-2295
 URL: https://issues.apache.org/jira/browse/MESOS-2295
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: mesosphere





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596093#comment-14596093
 ] 

haosdent commented on MESOS-2873:
-

I think open->reviewable should be acceptable.

 style hook prevents valid markdown files from getting committed
 

 Key: MESOS-2873
 URL: https://issues.apache.org/jira/browse/MESOS-2873
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Alexander Rojas
Priority: Trivial
  Labels: mesosphere
 Fix For: 0.23.0


 According to the original [markdown 
 specification|http://daringfireball.net/projects/markdown/syntax#p] and to 
 the most [recent 
 standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two 
 spaces at the end of a line create a hard line break (it breaks the line 
 without starting a new paragraph), similar to the HTML tag {{<br/>}}. 
 However, there's a hook in Mesos which prevents files with trailing whitespace 
 from being committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2855) Update operational guide to include growing from standalone to high availability

2015-06-22 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596270#comment-14596270
 ] 

Vinod Kone commented on MESOS-2855:
---

See next steps here: 
http://mesos.apache.org/documentation/latest/submitting-a-patch/

Once submitted, please paste the review URL here and change the status of this 
ticket to Reviewable.

Thanks

 Update operational guide to include growing from standalone to high 
 availability
 

 Key: MESOS-2855
 URL: https://issues.apache.org/jira/browse/MESOS-2855
 Project: Mesos
  Issue Type: Documentation
Reporter: Michael Schenck
Assignee: Michael Schenck
  Labels: documentation

 The [Operational 
 Guide|http://mesos.apache.org/documentation/latest/operational-guide/] covers 
 increasing quorum size from {{--quorum=2}}, but does not cover how to move 
 from a _standalone_ master to a high availability configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2015-06-22 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske resolved MESOS-2858.
---
Resolution: Fixed

https://reviews.apache.org/r/35438/

 FetcherCacheHttpTest.HttpMixed is flaky.
 

 Key: MESOS-2858
 URL: https://issues.apache.org/jira/browse/MESOS-2858
 Project: Mesos
  Issue Type: Bug
  Components: fetcher, test
Reporter: Benjamin Mahler
Assignee: Bernd Mathiske
  Labels: flaky-test, mesosphere

 From jenkins:
 {noformat}
 [ RUN  ] FetcherCacheHttpTest.HttpMixed
 Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
 I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
 I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
 I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
 I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
 2112ns
 I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 392ns
 I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
 I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
 I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
 STARTING
 I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 590673ns
 I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
 STARTING
 I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
 I0611 00:40:28.214774 26061 master.cpp:363] Master 
 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
 172.17.0.116:33349
 I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls= 
 --allocation_interval=1secs --allocator=HierarchicalDRF 
 --authenticate=true --authenticate_slaves=true --authenticators=crammd5 
 --credentials=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials 
 --framework_sorter=drf --help=false --initialize_driver_logging=true 
 --log_auto_initialize=true --logbufsecs=0 --logging_level=INFO 
 --quiet=false --recovery_slave_removal_limit=100% 
 --registry=replicated_log --registry_fetch_timeout=1mins 
 --registry_store_timeout=25secs --registry_strict=true 
 --root_submissions=true --slave_reregister_timeout=10mins 
 --user_sorter=drf --version=false 
 --webui_dir=/mesos/mesos-0.23.0/_inst/share/mesos/webui 
 --work_dir=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master 
 --zk_session_timeout=10secs
 I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
 authenticated frameworks to register
 I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
 authenticated slaves to register
 I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
 authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
 I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
 authenticator
 I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
 I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
 I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
 allocator process
 I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
 I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 374189ns
 I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
 VOTING
 I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
 group
 I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
 master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
 I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
 I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
 I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar
 I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated
 I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer
 I0611 00:40:28.219391 26067 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0611 00:40:28.219696 26067 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 276905ns
 I0611 00:40:28.219720 26067 

[jira] [Commented] (MESOS-2295) Implement the Call endpoint on Slave

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596305#comment-14596305
 ] 

Marco Massenzio commented on MESOS-2295:


I would really like to see this story broken down into smaller chunks of linked 
tasks.

 Implement the Call endpoint on Slave
 

 Key: MESOS-2295
 URL: https://issues.apache.org/jira/browse/MESOS-2295
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: mesosphere





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2555) Document problem/solution of MESOS-2419 in documentation.

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2555:
---
Sprint: Mesosphere Sprint 13  (was: Mesosphere Sprint 12)

 Document problem/solution of MESOS-2419 in documentation.
 -

 Key: MESOS-2555
 URL: https://issues.apache.org/jira/browse/MESOS-2555
 Project: Mesos
  Issue Type: Documentation
Reporter: Joerg Schad
Assignee: Joerg Schad
Priority: Critical
  Labels: mesosphere

 As the problem encountered in MESOS-2419 is a common problem with the default 
 systemd configuration, it would make sense to document this in the upgrade 
 guide or somewhere else in the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-22 Thread Paul Brett (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Brett updated MESOS-2903:
--
Story Points: 3  (was: 2)

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an informational message, since the final state of the system is 
 the state that we requested.
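 A small sketch of the proposed change (the Result type and names are illustrative, not the real Try/Failure plumbing the isolator uses): only a genuine creation error stays fatal, while "already exists" is logged and treated as success:
 {code}
#include <iostream>
#include <string>

// Illustrative stand-in for the outcome of a ::create() call; not the real
// stout/libprocess types used by the isolator.
struct Result {
  bool error = false;      // the creation call itself failed
  bool created = false;    // false: the filter/route was already there
  std::string message;
};

// Proposed behavior: only a real creation error is a failure; "already
// exists" becomes an informational log line, because the system is already
// in the requested state (e.g. after an orphan was not cleaned up).
bool ensureCreated(const Result& result, const std::string& what) {
  if (result.error) {
    std::cerr << "Failed to create " << what << ": " << result.message << "\n";
    return false;
  }
  if (!result.created) {
    std::cout << what << " already exists; continuing" << "\n";
  }
  return true;
}

int main() {
  return ensureCreated({false, false, ""}, "ICMP veth-to-eth0 filter") ? 0 : 1;
}
 {code}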



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2294:
--
  Sprint: Twitter Mesos Q2 Sprint 6
Story Points: 8

 Implement the Events stream on master for Call endpoint
 ---

 Key: MESOS-2294
 URL: https://issues.apache.org/jira/browse/MESOS-2294
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
  Labels: twitter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2903:
--
Sprint: Twitter Mesos Q2 Sprint 6

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an informational message, since the final state of the system is 
 the state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2619) Document master-scheduler communication

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596492#comment-14596492
 ] 

Marco Massenzio commented on MESOS-2619:


Not sure why this is under the {{HTTP API}} Epic - the reason it has the 
{{mesosphere}} label is (I'm guessing here) because [~cdoyle] reported it. No 
one is currently working on this, as far as I know.

 Document master-scheduler communication
 ---

 Key: MESOS-2619
 URL: https://issues.apache.org/jira/browse/MESOS-2619
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.22.0
Reporter: Connor Doyle
  Labels: mesosphere

 New users often stumble on the networking requirements for communication 
 between schedulers and the Mesos master.
 It's not explicitly stated anywhere that the master has to talk back to the 
 scheduler.  Also, some configuration options (like the LIBPROCESS_PORT 
 environment variable) are under-documented.
 This problem is exacerbated as many new users start playing with Mesos and 
 schedulers in unpredictable networking contexts (NAT, containers with bridged 
 networking, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master

2015-06-22 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596491#comment-14596491
 ] 

Vinod Kone commented on MESOS-1988:
---

[~anandmazumdar] Can you send that email today? Feel free to run it by me, if 
you need another pair of eyes.

 Scheduler driver should not generate TASK_LOST when disconnected from master
 

 Key: MESOS-1988
 URL: https://issues.apache.org/jira/browse/MESOS-1988
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: mesosphere, twitter

 Currently, the driver replies to launchTasks() with TASK_LOST if it detects 
 that it is disconnected from the master. After MESOS-1972 lands, this will be 
 the only place where driver generates TASK_LOST. See MESOS-1972 for more 
 context.
 This fix is targeted for 0.22.0 to give frameworks time to implement 
 reconciliation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2907:
---
Sprint: Mesosphere Sprint 13

 Slave : Create Basic Functionality to handle /call endpoint
 ---

 Key: MESOS-2907
 URL: https://issues.apache.org/jira/browse/MESOS-2907
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
  Labels: HTTP, mesosphere

 This is the first step in providing the basic /call functionality: 
 processing a
 POST /call
 and returning:
 202 if all goes well;
 401 if not authorized; and
 403 if the request is malformed.
 Also, we might need to store some identifier which enables us to reject 
 calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request.
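 A rough sketch of that flow (the types and checks are stand-ins; the status codes simply follow the description above):
 {code}
#include <set>
#include <string>

// Stand-in request; the real handler would work with the libprocess HTTP
// request and the Call protobuf.
struct CallRequest {
  std::string clientId;
  std::string type;        // "SUBSCRIBE", "RESUBSCRIBE", "UPDATE", ...
  bool authorized = false;
  bool wellFormed = true;
};

// Map a /call request to an HTTP status code: 403 for malformed requests,
// 401 for unauthorized ones, 202 otherwise. Calls from clients that never
// issued a SUBSCRIBE/RESUBSCRIBE are rejected as well.
int handleCall(const CallRequest& request, std::set<std::string>& subscribed) {
  if (!request.wellFormed) {
    return 403;
  }
  if (!request.authorized) {
    return 401;
  }
  if (request.type == "SUBSCRIBE" || request.type == "RESUBSCRIBE") {
    subscribed.insert(request.clientId);
  } else if (subscribed.count(request.clientId) == 0) {
    return 403;
  }
  return 202;  // accepted for asynchronous processing
}

int main() {
  std::set<std::string> subscribed;
  return handleCall({"c1", "SUBSCRIBE", true, true}, subscribed) == 202 ? 0 : 1;
}
 {code}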



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2907:
---
Story Points: 5

 Slave : Create Basic Functionality to handle /call endpoint
 ---

 Key: MESOS-2907
 URL: https://issues.apache.org/jira/browse/MESOS-2907
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
  Labels: HTTP, mesosphere

 This is the first step in providing the basic /call functionality: 
 processing a
 POST /call
 and returning:
 202 if all goes well;
 401 if not authorized; and
 403 if the request is malformed.
 Also, we might need to store some identifier which enables us to reject 
 calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2907:
---
Sprint:   (was: Mesosphere Sprint 13)

 Slave : Create Basic Functionality to handle /call endpoint
 ---

 Key: MESOS-2907
 URL: https://issues.apache.org/jira/browse/MESOS-2907
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
  Labels: HTTP, mesosphere

 This is the first step in providing the basic /call functionality: 
 processing a
 POST /call
 and returning:
 202 if all goes well;
 401 if not authorized; and
 403 if the request is malformed.
 Also, we might need to store some identifier which enables us to reject 
 calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2831) FetcherCacheTest.SimpleEviction is flaky

2015-06-22 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596030#comment-14596030
 ] 

Bernd Mathiske commented on MESOS-2831:
---

Yep. Should be fixed now. Thx.

 FetcherCacheTest.SimpleEviction is flaky
 

 Key: MESOS-2831
 URL: https://issues.apache.org/jira/browse/MESOS-2831
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Affects Versions: 0.23.0
Reporter: Vinod Kone
Assignee: Bernd Mathiske
  Labels: flaky-test, mesosphere

 Saw this when reviewbot was testing an unrelated review 
 https://reviews.apache.org/r/35119/
 {code}
 [ RUN  ] FetcherCacheTest.SimpleEviction
 GMOCK WARNING:
 Uninteresting mock function call - returning directly.
 Function call: resourceOffers(0x5365320, @0x2b7bef9f1b20 { 128-byte 
 object B0-C0 36-E6 7B-2B 00-00 00-00 00-00 00-00 00-00 20-75 00-18 7C-2B 
 00-00 C0-75 00-18 7C-2B 00-00 60-76 00-18 7C-2B 00-00 00-77 00-18 7C-2B 00-00 
 40-3A 00-18 7C-2B 00-00 04-00 00-00 04-00 00-00 04-00 00-00 7C-2B 00-00 00-00 
 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 0F-00 
 00-00 })
 Stack trace:
 F0607 21:19:23.181392  4246 fetcher_cache_tests.cpp:354] CHECK_READY(offers): 
 is PENDING Failed to wait for resource offers
 *** Check failure stack trace: ***
 @ 0x2b7be56c5972  google::LogMessage::Fail()
 @ 0x2b7be56c58be  google::LogMessage::SendToLog()
 @ 0x2b7be56c52c0  google::LogMessage::Flush()
 @ 0x2b7be56c81d4  google::LogMessageFatal::~LogMessageFatal()
 @   0x97d182  _CheckFatal::~_CheckFatal()
 @   0xb58a28  
 mesos::internal::tests::FetcherCacheTest::launchTask()
 @   0xb65b50  
 mesos::internal::tests::FetcherCacheTest_SimpleEviction_Test::TestBody()
 @  0x11923b7  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x118d5b4  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x1175975  testing::Test::Run()
 @  0x1176098  testing::TestInfo::Run()
 @  0x1176620  testing::TestCase::Run()
 @  0x117b2ea  testing::internal::UnitTestImpl::RunAllTests()
 @  0x1193229  
 testing::internal::HandleSehExceptionsInMethodIfSupported()
 @  0x118e2a5  
 testing::internal::HandleExceptionsInMethodIfSupported()
 @  0x117a1f6  testing::UnitTest::Run()
 @   0xcc832b  main
 @ 0x2b7be7d46ec5  (unknown)
 @   0x872379  (unknown)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2905) JVM crashed when wrong master format specified

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596033#comment-14596033
 ] 

haosdent commented on MESOS-2905:
-

Hi [~zborisha], I think your problem is fixed in 
https://issues.apache.org/jira/browse/MESOS-2636 

 JVM crashed when wrong master format specified
 --

 Key: MESOS-2905
 URL: https://issues.apache.org/jira/browse/MESOS-2905
 Project: Mesos
  Issue Type: Bug
  Components: general
Affects Versions: 0.22.1
 Environment: java version 1.8.0_45 - Oracle
 mesos version 0.22.1
 OS ubuntu 15.04
Reporter: Borisa Zivkovic

 I am using Spark with Mesos...
 I reported the issue here https://issues.apache.org/jira/browse/SPARK-8524, but 
 after inspecting the core dump it looks like it is actually a Mesos problem.
 Basically, if I specify an invalid Mesos master URL it crashes the JVM... For 
 example, mesos://http://127.0.0.1:5050 and mesos://abc://127.0.0.1:5050
 will crash the JVM.
 The problem looks like it is in line 245 here: 
 https://github.com/apache/mesos/blob/master/src/master/detector.cpp
 Additional checks should probably be done and a proper error reported instead 
 of crashing the JVM.
 Here is the relevant part of the core dump:
 [Thread debugging using libthread_db enabled]
 Using host libthread_db library /lib/x86_64-linux-gnu/libthread_db.so.1.
 gdb where
 Core was generated by `/usr/lib/jvm/java-8-oracle/bin/java -cp 
 /home/borisa/Programs/spark-1.4.0-bin-h'.
 Program terminated with signal SIGABRT, Aborted.
 #0  0x7f8a245ad267 in __GI_raise (sig=sig@entry=6) at 
 ../sysdeps/unix/sysv/linux/raise.c:55
 55../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
 (gdb) gdb where
 Undefined command: gdb.  Try help.
 (gdb) where
 #0  0x7f8a245ad267 in __GI_raise (sig=sig@entry=6) at 
 ../sysdeps/unix/sysv/linux/raise.c:55
 #1  0x7f8a245aeeca in __GI_abort () at abort.c:89
 #2  0x7f8a23ebf6b5 in os::abort(bool) () from 
 /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
 #3  0x7f8a2405cda3 in VMError::report_and_die() () from 
 /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
 #4  0x7f8a23ec4bdf in JVM_handle_linux_signal () from 
 /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
 #5  0x7f8a23ebb493 in signalHandler(int, siginfo*, void*) () from 
 /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so
 #6  signal handler called
 #7  __GI_freeaddrinfo (ai=0x998ef7a53c7f3000) at 
 ../sysdeps/posix/getaddrinfo.c:2683
 #8  0x7f89869b4b10 in getIP () at 
 ../../../3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp:203
 #9  0x7f89869f2d9e in operator () at 
 ../../../3rdparty/libprocess/src/pid.cpp:114
 #10 0x7f89869f2768 in UPID () at 
 ../../../3rdparty/libprocess/src/pid.cpp:43
 #11 0x7f8986075108 in create () at ../../src/master/detector.cpp:245
 #12 0x7f89862768c4 in start () at ../../src/sched/sched.cpp:1515
 #13 0x7f8986a97418 in Java_org_apache_mesos_MesosSchedulerDriver_start () 
 at ../../src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp:603
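 A sketch of the kind of early validation suggested above, rejecting values like mesos://http://127.0.0.1:5050 with an error instead of crashing later (purely illustrative; this is not the actual detector code):
 {code}
#include <iostream>
#include <optional>
#include <string>

// Illustrative pre-check for a plain "host:port" master value (i.e. after any
// zk:// or file:// cases have been handled elsewhere). Returns an error for
// inputs such as "http://127.0.0.1:5050" instead of letting them crash later.
std::optional<std::string> validateHostPort(const std::string& master) {
  if (master.find("://") != std::string::npos) {
    return "Expected 'host:port', got a URL: " + master;
  }
  const size_t colon = master.rfind(':');
  if (colon == std::string::npos || colon == 0 || colon == master.size() - 1) {
    return "Expected 'host:port', got: " + master;
  }
  return std::nullopt;
}

int main() {
  if (auto error = validateHostPort("http://127.0.0.1:5050")) {
    std::cerr << *error << std::endl;  // report the error, don't abort the JVM
    return 1;
  }
  return 0;
}
 {code}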



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed

2015-06-22 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596041#comment-14596041
 ] 

Alexander Rojas commented on MESOS-2873:


With all due respect [~marco-mesos], I didn't set it as reviewable, since it 
wasn't accepted. But I already had a fix. Should I then sit down and cry until 
someone decides to accept it before I either open the ticket or publish the 
patch?

 style hook prevents valid markdown files from getting committed
 

 Key: MESOS-2873
 URL: https://issues.apache.org/jira/browse/MESOS-2873
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Alexander Rojas
Priority: Trivial
  Labels: mesosphere
 Fix For: 0.23.0


 According to the original [markdown 
 specification|http://daringfireball.net/projects/markdown/syntax#p] and to 
 the most [recent 
 standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two 
 spaces at the end of a line create a hard line break (it breaks the line 
 without starting a new paragraph), similar to the HTML tag {{<br/>}}. 
 However, there's a hook in Mesos which prevents files with trailing whitespace 
 from being committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.

2015-06-22 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596048#comment-14596048
 ] 

Bernd Mathiske commented on MESOS-2858:
---

Yes, this looks like the same problem that just got fixed by this:

https://reviews.apache.org/r/35438/

 FetcherCacheHttpTest.HttpMixed is flaky.
 

 Key: MESOS-2858
 URL: https://issues.apache.org/jira/browse/MESOS-2858
 Project: Mesos
  Issue Type: Bug
  Components: fetcher, test
Reporter: Benjamin Mahler
Assignee: Bernd Mathiske
  Labels: flaky-test, mesosphere

 From jenkins:
 {noformat}
 [ RUN  ] FetcherCacheHttpTest.HttpMixed
 Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC'
 I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms
 I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns
 I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns
 I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 
 2112ns
 I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 392ns
 I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery
 I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status
 I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to 
 STARTING
 I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 590673ns
 I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to 
 STARTING
 I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status
 I0611 00:40:28.214774 26061 master.cpp:363] Master 
 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 
 172.17.0.116:33349
 I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls= 
 --allocation_interval=1secs --allocator=HierarchicalDRF 
 --authenticate=true --authenticate_slaves=true --authenticators=crammd5 
 --credentials=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials 
 --framework_sorter=drf --help=false --initialize_driver_logging=true 
 --log_auto_initialize=true --logbufsecs=0 --logging_level=INFO 
 --quiet=false --recovery_slave_removal_limit=100% 
 --registry=replicated_log --registry_fetch_timeout=1mins 
 --registry_store_timeout=25secs --registry_strict=true 
 --root_submissions=true --slave_reregister_timeout=10mins 
 --user_sorter=drf --version=false 
 --webui_dir=/mesos/mesos-0.23.0/_inst/share/mesos/webui 
 --work_dir=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master 
 --zk_session_timeout=10secs
 I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing 
 authenticated frameworks to register
 I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing 
 authenticated slaves to register
 I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for 
 authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials'
 I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' 
 authenticator
 I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled
 I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given
 I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical 
 allocator process
 I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING
 I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 374189ns
 I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to 
 VOTING
 I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos 
 group
 I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is 
 master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042
 I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master!
 I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar
 I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar
 I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated
 I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer
 I0611 00:40:28.219391 26067 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0611 00:40:28.219696 26067 

[jira] [Commented] (MESOS-1815) Create a guide to becoming a committer

2015-06-22 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596051#comment-14596051
 ] 

Bernd Mathiske commented on MESOS-1815:
---

@marco Still in progress. Next step: post the checklist on the web site.

 Create a guide to becoming a committer
 --

 Key: MESOS-1815
 URL: https://issues.apache.org/jira/browse/MESOS-1815
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Dominic Hamon
Assignee: Bernd Mathiske
  Labels: mesosphere

 We have a committer's guide, but the process by which one becomes a committer 
 is unclear. We should set some guidelines and a process by which we can grow 
 contributors into committers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2501) Doxygen style for libprocess

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2501:
---
Sprint: Mesosphere Sprint 13

 Doxygen style for libprocess
 

 Key: MESOS-2501
 URL: https://issues.apache.org/jira/browse/MESOS-2501
 Project: Mesos
  Issue Type: Documentation
  Components: libprocess
Reporter: Bernd Mathiske
Assignee: Joerg Schad
  Labels: mesosphere
   Original Estimate: 7m
  Remaining Estimate: 7m

 Create a description of the Doxygen style to use for libprocess 
 documentation. 
 It is expected that this will later also become the Doxygen style for stout 
 and Mesos, but we are working on libprocess only for now.
 Possible outcome: a file named docs/doxygen-style.md
 We hope for much input and expect a lot of discussion!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2501) Doxygen style for libprocess

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596312#comment-14596312
 ] 

Marco Massenzio commented on MESOS-2501:


The review for this was out 6 days ago and it has two Ship It's - can we please 
commit and resolve this story?

Thanks!

 Doxygen style for libprocess
 

 Key: MESOS-2501
 URL: https://issues.apache.org/jira/browse/MESOS-2501
 Project: Mesos
  Issue Type: Documentation
  Components: libprocess
Reporter: Bernd Mathiske
Assignee: Joerg Schad
  Labels: mesosphere
   Original Estimate: 7m
  Remaining Estimate: 7m

 Create a description of the Doxygen style to use for libprocess 
 documentation. 
 It is expected that this will later also become the Doxygen style for stout 
 and Mesos, but we are working on libprocess only for now.
 Possible outcome: a file named docs/doxygen-style.md
 We hope for much input and expect a lot of discussion!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2226:
---
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 
Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11  
(was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere 
Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 
12)

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
  Labels: flaky, flaky-test, mesosphere

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with 

[jira] [Updated] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2199:
---
Sprint:   (was: Mesosphere Sprint 12)

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2512) FetcherTest.ExtractNotExecutable is flaky

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2512:
---
Sprint:   (was: Mesosphere Sprint 12)

 FetcherTest.ExtractNotExecutable is flaky
 -

 Key: MESOS-2512
 URL: https://issues.apache.org/jira/browse/MESOS-2512
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Vinod Kone
Assignee: Bernd Mathiske
  Labels: mesosphere

 Observed in our internal CI.
 {code}
 [ RUN  ] FetcherTest.ExtractNotExecutable
 Using temporary directory '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn'
 tar: Removing leading `/' from member names
 I0316 18:55:48.509306 14678 fetcher.cpp:155] Starting to fetch URIs for 
 container: de1e5165-82b4-434b-9149-8667cf652c64, directory: 
 /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn
 I0316 18:55:48.509845 14678 fetcher.cpp:238] Fetching URIs using command 
 '/var/jenkins/workspace/mesos-fedora-20-gcc/src/mesos-fetcher'
 I0316 18:55:48.568611 15028 logging.cpp:177] Logging to STDERR
 I0316 18:55:48.574928 15028 fetcher.cpp:214] Fetching URI '/tmp/DIjmjV.tar.gz'
 I0316 18:55:48.575166 15028 fetcher.cpp:194] Copying resource from 
 '/tmp/DIjmjV.tar.gz' to '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn'
 tar: This does not look like a tar archive
 tar: Exiting with failure status due to previous errors
 Failed to extract 
 /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz:Failed to extract: 
 command tar -C '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' -xf 
 '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz' exited with 
 status: 512
 tests/fetcher_tests.cpp:686: Failure
 (fetch).failure(): Failed to fetch URIs for container 
 'de1e5165-82b4-434b-9149-8667cf652c64'with exit status: 256
 [  FAILED  ] FetcherTest.ExtractNotExecutable (208 ms)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2205) Add user documentation for reservations

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2205:
---
Sprint: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, 
Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 
Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11  (was: Mesosphere 
Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 
4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere 
Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12)

 Add user documentation for reservations
 ---

 Key: MESOS-2205
 URL: https://issues.apache.org/jira/browse/MESOS-2205
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, framework
Reporter: Michael Park
Assignee: Michael Park
Priority: Critical
  Labels: mesosphere

 Add a user guide for reservations which describes basic usage of them, how 
 ACLs are used to specify who can unreserve whose resources, and few advanced 
 usage cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2832) Enable configuring Mesos with environment variables without having them leak to tasks launched

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2832:
---
Sprint:   (was: Mesosphere Sprint 12)

 Enable configuring Mesos with environment variables without having them leak 
 to tasks launched
 --

 Key: MESOS-2832
 URL: https://issues.apache.org/jira/browse/MESOS-2832
 Project: Mesos
  Issue Type: Wish
Reporter: Cody Maloney
Assignee: Benjamin Hindman
Priority: Critical
  Labels: mesosphere

 Currently, if Mesos is configured with environment variables (MESOS_MODULES), 
 those show up in every task that is launched unless the executor explicitly 
 cleans them up. 
 If the task being launched happens to be something libprocess/Mesos based, 
 this can often prevent the task from starting up (e.g. a scheduler fails to 
 load a module intended for the slave).
 There are also cases where it would be nice to be able to change the PATH 
 that tasks launch with (the host may have more in its path than tasks 
 are supposed to, or allowed to, depend upon).
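 As an illustration only (a hypothetical helper, not the actual Mesos code), the 
 launch path could strip slave configuration from a task's environment before 
 exec, e.g. by dropping MESOS_-prefixed variables:
 {code}
 // Hypothetical sketch: build the task environment without MESOS_* entries
 // so slave configuration does not leak into launched tasks.
 #include <cstring>
 #include <string>
 #include <vector>

 extern char** environ;

 std::vector<std::string> taskEnvironment()
 {
   std::vector<std::string> env;
   for (char** entry = environ; *entry != nullptr; ++entry) {
     if (strncmp(*entry, "MESOS_", 6) != 0) {
       env.push_back(*entry);  // Keep everything else (PATH, HOME, ...).
     }
   }
   return env;
 }
 {code}
 Whether this filtering belongs in the launcher or the executor, and whether PATH 
 should also be replaced, is exactly what this ticket leaves open.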



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2784) Add constexpr to C++11 whitelist

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2784:
--
Sprint: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 
Sprint 6  (was: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5)

 Add constexpr to C++11 whitelist
 

 Key: MESOS-2784
 URL: https://issues.apache.org/jira/browse/MESOS-2784
 Project: Mesos
  Issue Type: Improvement
  Components: documentation
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 constexpr is currently used to eliminate initialization dependency issues for 
 non-POD objects. We should add it to the whitelist of acceptable C++11 
 features in the style guide.
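 As an illustrative example (not taken from the style guide itself), a constexpr 
 constructor lets a non-POD global be constant-initialized, so it cannot be 
 observed uninitialized by static initializers in other translation units:
 {code}
 #include <cstdio>

 class ByteAmount
 {
 public:
   constexpr explicit ByteAmount(unsigned long long _value) : value(_value) {}
   constexpr unsigned long long bytes() const { return value; }

 private:
   unsigned long long value;
 };

 // Constant-initialized at compile time; no runtime constructor ordering issues.
 constexpr ByteAmount DEFAULT_PAGE_SIZE(4096);

 int main()
 {
   printf("%llu\n", DEFAULT_PAGE_SIZE.bytes());
   return 0;
 }
 {code}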



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2794) Implement filesystem isolators

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2794:
--
Sprint: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 
Sprint 6  (was: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5)

 Implement filesystem isolators
 --

 Key: MESOS-2794
 URL: https://issues.apache.org/jira/browse/MESOS-2794
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.22.1
Reporter: Ian Downes
Assignee: Ian Downes
  Labels: twitter

 Move persistent volume support from Mesos containerizer to separate 
 filesystem isolators, including support for container rootfs, where possible.
 Use symlinks for posix systems without container rootfs. Use bind mounts for 
 Linux with/without container rootfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2798) Export statistics on unevictable memory

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2798:
--
Sprint: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 
Sprint 6  (was: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5)

 Export statistics on unevictable memory
 -

 Key: MESOS-2798
 URL: https://issues.apache.org/jira/browse/MESOS-2798
 Project: Mesos
  Issue Type: Improvement
Reporter: Chi Zhang
Assignee: Chi Zhang
  Labels: twitter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2853) Report per-container metrics from host egress filter

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2853:
--
Sprint: Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 Sprint 6  (was: Twitter 
Mesos Q2 Sprint 5)

 Report per-container metrics from host egress filter
 

 Key: MESOS-2853
 URL: https://issues.apache.org/jira/browse/MESOS-2853
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Reporter: Paul Brett
Assignee: Paul Brett
  Labels: twitter

 Export in statistics.json the fq_codel flow statistics for each container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2793) Add support for container rootfs to Mesos isolators

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2793:
--
Sprint: Twitter Mesos Q2 Sprint 6

 Add support for container rootfs to Mesos isolators
 ---

 Key: MESOS-2793
 URL: https://issues.apache.org/jira/browse/MESOS-2793
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.22.1
Reporter: Ian Downes
Assignee: Ian Downes
  Labels: twitter
 Fix For: 0.23.0


 Mesos containers can have a different rootfs from the host. Update the Isolator 
 interface to pass the rootfs during Isolator::prepare(). Update Isolators where 
 necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MESOS-2793) Add support for container rootfs to Mesos isolators

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone resolved MESOS-2793.
---
   Resolution: Fixed
Fix Version/s: 0.23.0

commit 610d4fffd0511d7ddce286ae987264cc5892f76c
Author: Ian Downes idow...@twitter.com
Date:   Thu May 7 14:28:46 2015 -0700

Add container rootfs to Isolator::prepare().

Review: https://reviews.apache.org/r/34134


 Add support for container rootfs to Mesos isolators
 ---

 Key: MESOS-2793
 URL: https://issues.apache.org/jira/browse/MESOS-2793
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.22.1
Reporter: Ian Downes
Assignee: Ian Downes
  Labels: twitter
 Fix For: 0.23.0


 Mesos containers can have a different rootfs from the host. Update the Isolator 
 interface to pass the rootfs during Isolator::prepare(). Update Isolators where 
 necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-786) Update semantics of when framework registered()/reregistered() get called

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-786:
-
  Sprint: Twitter Mesos Q2 Sprint 6
Assignee: Vinod Kone
Story Points: 3

 Update semantics of when framework registered()/reregistered() get called
 -

 Key: MESOS-786
 URL: https://issues.apache.org/jira/browse/MESOS-786
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone

 Current semantics:
 1) Framework connects w/ master very first time --> registered()
 2) Framework reconnects w/ same master after a zk blip --> reregistered()
 3) Framework reconnects w/ failed over master --> registered()
 4) Failed over framework connects w/ same master --> registered()
 5) Failed over framework connects w/ failed over master --> registered() 
 Updated semantics:
 Everything same except 
 3) Framework reconnects w/ failed over master --> reregistered()
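 For scheduler authors, the practical difference can be sketched as follows 
 (simplified, hypothetical callback signatures, not the real driver API): under the 
 updated semantics reregistered() also covers the case where the same framework 
 instance reconnects to a failed-over master, while registered() remains for 
 first-time and failed-over framework connections:
 {code}
 #include <iostream>
 #include <string>

 struct MasterInfo { std::string id; };

 class MyScheduler
 {
 public:
   // First-time connection, or a failed-over framework instance connecting.
   void registered(const std::string& frameworkId, const MasterInfo& master)
   {
     std::cout << "Registered " << frameworkId << " with " << master.id << std::endl;
   }

   // The running framework instance reconnecting, whether after a zk blip or
   // (under the updated semantics) after a master failover.
   void reregistered(const MasterInfo& master)
   {
     std::cout << "Re-registered with " << master.id << std::endl;
   }
 };
 {code}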



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint

2015-06-22 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596402#comment-14596402
 ] 

Vinod Kone commented on MESOS-2907:
---

s/executor/client/ ?

I'm assuming this is going to be useful for both scheduler - master and slave 
- executor? Or is this ticket only tracking the work to add it to the slave?

 Slave : Create Basic Functionality to handle /call endpoint
 ---

 Key: MESOS-2907
 URL: https://issues.apache.org/jira/browse/MESOS-2907
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
  Labels: HTTP, mesosphere

 This is the first step in providing the basic /call functionality: 
 processing a
 POST /call
 and returning:
 202 if all goes well;
 401 if not authorized; and
 403 if the request is malformed.
 Also, we might need to store some identifier that enables us to reject 
 calls to /call if the executor has not issued a SUBSCRIBE/RESUBSCRIBE request.
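 A rough sketch of that synchronous gate (stand-in types, not the actual 
 libprocess handler) might look like:
 {code}
 #include <string>

 struct CallRequest
 {
   std::string principal;  // Set by authentication; empty if unauthenticated.
   bool malformed;         // True if the Call could not be parsed/validated.
   bool subscribed;        // True if a SUBSCRIBE/RESUBSCRIBE was seen earlier.
 };

 // Returns the HTTP status the /call handler should answer with synchronously.
 int validateCall(const CallRequest& request)
 {
   if (request.principal.empty()) {
     return 401;  // Not authorized.
   }
   if (request.malformed || !request.subscribed) {
     return 403;  // Malformed or out-of-order call, per this ticket.
   }
   return 202;    // Accepted; actual processing continues asynchronously.
 }
 {code}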



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2226:
---
Story Points: 3  (was: 5)

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
  Labels: flaky, flaky-test, mesosphere

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with proposal 2
 I0114 18:51:34.734076  4734 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 87441ns
 I0114 18:51:34.734441  4734 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.740272  4739 replica.cpp:511] Replica received write request 
 for position 0
 I0114 18:51:34.740910  4739 leveldb.cpp:438] Reading position from leveldb 
 took 59846ns
 I0114 18:51:34.741672  4739 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 189259ns
 I0114 18:51:34.741919  4739 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.743000  4739 replica.cpp:658] Replica received 

[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2226:
---
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 
Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, 
Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, 
Mesosphere Sprint 13  (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 
3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, 
Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 
Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere 
Sprint 11)

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
  Labels: flaky, flaky-test, mesosphere

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with 

[jira] [Created] (MESOS-2914) Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.

2015-06-22 Thread Jie Yu (JIRA)
Jie Yu created MESOS-2914:
-

 Summary: Port mapping isolator should cleanup unknown orphan 
containers after all known orphan containers are recovered during recovery.
 Key: MESOS-2914
 URL: https://issues.apache.org/jira/browse/MESOS-2914
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu


Otherwise, the icmp/arp filter on host eth0 might be removed as a result of 
_cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both 
known/unknown orphan containers.

{noformat}
I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer
I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network namespace 
handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 - 31607
I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network namespace 
handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d - 41020
I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network namespace 
handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 - 3302
I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network namespace 
handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 - 19805
I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters 
with ports [33792,34815] for container with pid 52304
I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports 
[33792,34816) for container with pid 52304
I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed 
cleanup for pid 52304
I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery 
complete
I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 
111ea69c-6184-4da1-a0e9-c34e8c6deb30
I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container 
ddcb8397-3552-44f9-bc99-b5b69aa72944
I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30
I0612 17:46:51.554932 16329 containerizer.cpp:449] Removing orphan container 
8953fc7f-9fca-4931-b0cb-2f4959ddee74
I0612 17:46:51.555032 16309 cgroups.cpp:2377] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944
I0612 17:46:51.555629 16308 cgroups.cpp:1420] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 
1.730304ms
I0612 17:46:51.557507 16329 containerizer.cpp:449] Removing orphan container 
50f9986f-ebbc-440d-86a7-9fa1a7c55a75
I0612 17:46:51.557611 16309 cgroups.cpp:2377] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74
I0612 17:46:51.557896 16313 cgroups.cpp:1420] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 
1.685248ms
I0612 17:46:51.559412 16310 cgroups.cpp:2394] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30
I0612 17:46:51.561564 16329 containerizer.cpp:449] Removing orphan container 
d8c48a4a-fdfb-47dd-b8d8-07188c21600d
I0612 17:46:51.562489 16315 cgroups.cpp:2377] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75
I0612 17:46:51.562988 16313 cgroups.cpp:2394] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944
I0612 17:46:51.563303 16310 cgroups.cpp:1449] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 
2.076928ms
I0612 17:46:51.566052 16308 cgroups.cpp:2377] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d
I0612 17:46:51.566102 16313 slave.cpp:3911] Finished recovery
W0612 17:46:51.566432 16323 disk.cpp:299] Ignoring cleanup for unknown 
container 111ea69c-6184-4da1-a0e9-c34e8c6deb30
I0612 17:46:51.566651 16317 cgroups.cpp:1449] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 
2.12096ms
I0612 17:46:51.566987 16313 slave.cpp:3944] Garbage collecting old slave 
20150319-213133-2080910346-5050-57551-S3314
I0612 17:46:51.56 16318 cgroups.cpp:1420] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 
1.323008ms
W0612 17:46:51.568042 16323 port_mapping.cpp:2544] Ignoring cleanup for unknown 
container 111ea69c-6184-4da1-a0e9-c34e8c6deb30
I0612 17:46:51.569522 16311 gc.cpp:56] Scheduling 
'/var/lib/mesos/slaves/20150319-213133-2080910346-5050-57551-S3314' for gc 
6.9341503407days in the future
W0612 17:46:51.569725 16329 disk.cpp:299] Ignoring cleanup for unknown 
container ddcb8397-3552-44f9-bc99-b5b69aa72944
I0612 17:46:51.570911 16325 cgroups.cpp:2394] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d
I0612 17:46:51.573581 16316 port_mapping.cpp:2597] Removing IP packet filters 
with ports [35840,36863] for container with pid 31607
I0612 17:46:51.575127 16316 port_mapping.cpp:2616] Freed ephemeral ports 
[35840,36864) for 

[jira] [Commented] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-22 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596670#comment-14596670
 ] 

Jie Yu commented on MESOS-2903:
---

In fact, my above comment is not true, since known orphans will be stored in 
'infos' and unknown orphans will be cleaned up in slave recovery. See 
MESOS-2914 for the real cause.

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an informational message, since the final state of the system is 
 the state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2912) Provide a Python library for master detection

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2912:
--
Description: 
When schedulers start interacting with Mesos master via HTTP endpoints, they 
need a way to detect masters. 

Mesos should provide a master detection Python library to make this easy for 
frameworks.

  was:
When schedulers start interacting with Mesos master via HTTP endpoints, they 
need a way to detect masters. 

Mesos should provide a master detection Java library to make this easy for 
frameworks.


 Provide a Python library for master detection
 -

 Key: MESOS-2912
 URL: https://issues.apache.org/jira/browse/MESOS-2912
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone

 When schedulers start interacting with Mesos master via HTTP endpoints, they 
 need a way to detect masters. 
 Mesos should provide a master detection Python library to make this easy for 
 frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2906) Slave : Synchronous Validation for Calls

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2906:
---
Sprint:   (was: Mesosphere Sprint 13)

 Slave : Synchronous Validation for Calls
 

 Key: MESOS-2906
 URL: https://issues.apache.org/jira/browse/MESOS-2906
 Project: Mesos
  Issue Type: Task
Reporter: Anand Mazumdar
Assignee: Anand Mazumdar
  Labels: HTTP, mesosphere

 The /call endpoint on the slave will return a 202 Accepted code but has to 
 perform some basic validations first. If the request is invalid, it will return 
 a 4xx code.  
 - We need to create the required infrastructure to validate the request and 
 then process it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2295) Implement the Call endpoint on Slave

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2295:
---
Sprint:   (was: Mesosphere Sprint 13)

 Implement the Call endpoint on Slave
 

 Key: MESOS-2295
 URL: https://issues.apache.org/jira/browse/MESOS-2295
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: mesosphere





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2912) Provide a Python library for master detection

2015-06-22 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-2912:
-

 Summary: Provide a Python library for master detection
 Key: MESOS-2912
 URL: https://issues.apache.org/jira/browse/MESOS-2912
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone


When schedulers start interacting with Mesos master via HTTP endpoints, they 
need a way to detect masters. 

Mesos should provide a master detection Java library to make this easy for 
frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2756) Update style guide: Avoid object slicing

2015-06-22 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-2756:

Sprint:   (was: Mesosphere Sprint 13)

 Update style guide: Avoid object slicing
 

 Key: MESOS-2756
 URL: https://issues.apache.org/jira/browse/MESOS-2756
 Project: Mesos
  Issue Type: Improvement
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere
  Labels: c++

 In order to improve the safety of our code base, let's augment the style 
 guide to:
 Disallow public construction of base classes
 so that we can avoid the object slicing problem. This is a good pattern to 
 follow in general as it prevents subtle semantic bugs like the following:
 {code:title=ObjectSlicing.cpp|borderStyle=solid}
 #include <stdio.h>
 #include <vector>
 class Base {
   public:
   Base(int _v) : v(_v) {}
   virtual int get() const { return v; }
   protected:
   int v;
 };
 class Derived : public Base {
   public:
   Derived(int _v) : Base(_v) {}
   virtual int get() const { return v + 1; }
 };
 int main() {
   Base b(5);
   Derived d(5);
   std::vector<Base> vec;
   vec.push_back(b);
   vec.push_back(d);
   for (const auto& v : vec) {
     printf("[%d]\n", v.get());
   }
 }
 {code}
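 As a follow-up illustration (an assumption about the intended fix, not text from 
 the style guide), making the base constructor protected and holding elements by 
 pointer keeps virtual dispatch intact:
 {code}
 #include <cstdio>
 #include <memory>
 #include <vector>
 class Base {
   public:
   virtual ~Base() {}
   virtual int get() const { return v; }
   protected:
   explicit Base(int _v) : v(_v) {}  // No public construction of Base.
   int v;
 };
 class Derived : public Base {
   public:
   explicit Derived(int _v) : Base(_v) {}
   virtual int get() const { return v + 1; }
 };
 int main() {
   std::vector<std::unique_ptr<Base>> vec;
   vec.emplace_back(new Derived(5));  // Held by pointer: prints [6], not a sliced [5].
   for (const auto& b : vec) {
     printf("[%d]\n", b->get());
   }
 }
 {code}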



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-1575) master sets failover timeout to 0 when framework requests a high value

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio reassigned MESOS-1575:
--

Assignee: (was: Timothy Chen)

 master sets failover timeout to 0 when framework requests a high value
 --

 Key: MESOS-1575
 URL: https://issues.apache.org/jira/browse/MESOS-1575
 Project: Mesos
  Issue Type: Bug
Reporter: Kevin Sweeney
  Labels: newbie, twitter

 In response to a registered RPC we observed the following behavior:
 {noformat}
 W0709 19:07:32.982997 11400 master.cpp:612] Using the default value for 
 'failover_timeout' becausethe input value is invalid: Argument out of the 
 range that a Duration can represent due to int64_t's size limit
 I0709 19:07:32.983008 11404 hierarchical_allocator_process.hpp:408] 
 Deactivated framework 20140709-184342-119646400-5050-11380-0003
 I0709 19:07:32.983013 11400 master.cpp:617] Giving framework 
 20140709-184342-119646400-5050-11380-0003 0ns to failover
 I0709 19:07:32.983271 11404 master.cpp:2201] Framework failover timeout, 
 removing framework 20140709-184342-119646400-5050-11380-0003
 I0709 19:07:32.983294 11404 master.cpp:2688] Removing framework 
 20140709-184342-119646400-5050-11380-0003
 I0709 19:07:32.983678 11404 hierarchical_allocator_process.hpp:363] Removed 
 framework 20140709-184342-119646400-5050-11380-0003
 {noformat}
 This was using the following frameworkInfo.
 {code}
 FrameworkInfo frameworkInfo = FrameworkInfo.newBuilder()
 .setUser("test")
 .setName("jvm")
 .setFailoverTimeout(Long.MAX_VALUE)
 .build();
 {code}
 Instead of silently defaulting large values to 0, the master should refuse to 
 process the request.
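 A minimal sketch of such a check (plain C++, not the actual master code): reject 
 values that cannot be represented as a nanosecond-based Duration instead of 
 quietly using 0:
 {code}
 #include <cstdint>
 #include <limits>
 #include <string>

 struct Validation
 {
   bool error;
   std::string message;
 };

 // A nanosecond count in an int64_t overflows at roughly 292 years.
 Validation validateFailoverTimeout(double seconds)
 {
   const double maxSeconds =
     static_cast<double>(std::numeric_limits<int64_t>::max()) / 1e9;

   if (seconds < 0.0 || seconds > maxSeconds) {
     return {true, "failover_timeout is outside the representable range; "
                   "refusing registration instead of defaulting to 0"};
   }
   return {false, ""};
 }
 {code}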



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2619) Document master-scheduler communication

2015-06-22 Thread Connor Doyle (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596693#comment-14596693
 ] 

Connor Doyle commented on MESOS-2619:
-

[~vinodkone] you're right; this refers to the existing driver, not the HTTP 
API.  For one, there's no reference to Libprocess on the documentation landing 
page (http://mesos.apache.org/documentation/latest/).  Adding something there 
would be helpful, if only to seed stuck users' search terms.  An overview page 
could describe the asynchronous-protobuf-over-HTTP design.  Perhaps such a page 
could also be linked from an FAQ/troubleshooting topic about framework 
connection problems.  It's great that the HTTP API will simplify some of these 
use cases.  The existing driver will probably still be around long enough that 
it's worth adding docs about it.

[~marco-mesos] correct, I tagged it but am not currently working the issue.  We 
were using the tag to simply indicate interest back then.

 Document master-scheduler communication
 ---

 Key: MESOS-2619
 URL: https://issues.apache.org/jira/browse/MESOS-2619
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.22.0
Reporter: Connor Doyle
  Labels: mesosphere

 New users often stumble on the networking requirements for communication 
 between schedulers and the Mesos master.
 It's not explicitly stated anywhere that the master has to talk back to the 
 scheduler.  Also, some configuration options (like the LIBPROCESS_PORT 
 environment variable) are under-documented.
 This problem is exacerbated as many new users start playing with Mesos and 
 schedulers in unpredictable networking contexts (NAT, containers with bridged 
 networking, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2911) Add an Event message handler to scheduler library

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2911:
--
Assignee: Benjamin Mahler  (was: Vinod Kone)

 Add an Event message handler to scheduler library
 -

 Key: MESOS-2911
 URL: https://issues.apache.org/jira/browse/MESOS-2911
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Benjamin Mahler

 Adding this handler lets master send Event messages to the library.
 See MESOS-2909 for additional context.
 This ticket only tracks the installation of the handler and maybe handling of 
 a single event for testing. Additional events handling will be captured in a 
 different ticket(s).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2551) C++ Scheduler library should send Call messages to Master

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2551:
---
Sprint: Mesosphere Sprint 13

 C++ Scheduler library should send Call messages to Master
 -

 Key: MESOS-2551
 URL: https://issues.apache.org/jira/browse/MESOS-2551
 Project: Mesos
  Issue Type: Story
Reporter: Vinod Kone
Assignee: Isabel Jimenez
  Labels: mesosphere

 Currently, the C++ library sends different messages to Master instead of a 
 single Call message. To vet the new Call API it should send Call messages. 
 Master should be updated to handle all types of Calls.
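 Conceptually (illustrative types only, not the real scheduler::Call protobuf), 
 this means every driver action is wrapped in a single tagged envelope rather 
 than a dedicated message type:
 {code}
 #include <string>
 #include <vector>

 struct Call
 {
   enum Type { SUBSCRIBE, ACCEPT, DECLINE, KILL, ACKNOWLEDGE } type;
   std::string frameworkId;
   std::vector<std::string> offerIds;  // Used by ACCEPT/DECLINE.
   std::string taskId;                 // Used by KILL/ACKNOWLEDGE.
 };

 // Instead of sending a dedicated KillTaskMessage, the library builds the same
 // envelope with a different type and lets the master dispatch on it.
 Call killTask(const std::string& frameworkId, const std::string& taskId)
 {
   Call call;
   call.type = Call::KILL;
   call.frameworkId = frameworkId;
   call.taskId = taskId;
   return call;
 }
 {code}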



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2298) Provide a Java library for master detection

2015-06-22 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-2298:
--
 Description: 
When schedulers start interacting with Mesos master via HTTP endpoints, they 
need a way to detect masters. 

Mesos should provide a master detection Java library to make this easy for 
frameworks.

  was:When schedulers start interacting with Mesos master via HTTP endpoints, 
they need a way to detect masters. Ideally, Mesos provides master detection 
library/libraries in supported languages (java and python to start with) to 
make this easy for frameworks.

Story Points: 5
 Summary: Provide a Java library for master detection  (was: Provide 
master detection library/libraries for pure schedulers)

 Provide a Java library for master detection
 ---

 Key: MESOS-2298
 URL: https://issues.apache.org/jira/browse/MESOS-2298
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone

 When schedulers start interacting with Mesos master via HTTP endpoints, they 
 need a way to detect masters. 
 Mesos should provide a master detection Java library to make this easy for 
 frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2296:
--
Summary: Implement the Events stream on slave for Call endpoint  (was: 
Implement the Events endpoint on slave)

 Implement the Events stream on slave for Call endpoint
 --

 Key: MESOS-2296
 URL: https://issues.apache.org/jira/browse/MESOS-2296
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: mesosphere





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2913) Scheduler library should send Call messages to the master

2015-06-22 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-2913:
-

 Summary: Scheduler library should send Call messages to the master
 Key: MESOS-2913
 URL: https://issues.apache.org/jira/browse/MESOS-2913
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone


To vet the new Call protobufs, it is prudent to have the scheduler driver 
(sched.cpp) send Call messages to the master (similar to what we are doing with 
the scheduler library).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2337) __init__.py not getting installed in $PREFIX/lib/pythonX.Y/site-packages/mesos

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2337:
---
Sprint: Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, 
Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 
Sprint 8 - 5/1, Mesosphere Sprint 13  (was: Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 
Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1)

 __init__.py not getting installed in $PREFIX/lib/pythonX.Y/site-packages/mesos
 --

 Key: MESOS-2337
 URL: https://issues.apache.org/jira/browse/MESOS-2337
 Project: Mesos
  Issue Type: Bug
  Components: build, python api
Reporter: Kapil Arya
Assignee: Marco Massenzio
Priority: Critical
  Labels: mesosphere

 When doing a {{make install}}, the src/python/native/src/mesos/__init__.py 
 file is not getting installed in 
 {{$PREFIX/lib/pythonX.Y/site-packages/mesos/}}.  
 This makes it impossible to do the following import when {{PYTHONPATH}} is 
 set to the {{site-packages}} directory.
 {code}
 import mesos.interface.mesos_pb2
 {code}
 The directories {{$PREFIX/lib/pythonX.Y/site-packages/mesos/interface, 
 native}} do have their corresponding {{__init__.py}} files.
 Reproducing the bug:
 {code}
 ../configure --prefix=$HOME/test-install && make install
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2914) Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.

2015-06-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-2914:
-

Assignee: Jie Yu

 Port mapping isolator should cleanup unknown orphan containers after all 
 known orphan containers are recovered during recovery.
 ---

 Key: MESOS-2914
 URL: https://issues.apache.org/jira/browse/MESOS-2914
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu
Assignee: Jie Yu

 Otherwise, the icmp/arp filter on host eth0 might be removed as a result of 
 _cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both 
 known/unknown orphan containers.
 {noformat}
 I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer
 I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 - 31607
 I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d - 41020
 I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 - 3302
 I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 - 19805
 I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters 
 with ports [33792,34815] for container with pid 52304
 I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports 
 [33792,34816) for container with pid 52304
 I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed 
 cleanup for pid 52304
 I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery 
 complete
 I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 
 111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container 
 ddcb8397-3552-44f9-bc99-b5b69aa72944
 I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.554932 16329 containerizer.cpp:449] Removing orphan container 
 8953fc7f-9fca-4931-b0cb-2f4959ddee74
 I0612 17:46:51.555032 16309 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944
 I0612 17:46:51.555629 16308 cgroups.cpp:1420] Successfully froze cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 
 1.730304ms
 I0612 17:46:51.557507 16329 containerizer.cpp:449] Removing orphan container 
 50f9986f-ebbc-440d-86a7-9fa1a7c55a75
 I0612 17:46:51.557611 16309 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74
 I0612 17:46:51.557896 16313 cgroups.cpp:1420] Successfully froze cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 
 1.685248ms
 I0612 17:46:51.559412 16310 cgroups.cpp:2394] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.561564 16329 containerizer.cpp:449] Removing orphan container 
 d8c48a4a-fdfb-47dd-b8d8-07188c21600d
 I0612 17:46:51.562489 16315 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75
 I0612 17:46:51.562988 16313 cgroups.cpp:2394] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944
 I0612 17:46:51.563303 16310 cgroups.cpp:1449] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 
 2.076928ms
 I0612 17:46:51.566052 16308 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d
 I0612 17:46:51.566102 16313 slave.cpp:3911] Finished recovery
 W0612 17:46:51.566432 16323 disk.cpp:299] Ignoring cleanup for unknown 
 container 111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.566651 16317 cgroups.cpp:1449] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 
 2.12096ms
 I0612 17:46:51.566987 16313 slave.cpp:3944] Garbage collecting old slave 
 20150319-213133-2080910346-5050-57551-S3314
 I0612 17:46:51.56 16318 cgroups.cpp:1420] Successfully froze cgroup 
 /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 
 1.323008ms
 W0612 17:46:51.568042 16323 port_mapping.cpp:2544] Ignoring cleanup for 
 unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.569522 16311 gc.cpp:56] Scheduling 
 '/var/lib/mesos/slaves/20150319-213133-2080910346-5050-57551-S3314' for gc 
 6.9341503407days in the future
 W0612 17:46:51.569725 16329 disk.cpp:299] Ignoring cleanup for unknown 
 container 

[jira] [Resolved] (MESOS-2903) Network isolator should not fail when target state already exists

2015-06-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu resolved MESOS-2903.
---
Resolution: Invalid

 Network isolator should not fail when target state already exists
 -

 Key: MESOS-2903
 URL: https://issues.apache.org/jira/browse/MESOS-2903
 Project: Mesos
  Issue Type: Bug
  Components: isolation
Affects Versions: 0.23.0
Reporter: Paul Brett
Assignee: Paul Brett
Priority: Critical

 Network isolator has multiple instances of the following pattern:
 {noformat}
   Try<bool> something = ::create();
   if (something.isError()) {
     ++metrics.something_errors;
     return Failure("Failed to create something ...");
   } else if (!icmpVethToEth0.get()) {
     ++metrics.adding_veth_icmp_filters_already_exist;
     return Failure("Something already exists");
   }
 {noformat}
 These failures have occurred in operation due to the failure to recover or 
 delete an orphan, causing the slave to remain online but unable to create 
 new resources. We should convert the second failure message in this 
 pattern to an informational message, since the final state of the system is 
 the state that we requested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2904) Add slave metric to count container launch failures

2015-06-22 Thread Paul Brett (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596747#comment-14596747
 ] 

Paul Brett commented on MESOS-2904:
---

Fix without test harness is out for review: https://reviews.apache.org/r/35738/


 Add slave metric to count container launch failures
 ---

 Key: MESOS-2904
 URL: https://issues.apache.org/jira/browse/MESOS-2904
 Project: Mesos
  Issue Type: Bug
  Components: slave, statistics
Reporter: Paul Brett
Assignee: Paul Brett

 We have seen circumstances where a machine has been consistently unable to 
 launch containers due to an inconsistent state (for example, unexpected 
 network configuration).   Adding a metric to track container launch failures 
 will allow us to detect and alert on slaves in such a state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources

2015-06-22 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596851#comment-14596851
 ] 

Vinod Kone commented on MESOS-1807:
---

There seems to be some confusion, so let me clarify.

Mesos currently allows executors that use either 0 cpus or 0 memory. Since 
0.21.0, it emits a warning, but continues to allow them. Note that this ticket 
is not resolved yet. The goal of this ticket is to disallow executors with 0 
cpus or 0 memory.

Any issues that marathon or chronos are seeing is the long standing behavior of 
Mesos and precisely why this ticket was created. 

This deprecation was announced in the CHANGELOG for 0.21.0. Not sure if there 
was a specific email sent to the dev list though.

 Disallow executors with cpu only or memory only resources
 -

 Key: MESOS-1807
 URL: https://issues.apache.org/jira/browse/MESOS-1807
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
  Labels: newbie

 Currently the master allows executors to be launched with either only cpus or 
 only memory, but we shouldn't allow that.
 This is because an executor is an actual unix process that is launched by the 
 slave. If an executor doesn't specify cpus, what should the cpu limits be 
 for that executor when there are no tasks running on it? If no cpu limits are 
 set, it might starve other executors/tasks on the slave, violating 
 isolation guarantees. The same goes for memory. Moreover, the current 
 containerizer/isolator code will throw failures when using such an executor, 
 e.g., when the last task on the executor finishes and Containerizer::update() 
 is called with 0 cpus or 0 mem.
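 The proposed check could be as simple as the following sketch (hypothetical 
 helper, not the actual master validation code):
 {code}
 #include <map>
 #include <string>

 typedef std::map<std::string, double> Resources;  // resource name -> scalar.

 // Reject cpu-only and memory-only executors; since 0.21.0 this only warns,
 // and this ticket proposes turning it into a hard rejection.
 bool validExecutorResources(const Resources& resources)
 {
   const bool hasCpus = resources.count("cpus") > 0 && resources.at("cpus") > 0.0;
   const bool hasMem  = resources.count("mem") > 0 && resources.at("mem") > 0.0;
   return hasCpus && hasMem;
 }
 {code}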



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2862) mesos-fetcher won't fetch uris which begin with a space

2015-06-22 Thread Artem Harutyunyan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596970#comment-14596970
 ] 

Artem Harutyunyan commented on MESOS-2862:
--

https://reviews.apache.org/r/35757/
https://reviews.apache.org/r/35755/

 mesos-fetcher won't fetch uris which begin with a space
 -------------------------------------------------------

 Key: MESOS-2862
 URL: https://issues.apache.org/jira/browse/MESOS-2862
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Affects Versions: 0.22.1
Reporter: Cody Maloney
Assignee: Artem Harutyunyan
Priority: Minor
  Labels: mesosphere, newbie

 Discovered while running mesos with marathon on top. If I launch a marathon 
 task with a URI which is " http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz" 
 (note the leading space), mesos will log to stderr:
 {code}
 I0611 22:39:22.815636 35673 logging.cpp:177] Logging to STDERR
 I0611 22:39:25.643889 35673 fetcher.cpp:214] Fetching URI ' 
 http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz'
 I0611 22:39:25.648111 35673 fetcher.cpp:94] Hadoop Client not available, 
 skipping fetch with Hadoop Client
 Failed to fetch:  http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz
 Failed to synchronize with slave (it's probably exited)
 {code}
 It would be nice if mesos trimmed leading whitespace before doing protocol 
 detection so that simple mistakes are just fixed. 
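 A minimal sketch of that fix (plain C++ rather than the fetcher's actual 
 helpers): trim the URI before inspecting its scheme:
 {code}
 #include <string>

 std::string trim(const std::string& s)
 {
   const std::string whitespace = " \t\n\r";
   const size_t begin = s.find_first_not_of(whitespace);
   if (begin == std::string::npos) {
     return "";  // All whitespace.
   }
   const size_t end = s.find_last_not_of(whitespace);
   return s.substr(begin, end - begin + 1);
 }

 bool isHttp(const std::string& rawUri)
 {
   const std::string uri = trim(rawUri);  // ' http://...' now matches.
   return uri.compare(0, 7, "http://") == 0 || uri.compare(0, 8, "https://") == 0;
 }
 {code}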



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2473) Failure to recover because of freezer timeout should not suggest removing meta data

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2473:
---
Target Version/s: 0.24.0  (was: 0.23.0)

 Failure to recover because of freezer timeout should not suggest removing 
 meta data
 ---

 Key: MESOS-2473
 URL: https://issues.apache.org/jira/browse/MESOS-2473
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.22.0
Reporter: Ian Downes
  Labels: twitter

 A more appropriate action should be suggested (e.g., manually kill the 
 processes in cgroup xxx), because the slave will still attempt to clean up 
 orphans and hit the same code path.
 {noformat}
 I0310 23:04:23.961019 32342 slave.cpp:3321] Current usage 35.87%. Max allowed 
 age: 3.789365411204225days
 Failed to perform recovery: Collect failed: Timed out after 1mins
 To remedy this do as follows:
 Step 1: rm -f /var/lib/mesos/meta/slaves/latest
 This ensures slave doesn't recover old live executors.
 Step 2: Restart the slave.
 Slave Exit Status: 1
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2884) Allow isolators to specify required namespaces

2015-06-22 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-2884:
--
Target Version/s: 0.23.0

 Allow isolators to specify required namespaces
 --

 Key: MESOS-2884
 URL: https://issues.apache.org/jira/browse/MESOS-2884
 Project: Mesos
  Issue Type: Task
  Components: isolation
Reporter: Kapil Arya
Assignee: Kapil Arya
  Labels: mesosphere

 Currently, the LinuxLauncher looks into SlaveFlags to compute the namespaces 
 that should be enabled when launching the executor. This means that a custom 
 Isolator module doesn't have any way to specify dependency on a set of 
 namespaces.
 The proposed solution is to extend the Isolator interface to also export the 
 namespaces dependency. This way the MesosContainerizer can directly query all 
 loaded Isolators (inbuilt and custom modules) to compute the set of 
 namespaces required by the executor. This set of namespaces is then passed on 
 to the LinuxLauncher.
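 A rough sketch of the proposed extension (illustrative names, not the final 
 interface): each isolator reports the CLONE_NEW* flags it needs and the 
 containerizer ORs them together for the launcher:
 {code}
 #include <sched.h>  // CLONE_NEWNET, CLONE_NEWNS, ... (g++ defines _GNU_SOURCE).
 #include <memory>
 #include <vector>

 class Isolator
 {
 public:
   virtual ~Isolator() {}

   // Bitwise OR of the Linux namespace flags this isolator depends on.
   virtual int namespaces() const { return 0; }
 };

 class NetworkIsolator : public Isolator
 {
 public:
   virtual int namespaces() const { return CLONE_NEWNET; }
 };

 // The containerizer aggregates over built-in and module isolators and hands
 // the result to the LinuxLauncher when cloning the executor process.
 int requiredNamespaces(const std::vector<std::unique_ptr<Isolator>>& isolators)
 {
   int flags = 0;
   for (const auto& isolator : isolators) {
     flags |= isolator->namespaces();
   }
   return flags;
 }
 {code}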



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-06-22 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-2226:
--
Assignee: Niklas Quarfot Nielsen  (was: Kapil Arya)

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Niklas Quarfot Nielsen
  Labels: flaky, flaky-test, mesosphere

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with proposal 2
 I0114 18:51:34.734076  4734 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 87441ns
 I0114 18:51:34.734441  4734 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.740272  4739 replica.cpp:511] Replica received write request 
 for position 0
 I0114 18:51:34.740910  4739 leveldb.cpp:438] Reading position from leveldb 
 took 59846ns
 I0114 18:51:34.741672  4739 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 189259ns
 I0114 18:51:34.741919  4739 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.743000  4739 

[jira] [Comment Edited] (MESOS-2618) Update C++ style guide on function definition / invocation formatting.

2015-06-22 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596770#comment-14596770
 ] 

Michael Park edited comment on MESOS-2618 at 6/22/15 10:28 PM:
---

I recently did a little formatting cleanup and received some feedback that 
relates to this ticket.

In [r35635|https://reviews.apache.org/r/35635], it was noted that we prefer:

{code}
(1)
  delay(flags.executor_shutdown_grace_period,
        self(),
        &Slave::shutdownExecutorTimeout,
        framework->id(),
        executor->id,
        executor->containerId);
{code}

over

{code}
(2)
  delay(
      flags.executor_shutdown_grace_period,
      self(),
      &Slave::shutdownExecutorTimeout,
      framework->id(),
      executor->id,
      executor->containerId);
{code}

We also prefer:

{code}
(3)
  containerizer->wait(containerId)
    .onAny(defer(self(),
                 &Self::executorTerminated,
                 frameworkId,
                 executorId,
                 lambda::_1));
{code}

over

{code}
(4)
  containerizer->wait(containerId)
    .onAny(defer(
        self(),
        &Self::executorTerminated,
        frameworkId,
        executorId,
        lambda::_1));
{code}

Both of the preferred styles above are what {{clang-format}} produces. I think 
this goes to show that {{clang-format}}'s output is good in most cases. Of 
course there are shortcomings, since it is a developing project, but the 
spectacularly bad cases are the ones we should be talking about, rather than 
cases like this where either style is just as readable. For example, reasoning 
through and outlining the exact rules for why we prefer (3) over (4) is neither 
trivial nor all that helpful. {{clang-format}} assigns penalties to various 
undesired formatting styles and chooses the layout that minimizes the total 
penalty (as LaTeX does). I think relying on that system would be more 
systematic and would save time for all of us.


was (Author: mcypark):
I recently did a little formatting cleanup and received some feedback that 
relates to this ticket.

In [r35635|https://reviews.apache.org/r/35635], it was noted that we prefer:

{code}
(1)
  delay(flags.executor_shutdown_grace_period,
        self(),
        &Slave::shutdownExecutorTimeout,
        framework->id(),
        executor->id,
        executor->containerId);
{code}

over

{code}
(2)
  delay(
      flags.executor_shutdown_grace_period,
      self(),
      &Slave::shutdownExecutorTimeout,
      framework->id(),
      executor->id,
      executor->containerId);
{code}

We also prefer:

{code}
(3)
  containerizer->wait(containerId)
    .onAny(defer(self(),
                 &Self::executorTerminated,
                 frameworkId,
                 executorId,
                 lambda::_1));
{code}

over

{code}
(4)
  containerizer->wait(containerId)
    .onAny(defer(
        self(),
        &Self::executorTerminated,
        frameworkId,
        executorId,
        lambda::_1));
{code}

Both of the preferred styles above are what {{clang-format}} produces. I think 
this goes to show that {{clang-format}}'s output is good in most cases. Of 
course there are shortcomings, since it is a developing project, but the 
spectacularly bad cases are the ones we should be talking about, rather than 
cases like this where either way it's just as readable. For example, reasoning 
through and outlining the exact rules for why we prefer (3) over (4) is neither 
trivial nor all that helpful. {{clang-format}} assigns penalties to various 
undesired formatting styles and chooses the layout that minimizes the total 
penalty (as LaTeX does). I think relying on that system would be more 
systematic and would save time for all of us.

 Update C++ style guide on function definition / invocation formatting. 
 ---

 Key: MESOS-2618
 URL: https://issues.apache.org/jira/browse/MESOS-2618
 Project: Mesos
  Issue Type: Documentation
Reporter: Till Toenshoff
Priority: Minor

 Our style guide currently suggests two options for cases of function 
 definitions / invocations that do not fit into a single line even when 
 breaking after the opening argument bracket:
 Fixed leading indentation (4 spaces):
 {noformat}
 // 4: OK.
 allocator->resourcesRecovered(
     frameworkId,
     slaveId,
     resources,
     filters);
 {noformat}
 Variable leading indentation:
 {noformat}
 // 3: In this case, 3 is OK.
 foobar(someArgument,
        someOtherArgument,
        theLastArgument);
 {noformat}
 There is a counter-case mentioned for the latter:
 {noformat}
 // 3: Don't use in this case due to jaggedness.
 allocator->resourcesRecovered(frameworkId,
                               slaveId,

[jira] [Commented] (MESOS-2618) Update C++ style guide on function definition / invocation formatting.

2015-06-22 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596770#comment-14596770
 ] 

Michael Park commented on MESOS-2618:
-

I recently did a little formatting cleanup and received some feedback that 
relates to this ticket.

In [r35635|https://reviews.apache.org/r/35635], it was noted that we prefer:

{code}
(1)
  delay(flags.executor_shutdown_grace_period,
        self(),
        &Slave::shutdownExecutorTimeout,
        framework->id(),
        executor->id,
        executor->containerId);
{code}

over

{code}
(2)
  delay(
      flags.executor_shutdown_grace_period,
      self(),
      &Slave::shutdownExecutorTimeout,
      framework->id(),
      executor->id,
      executor->containerId);
{code}

We also prefer:

{code}
(3)
  containerizer->wait(containerId)
    .onAny(defer(self(),
                 &Self::executorTerminated,
                 frameworkId,
                 executorId,
                 lambda::_1));
{code}

over

{code}
(4)
  containerizer->wait(containerId)
    .onAny(defer(
        self(),
        &Self::executorTerminated,
        frameworkId,
        executorId,
        lambda::_1));
{code}

Both of the preferred styles above are what {{clang-format}} produces. I think 
this goes to show that {{clang-format}}'s output is good in most cases. Of 
course there are shortcomings, since it is a developing project, but the 
spectacularly bad cases are the ones we should be talking about, rather than 
cases like this where either way it's just as readable. For example, reasoning 
through and outlining the exact rules for why we prefer (3) over (4) is neither 
trivial nor all that helpful. {{clang-format}} assigns penalties to various 
undesired formatting styles and chooses the layout that minimizes the total 
penalty (as LaTeX does). I think relying on that system would be more 
systematic and would save time for all of us.

 Update C++ style guide on function definition / invocation formatting. 
 ---

 Key: MESOS-2618
 URL: https://issues.apache.org/jira/browse/MESOS-2618
 Project: Mesos
  Issue Type: Documentation
Reporter: Till Toenshoff
Priority: Minor

 Our style guide currently suggests two options for cases of function 
 definitions / invocations that do not fit into a single line even when 
 breaking after the opening argument bracket:
 Fixed leading indentation (4 spaces):
 {noformat}
 // 4: OK.
 allocator->resourcesRecovered(
     frameworkId,
     slaveId,
     resources,
     filters);
 {noformat}
 Variable leading indentation:
 {noformat}
 // 3: In this case, 3 is OK.
 foobar(someArgument,
        someOtherArgument,
        theLastArgument);
 {noformat}
 There is a counter-case mentioned for the latter:
 {noformat}
 // 3: Don't use in this case due to jaggedness.
 allocator->resourcesRecovered(frameworkId,
                               slaveId,
                               resources,
                               filters);
 {noformat}
 The problem here seems to be that the counter-case might not be well defined 
 as to when it applies.
 We might want to consider...
 A. removing the variable leading option entirely
 B. defining the exact limits on when jaggedness applies



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2294) Implement the Events stream on master for Call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-2294:
-

Assignee: Anand Mazumdar

 Implement the Events stream on master for Call endpoint
 ---

 Key: MESOS-2294
 URL: https://issues.apache.org/jira/browse/MESOS-2294
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: twitter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1856) Support specifying libnl3 install location.

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596913#comment-14596913
 ] 

Marco Massenzio commented on MESOS-1856:


Hey [~rji],

it looks like we're running out of time to fix this for {{0.23}}: would you 
mind writing up the workaround you suggested, so we can make it part of the 
release notes / developer docs?

Thanks!

 Support specifying libnl3 install location.
 ---

 Key: MESOS-1856
 URL: https://issues.apache.org/jira/browse/MESOS-1856
 Project: Mesos
  Issue Type: Task
Affects Versions: 0.22.0, 0.22.1
Reporter: Jie Yu

 LIBNL_CFLAGS uses a hard-coded path in the configure script, instead of 
 detecting the location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2915) Expose State API via new HTTP API

2015-06-22 Thread JIRA
Tomás Senart created MESOS-2915:
---

 Summary: Expose State API via new HTTP API
 Key: MESOS-2915
 URL: https://issues.apache.org/jira/browse/MESOS-2915
 Project: Mesos
  Issue Type: Story
Reporter: Tomás Senart






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2914) Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.

2015-06-22 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596823#comment-14596823
 ] 

Jie Yu commented on MESOS-2914:
---

https://reviews.apache.org/r/35749/
https://reviews.apache.org/r/35750/

 Port mapping isolator should cleanup unknown orphan containers after all 
 known orphan containers are recovered during recovery.
 ---

 Key: MESOS-2914
 URL: https://issues.apache.org/jira/browse/MESOS-2914
 Project: Mesos
  Issue Type: Bug
Reporter: Jie Yu
Assignee: Jie Yu

 Otherwise, the icmp/arp filter on host eth0 might be removed as a result of 
 _cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both 
 known/unknown orphan containers.
 {noformat}
 I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer
 I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 -> 31607
 I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d -> 41020
 I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 -> 3302
 I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network 
 namespace handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 -> 19805
 I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters 
 with ports [33792,34815] for container with pid 52304
 I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports 
 [33792,34816) for container with pid 52304
 I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed 
 cleanup for pid 52304
 I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery 
 complete
 I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 
 111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container 
 ddcb8397-3552-44f9-bc99-b5b69aa72944
 I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.554932 16329 containerizer.cpp:449] Removing orphan container 
 8953fc7f-9fca-4931-b0cb-2f4959ddee74
 I0612 17:46:51.555032 16309 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944
 I0612 17:46:51.555629 16308 cgroups.cpp:1420] Successfully froze cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 
 1.730304ms
 I0612 17:46:51.557507 16329 containerizer.cpp:449] Removing orphan container 
 50f9986f-ebbc-440d-86a7-9fa1a7c55a75
 I0612 17:46:51.557611 16309 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74
 I0612 17:46:51.557896 16313 cgroups.cpp:1420] Successfully froze cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 
 1.685248ms
 I0612 17:46:51.559412 16310 cgroups.cpp:2394] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.561564 16329 containerizer.cpp:449] Removing orphan container 
 d8c48a4a-fdfb-47dd-b8d8-07188c21600d
 I0612 17:46:51.562489 16315 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75
 I0612 17:46:51.562988 16313 cgroups.cpp:2394] Thawing cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944
 I0612 17:46:51.563303 16310 cgroups.cpp:1449] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 
 2.076928ms
 I0612 17:46:51.566052 16308 cgroups.cpp:2377] Freezing cgroup 
 /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d
 I0612 17:46:51.566102 16313 slave.cpp:3911] Finished recovery
 W0612 17:46:51.566432 16323 disk.cpp:299] Ignoring cleanup for unknown 
 container 111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.566651 16317 cgroups.cpp:1449] Successfullly thawed cgroup 
 /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 
 2.12096ms
 I0612 17:46:51.566987 16313 slave.cpp:3944] Garbage collecting old slave 
 20150319-213133-2080910346-5050-57551-S3314
 I0612 17:46:51.56 16318 cgroups.cpp:1420] Successfully froze cgroup 
 /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 
 1.323008ms
 W0612 17:46:51.568042 16323 port_mapping.cpp:2544] Ignoring cleanup for 
 unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30
 I0612 17:46:51.569522 16311 gc.cpp:56] Scheduling 
 '/var/lib/mesos/slaves/20150319-213133-2080910346-5050-57551-S3314' for gc 
 6.9341503407days in the future
 W0612 

[jira] [Commented] (MESOS-2726) Add support for enabling network namespace without enabling the network isolator

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596908#comment-14596908
 ] 

Marco Massenzio commented on MESOS-2726:


Please take a look and update the ticket.

 Add support for enabling network namespace without enabling the network 
 isolator
 

 Key: MESOS-2726
 URL: https://issues.apache.org/jira/browse/MESOS-2726
 Project: Mesos
  Issue Type: Task
  Components: isolation
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya

 Following the discussion Kapil started, it is currently not possible to 
 enable the linux network namespace for a container without enabling the 
 network isolator (which requires certain kernel capabilities and 
 dependencies).
 Following the pattern of enabling pid namespaces 
 (--isolation=namespaces/pid), one possible solution could be to add another 
 isolator name for the network namespace, i.e. namespaces/network.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2726) Add support for enabling network namespace without enabling the network isolator

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2726:
---
Assignee: Kapil Arya

 Add support for enabling network namespace without enabling the network 
 isolator
 

 Key: MESOS-2726
 URL: https://issues.apache.org/jira/browse/MESOS-2726
 Project: Mesos
  Issue Type: Task
  Components: isolation
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya

 Following the discussion Kapil started, it is currently not possible to 
 enable the linux network namespace for a container without enabling the 
 network isolator (which requires certain kernel capabilities and 
 dependencies).
 Following the pattern of enabling pid namespaces 
 (--isolation=namespaces/pid), one possible solution could be to add another 
 isolator name for the network namespace, i.e. namespaces/network.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2473) Failure to recover because of freezer timeout should not suggest removing meta data

2015-06-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596906#comment-14596906
 ] 

Marco Massenzio commented on MESOS-2473:


Unfortunately, it appears we won't get to do this in time for 0.23: nothing has 
happened for a couple of months and it is not critical for the release.

 Failure to recover because of freezer timeout should not suggest removing 
 meta data
 ---

 Key: MESOS-2473
 URL: https://issues.apache.org/jira/browse/MESOS-2473
 Project: Mesos
  Issue Type: Improvement
  Components: isolation
Affects Versions: 0.22.0
Reporter: Ian Downes
  Labels: twitter

 A more appropriate action should be suggested, e.g., "manually kill the 
 processes in cgroup xxx", because the slave will still attempt to clean up 
 orphans and hit the same code path.
 {noformat}
 I0310 23:04:23.961019 32342 slave.cpp:3321] Current usage 35.87%. Max allowed 
 age: 3.789365411204225days
 Failed to perform recovery: Collect failed: Timed out after 1mins
 To remedy this do as follows:
 Step 1: rm -f /var/lib/mesos/meta/slaves/latest
 This ensures slave doesn't recover old live executors.
 Step 2: Restart the slave.
 Slave Exit Status: 1
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-06-22 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596949#comment-14596949
 ] 

Kapil Arya commented on MESOS-2226:
---

Created a RR to handle the issue: https://reviews.apache.org/r/35756/

The current understanding is that the failure was due to the race introduced
by the code not checking for the TASK_RUNNING status update message
from the MockExecutor before stopping the scheduler driver. This caused
the Executor to be terminated prematurely (before the tasks were
launched) and thus the remove-executor hook was never called.

The fix was to wait for the TASK_RUNNING status update and then wait
for the shutdown() within MockExecutor. Only then do we wait for the future
from the remove-executor hook.
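For reference, here is a fragment (not the actual patch in https://reviews.apache.org/r/35756/) sketching that ordering, assuming the usual Mesos test fixtures (MockScheduler sched, MockExecutor exec, an accepted offer and task list, the libprocess gtest helpers) and a hypothetical slaveRemoveExecutorHookFuture captured from the remove-executor hook:
{code}
// Capture the TASK_RUNNING update and the executor shutdown as futures.
Future<TaskStatus> statusRunning;
EXPECT_CALL(sched, statusUpdate(&driver, _))
  .WillOnce(FutureArg<1>(&statusRunning));

Future<Nothing> shutdown;
EXPECT_CALL(exec, shutdown(_))
  .WillOnce(FutureSatisfy(&shutdown));

driver.launchTasks(offers.get()[0].id(), tasks);

// Only stop the driver once the task is actually running, so the executor
// is not torn down before the tasks are launched.
AWAIT_READY(statusRunning);
EXPECT_EQ(TASK_RUNNING, statusRunning.get().state());

driver.stop();
driver.join();

// Wait for the executor shutdown, and only then for the hook's future.
AWAIT_READY(shutdown);
AWAIT_READY(slaveRemoveExecutorHookFuture);
{code}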

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
  Labels: flaky, flaky-test, mesosphere

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attemping to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with 

[jira] [Commented] (MESOS-2909) Add version field to RegisterFrameworkMessage and ReregisterFrameworkMessage

2015-06-22 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596891#comment-14596891
 ] 

Benjamin Mahler commented on MESOS-2909:


I have a patch for this, but will hold off until we have the handlers fully 
implemented on the driver / library side (linked in blocking tickets, but more 
will follow). Adding it after the handlers are functional allows the master to 
assume that a present version means events can be sent.

 Add version field to RegisterFrameworkMessage and ReregisterFrameworkMessage
 

 Key: MESOS-2909
 URL: https://issues.apache.org/jira/browse/MESOS-2909
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Benjamin Mahler

 In the same way we added a 'version' field to RegisterSlaveMessage and 
 ReregisterSlaveMessage, we should do the same for the framework 
 (re-)registration messages. This would help the master determine which 
 version of the scheduler driver it is talking to.
 We want this so that master can start sending Event messages to the scheduler 
 driver (and scheduler library). In the long term, master will send a 
 streaming response to the libraries, but in the meantime we can test the 
 event protobufs by sending Event messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-1988:
--
Fix Version/s: 0.24.0

 Scheduler driver should not generate TASK_LOST when disconnected from master
 

 Key: MESOS-1988
 URL: https://issues.apache.org/jira/browse/MESOS-1988
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Anand Mazumdar
  Labels: mesosphere, twitter
 Fix For: 0.24.0


 Currently, the driver replies to launchTasks() with TASK_LOST if it detects 
 that it is disconnected from the master. After MESOS-1972 lands, this will be 
 the only place where driver generates TASK_LOST. See MESOS-1972 for more 
 context.
 This fix is targeted for 0.22.0 to give frameworks time to implement 
 reconciliation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1571) Signal escalation timeout is not configurable

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-1571:
---
Target Version/s:   (was: 0.23.0)

 Signal escalation timeout is not configurable
 -

 Key: MESOS-1571
 URL: https://issues.apache.org/jira/browse/MESOS-1571
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen
  Labels: mesosphere

 Even though the executor shutdown grace period is set to a larger interval, 
 the signal escalation timeout will still be 3 seconds. It should either be 
 configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
 Thoughts?
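As an illustrative sketch only (not the Mesos executor code, and the flag/value names are hypothetical), the idea is simply that the SIGTERM-to-SIGKILL delay comes from a configurable value instead of a hard-coded 3 seconds:
{code}
#include <signal.h>
#include <sys/types.h>

#include <chrono>
#include <thread>

// Hypothetical configurable value; in Mesos this could be a new flag or be
// derived from the executor shutdown grace period.
static std::chrono::seconds signalEscalationTimeout(3);

static void shutdownExecutor(pid_t pid)
{
  kill(pid, SIGTERM);  // Ask the executor to exit gracefully.

  // A real implementation would check whether the process has already
  // exited instead of sleeping unconditionally.
  std::this_thread::sleep_for(signalEscalationTimeout);

  kill(pid, SIGKILL);  // Escalate if it is still running.
}
{code}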



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2725) undocumented how to enable pid namespace isolation and shared filesystem isolation

2015-06-22 Thread Marco Massenzio (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Massenzio updated MESOS-2725:
---
Target Version/s:   (was: 0.23.0)

 undocumented how to enable pid namespace isolation and shared filesystem 
 isolation
 --

 Key: MESOS-2725
 URL: https://issues.apache.org/jira/browse/MESOS-2725
 Project: Mesos
  Issue Type: Documentation
  Components: containerization, documentation
Reporter: Adam Tulinius

 http://mesos.apache.org/documentation/latest/mesos-containerizer/ doesn't 
 actually mention how to enable shared filesystem- and pid namespace isolation.
 I'll suggest adding something like:
 To enable the Shared Filesystem isolator, append filesystem/shared to the 
 --isolation flag when starting the slave.
 .. and:
 To enable the Pid Namespace isolator, append namespaces/pid to the 
 --isolation flag when starting the slave.
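 For example, the suggested additions translate to an invocation roughly like 
 the following (a sketch; the master address and any other flags depend on the 
 deployment):
 {noformat}
 mesos-slave --master=<master-host>:5050 \
   --isolation="filesystem/shared,namespaces/pid"
 {noformat}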



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources

2015-06-22 Thread Elizabeth Lingg (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596807#comment-14596807
 ] 

Elizabeth Lingg commented on MESOS-1807:


For me, the issue is that Chronos and Marathon, for example, currently launch 
custom executors with 0 cpu AND 0 memory. While this needs to be fixed in 
Chronos and Marathon, a full announcement, warning, deprecation cycle would be 
appropriate in my view.  

I do agree that custom executors need to specify both CPU and Memory.

 Disallow executors with cpu only or memory only resources
 -

 Key: MESOS-1807
 URL: https://issues.apache.org/jira/browse/MESOS-1807
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
  Labels: newbie

 Currently the master allows executors to be launched with either only cpus or 
 only memory, but we shouldn't allow that.
 This is because an executor is an actual unix process launched by the slave. 
 If an executor doesn't specify cpus, what should the cpu limits be for that 
 executor when there are no tasks running on it? If no cpu limits are set, it 
 might starve other executors/tasks on the slave, violating isolation 
 guarantees. The same goes for memory. Moreover, the current 
 containerizer/isolator code will throw failures when using such an executor, 
 e.g., when the last task on the executor finishes and Containerizer::update() 
 is called with 0 cpus or 0 mem.
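 A minimal, self-contained sketch of the kind of check this asks for (not the 
 actual master validation code; the Resources type here is a stand-in):
 {code}
 #include <iostream>
 #include <map>
 #include <string>

 // Stand-in for ExecutorInfo's resources: resource name -> scalar value.
 typedef std::map<std::string, double> Resources;

 // Returns an error message if the executor declares cpus without mem,
 // mem without cpus, or neither; returns "" if both are present.
 static std::string validateExecutorResources(const Resources& resources)
 {
   const bool hasCpus =
     resources.count("cpus") > 0 && resources.find("cpus")->second > 0;
   const bool hasMem =
     resources.count("mem") > 0 && resources.find("mem")->second > 0;

   if (hasCpus && !hasMem) return "Executor specifies cpus but no mem";
   if (!hasCpus && hasMem) return "Executor specifies mem but no cpus";
   if (!hasCpus && !hasMem) return "Executor specifies neither cpus nor mem";

   return "";
 }

 int main()
 {
   Resources cpuOnly;
   cpuOnly["cpus"] = 0.1;

   std::cout << validateExecutorResources(cpuOnly) << std::endl;
   // Prints: Executor specifies cpus but no mem

   return 0;
 }
 {code}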



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint

2015-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2296:
--
Assignee: (was: Anand Mazumdar)

 Implement the Events stream on slave for Call endpoint
 --

 Key: MESOS-2296
 URL: https://issues.apache.org/jira/browse/MESOS-2296
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
  Labels: mesosphere





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (MESOS-2726) Add support for enabling network namespace without enabling the network isolator

2015-06-22 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya closed MESOS-2726.
-
Resolution: Duplicate

 Add support for enabling network namespace without enabling the network 
 isolator
 

 Key: MESOS-2726
 URL: https://issues.apache.org/jira/browse/MESOS-2726
 Project: Mesos
  Issue Type: Task
  Components: isolation
Reporter: Niklas Quarfot Nielsen
Assignee: Kapil Arya

 Following the discussion Kapil started, it is currently not possible to 
 enable the linux network namespace for a container without enabling the 
 network isolator (which requires certain kernel capabilities and 
 dependencies).
 Following the pattern of enabling pid namespaces 
 (--isolation=namespaces/pid), one possible solution could be to add another 
 isolator name for the network namespace, i.e. namespaces/network.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-2199:
---

Assignee: haosdent

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595801#comment-14595801
 ] 

haosdent commented on MESOS-2199:
-

The patch: https://reviews.apache.org/r/35728/diff

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595489#comment-14595489
 ] 

haosdent commented on MESOS-2199:
-

Thank you for your explanation; let me try to add it.

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code

2015-06-22 Thread Colin Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Williams updated MESOS-2637:
--
Description: We are using 'foo', 'bar', ... string constants and pairs in 
src/tests/master_tests.cpp, src/tests/slave_tests.cpp, src/tests/hook_tests.cpp 
and src/examples/test_hook_module.cpp for label and hooks tests. These values 
should be stored in local variables to avoid the possibility of assignment 
getting out of sync with checking for that same value.  (was: We are using 
'foo', 'bar', ... string constants and pairs in src/tests/master_tests.cpp, 
src/tests/slave_tests.cpp, src/tests/hook_tests.cpp and 
src/examples/test_hook_module.cpp for label and hooks tests. We should 
consolidate them to make the call sites less prone to forgetting to update all 
call sites.)

 Consolidate 'foo', 'bar', ... string constants in test and example code
 ---

 Key: MESOS-2637
 URL: https://issues.apache.org/jira/browse/MESOS-2637
 Project: Mesos
  Issue Type: Bug
  Components: technical debt
Reporter: Niklas Quarfot Nielsen
Assignee: Colin Williams

 We are using 'foo', 'bar', ... string constants and pairs in 
 src/tests/master_tests.cpp, src/tests/slave_tests.cpp, 
 src/tests/hook_tests.cpp and src/examples/test_hook_module.cpp for label and 
 hooks tests. These values should be stored in local variables to avoid the 
 possibility of assignment getting out of sync with checking for that same 
 value.
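 A tiny sketch of the suggested cleanup (identifiers here are illustrative, not 
 the actual test code): define the label key/value once and use the same 
 variables both where the label is set and where it is checked, so the two 
 sites cannot drift apart:
 {code}
 #include <cassert>
 #include <string>

 int main()
 {
   // Defined once per test instead of repeating the "foo"/"bar" literals.
   const std::string testLabelKey = "foo";
   const std::string testLabelValue = "bar";

   // Where the test builds the label (stand-in for label.set_key(...)).
   std::string key = testLabelKey;
   std::string value = testLabelValue;

   // Where the test later asserts on the label: same variables, so updating
   // the value in one place keeps both sites in sync.
   assert(key == testLabelKey);
   assert(value == testLabelValue);

   return 0;
 }
 {code}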



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595487#comment-14595487
 ] 

Adam B commented on MESOS-2199:
---

Nice detective work. We cannot require special ordering of tests. Each unit 
test should work in isolation, and they should all pass even with 
--gtest_shuffle enabled. We may indeed need some pre/post test steps in 
SetUp/TearDown methods.

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595472#comment-14595472
 ] 

haosdent commented on MESOS-2199:
-

But when you run SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser first and 
then run SlaveTest.ROOT_RunTaskWithCommandInfoWithUser, it passes, because 
{code}build/src/.libs/lt-mesos-executor{code} has already been created by 
SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser.

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-2199:

Comment: was deleted

(was: When I replace nobody with dbus, the test case passes. But when I use 
nobody, it fails.)

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595473#comment-14595473
 ] 

haosdent commented on MESOS-2199:
-

So do we need some preparation steps in 
SlaveTest.ROOT_RunTaskWithCommandInfoWithUser, or should we just keep the 
current behavior? On Jenkins I think it passes because 
SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser runs before 
SlaveTest.ROOT_RunTaskWithCommandInfoWithUser. [~adam-mesos] [~idownes] 
[~nnielsen]

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser

2015-06-22 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595471#comment-14595471
 ] 

haosdent commented on MESOS-2199:
-

When running this test case, build/src/mesos-executor checks whether 
build/src/.libs/lt-mesos-executor exists:
{code}
program=lt-'mesos-executor'
progdir=$thisdir/.libs

if test ! -f "$progdir/$program" ||
   { file=`ls -1dt "$progdir/$program" "$progdir/../$program" 2>/dev/null | 
/bin/sed 1q`; \
     test "X$file" != "X$progdir/$program"; }; then
{code}

When lt-mesos-executor does not exist, the wrapper tries to relink it:
{code}
relink_command=(...
{code}

But mesos-executor is run as a non-root user, so the relink fails. That 
reproduces the problem described above.

 Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ---

 Key: MESOS-2199
 URL: https://issues.apache.org/jira/browse/MESOS-2199
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Ian Downes
Assignee: haosdent
  Labels: mesosphere

 Appears that running the executor as {{nobody}} is not supported.
 [~nnielsen] can you take a look?
 Executor log:
 {noformat}
 [root@hostname build]# cat 
 /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60
 487-11862-/executors/1/runs/latest/std*
 sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied
 {noformat}
 Test output:
 {noformat}
 [==] Running 1 test from 1 test case.
 [--] Global test environment set-up.
 [--] 1 test from SlaveTest
 [ RUN  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
 ../../src/tests/slave_tests.cpp:680: Failure
 Value of: statusRunning.get().state()
   Actual: TASK_FAILED
 Expected: TASK_RUNNING
 ../../src/tests/slave_tests.cpp:682: Failure
 Failed to wait 10secs for statusFinished
 ../../src/tests/slave_tests.cpp:673: Failure
 Actual function call count doesn't match EXPECT_CALL(sched, 
 statusUpdate(driver, _))...
  Expected: to be called twice
Actual: called once - unsatisfied and active
 [  FAILED  ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms)
 [--] 1 test from SlaveTest (10641 ms total)
 [--] Global test environment tear-down
 [==] 1 test from 1 test case ran. (10658 ms total)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

