[jira] [Updated] (MESOS-2293) Implement the Call endpoint on master
[ https://issues.apache.org/jira/browse/MESOS-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2293: --- Sprint: Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13 (was: Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12) Implement the Call endpoint on master - Key: MESOS-2293 URL: https://issues.apache.org/jira/browse/MESOS-2293 Project: Mesos Issue Type: Story Reporter: Vinod Kone Assignee: Isabel Jimenez Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2545) Developer guide for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2545: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) Developer guide for libprocess -- Key: MESOS-2545 URL: https://issues.apache.org/jira/browse/MESOS-2545 Project: Mesos Issue Type: Documentation Components: libprocess Reporter: Bernd Mathiske Assignee: Joerg Schad Labels: documentation, mesosphere Create a developer guide for libprocess that explains the philosophy behind it and explains the most important features as well as the prevalent use patterns in Mesos with examples. This could be similar to stout/README.md. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2497) Create synchronous validations for Calls
[ https://issues.apache.org/jira/browse/MESOS-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2497: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) Create synchronous validations for Calls Key: MESOS-2497 URL: https://issues.apache.org/jira/browse/MESOS-2497 Project: Mesos Issue Type: Bug Reporter: Isabel Jimenez Assignee: Isabel Jimenez Labels: HTTP, mesosphere The /call endpoint will return a 202 Accepted code, but it has to perform some basic validations first. If validation fails, it will return a 4xx code. We have to create a mechanism that validates the request and sends back the appropriate code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
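The synchronous check described in MESOS-2497 can be sketched as a function that maps an incoming request to the HTTP status the endpoint answers with before any asynchronous work starts. Note this is an illustrative sketch only: `CallRequest` and `validateCall` are hypothetical names, not the Mesos implementation, and the ticket only specifies 202 on success and "a 4xx code" on invalid input, so the concrete 400/415 choices below are assumptions.

```cpp
#include <string>

// Hypothetical request shape; not the actual Mesos Call protobuf.
struct CallRequest {
  std::string contentType;  // e.g. "application/json"
  std::string body;         // serialized Call message
};

// Synchronous validation: return the HTTP status code to send back.
// 202 means "accepted for asynchronous processing"; 4xx means rejected.
int validateCall(const CallRequest& request) {
  if (request.contentType != "application/json") {
    return 415;  // Unsupported Media Type (assumed choice of 4xx).
  }
  if (request.body.empty()) {
    return 400;  // Bad Request: nothing to deserialize (assumed choice).
  }
  return 202;  // All synchronous checks passed: Accepted.
}
```

The point of the pattern is that every check cheap enough to run inline happens before the 202 is sent, so the client never gets an "accepted" answer for a request that could have been rejected immediately.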
[jira] [Updated] (MESOS-2888) Add SSL socket tests
[ https://issues.apache.org/jira/browse/MESOS-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2888: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) Add SSL socket tests Key: MESOS-2888 URL: https://issues.apache.org/jira/browse/MESOS-2888 Project: Mesos Issue Type: Improvement Components: libprocess Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Labels: libprocess, ssl, tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2719) Removing '.json' extension in master endpoints url
[ https://issues.apache.org/jira/browse/MESOS-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2719: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) Removing '.json' extension in master endpoints url -- Key: MESOS-2719 URL: https://issues.apache.org/jira/browse/MESOS-2719 Project: Mesos Issue Type: Improvement Reporter: Isabel Jimenez Assignee: Isabel Jimenez Labels: HTTP, mesosphere Remove the '.json' extension on endpoints such as `/master/stats.json` so it becomes `/master/stats` -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2119) Add Socket tests
[ https://issues.apache.org/jira/browse/MESOS-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2119: --- Sprint: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13 (was: Mesosphere Q4 Sprint 3 - 12/7, Mesosphere Q1 Sprint 1 - 1/23, Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12) Add Socket tests Key: MESOS-2119 URL: https://issues.apache.org/jira/browse/MESOS-2119 Project: Mesos Issue Type: Task Components: libprocess Reporter: Niklas Quarfot Nielsen Assignee: Joris Van Remoortere Labels: mesosphere Add more Socket specific tests to get coverage while doing libev to libevent (w and wo SSL) move -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2873) style hook prevents valid markdown files from getting committed
[ https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596041#comment-14596041 ] Alexander Rojas edited comment on MESOS-2873 at 6/22/15 3:36 PM: - Hey [~marco-mesos], I didn't set it as Reviewable, since it wasn't Accepted, and AFAIK we have to wait for it to be Accepted before setting it to Reviewable. However, I already had a fix. I wonder what the standard procedure is then? Hold the patch until it is accepted? Publish it and set it to Reviewable even if it was never accepted? was (Author: arojas): With all due respect [~marco-mesos], I didn't set it as reviewable, since it wasn't accepted. But I already had a fix. Should I then sit down and cry until someone decides to accept it before I either open the ticket or publish the patch? style hook prevents valid markdown files from getting committed Key: MESOS-2873 URL: https://issues.apache.org/jira/browse/MESOS-2873 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Alexander Rojas Priority: Trivial Labels: mesosphere Fix For: 0.23.0 According to the original [markdown specification|http://daringfireball.net/projects/markdown/syntax#p] and to the most [recent standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two spaces at the end of a line create a hard line break (it breaks the line without starting a new paragraph), similar to the HTML {{<br/>}} tag. However, there's a hook in Mesos which prevents files with trailing whitespace from being committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
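The conflict MESOS-2873 describes has a narrow fix: still reject trailing whitespace everywhere, but tolerate exactly two trailing spaces in Markdown files, since that is the hard-line-break syntax. The actual Mesos hook is a script, so the C++ below is only a hedged sketch of the check's logic, with an invented function name.

```cpp
#include <string>

// Hypothetical sketch: flag trailing whitespace on a line, but allow the
// Markdown hard line break (exactly two trailing spaces, no tabs) when the
// file being checked is Markdown.
bool hasForbiddenTrailingWhitespace(const std::string& line, bool isMarkdown) {
  const size_t end = line.find_last_not_of(" \t");
  const size_t trailing =
      (end == std::string::npos) ? line.size() : line.size() - end - 1;

  if (trailing == 0) {
    return false;  // No trailing whitespace at all.
  }

  // Markdown hard break: exactly two trailing spaces (not tabs).
  if (isMarkdown && trailing == 2 &&
      line.compare(line.size() - 2, 2, "  ") == 0) {
    return false;
  }

  return true;  // Trailing whitespace that serves no purpose.
}
```

Anything other than the exact two-space form (one space, three spaces, a tab) would still be rejected, so the hook keeps catching accidental whitespace.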
[jira] [Updated] (MESOS-2600) Add /reserve and /unreserve endpoints on the master for dynamic reservation
[ https://issues.apache.org/jira/browse/MESOS-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2600: --- Sprint: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13 (was: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12) Add /reserve and /unreserve endpoints on the master for dynamic reservation --- Key: MESOS-2600 URL: https://issues.apache.org/jira/browse/MESOS-2600 Project: Mesos Issue Type: Task Components: master Reporter: Michael Park Assignee: Michael Park Priority: Critical Labels: mesosphere Enable operators to manage dynamic reservations by Introducing the {{/reserve}} and {{/unreserve}} HTTP endpoints on the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2394) Create styleguide for documentation
[ https://issues.apache.org/jira/browse/MESOS-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2394: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) Create styleguide for documentation --- Key: MESOS-2394 URL: https://issues.apache.org/jira/browse/MESOS-2394 Project: Mesos Issue Type: Documentation Reporter: Joerg Schad Assignee: Joerg Schad Priority: Minor Labels: mesosphere As of right now different pages in our documentation use quite different styles. Consider for example the different emphasis for NOTE: * {noformat} NOTE: http://mesos.apache.org/documentation/latest/slave-recovery/{noformat} * {noformat}*NOTE*: http://mesos.apache.org/documentation/latest/upgrades/ {noformat} Would be great to establish a common style for the documentation! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2073) Fetcher cache file verification, updating and invalidation
[ https://issues.apache.org/jira/browse/MESOS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2073: --- Sprint: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13 (was: Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12) Fetcher cache file verification, updating and invalidation -- Key: MESOS-2073 URL: https://issues.apache.org/jira/browse/MESOS-2073 Project: Mesos Issue Type: Improvement Components: fetcher, slave Reporter: Bernd Mathiske Assignee: Bernd Mathiske Priority: Minor Labels: mesosphere Original Estimate: 96h Remaining Estimate: 96h The other tickets in the fetcher cache epic do not necessitate a checksum (e.g. MD5, SHA*) for files cached by the fetcher. Whereas such a checksum could be used to verify whether the file arrived without unintended alterations, it can first and foremost be employed to detect and trigger updates. Scenario: if a URI is requested for fetching and the indicated download has the same checksum as the cached file, then the cached file will be used and the download forgone. If the checksum is different, then fetching proceeds and the cached file gets replaced. This capability will be indicated by an additional field in the URI protobuf. Details TBD, i.e. to be discussed in comments below. In addition to the above, even if the checksum is the same, we can support voluntary cache file invalidation: a fresh download can be requested, or the caching behavior can be revoked entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
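The update/invalidation scenario in MESOS-2073 reduces to a small decision function. The enum and `decide()` below are invented for illustration (they are not fetcher code); the advertised checksum stands in for the "additional field in the URI protobuf" the ticket proposes.

```cpp
#include <string>

// Outcome of the cache decision described in the ticket.
enum class FetchAction { USE_CACHE, DOWNLOAD_AND_REPLACE };

// Hypothetical sketch: given the checksum of the cached file and the
// checksum advertised alongside the URI, decide whether to reuse the
// cache entry or fetch anew and replace it.
FetchAction decide(const std::string& cachedChecksum,
                   const std::string& advertisedChecksum,
                   bool invalidationRequested) {
  // Voluntary invalidation forces a fresh download even when checksums match.
  if (invalidationRequested) {
    return FetchAction::DOWNLOAD_AND_REPLACE;
  }
  // Same checksum: use the cached file and forgo the download.
  if (!cachedChecksum.empty() && cachedChecksum == advertisedChecksum) {
    return FetchAction::USE_CACHE;
  }
  // Different (or unknown) checksum: fetch and replace the cached file.
  return FetchAction::DOWNLOAD_AND_REPLACE;
}
```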
[jira] [Updated] (MESOS-2200) bogus docker images result in bad error message to scheduler
[ https://issues.apache.org/jira/browse/MESOS-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2200: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) bogus docker images result in bad error message to scheduler Key: MESOS-2200 URL: https://issues.apache.org/jira/browse/MESOS-2200 Project: Mesos Issue Type: Bug Components: containerization, docker Reporter: Jay Buffington Assignee: Joerg Schad Labels: mesosphere When a scheduler specifies a bogus image in ContainerInfo mesos doesn't tell the scheduler that the docker pull failed or why. This error is logged in the mesos-slave log, but it isn't given to the scheduler (as far as I can tell): {noformat} E1218 23:50:55.406230 8123 slave.cpp:2730] Container '8f70784c-3e40-4072-9ca2-9daed23f15ff' for executor 'thermos-1418946354013-xxx-xxx-curl-0-f500cc41-dd0a-4338-8cbc-d631cb588bb1' of framework '20140522-213145-1749004561-5050-29512-' failed to start: Failed to 'docker pull docker-registry.example.com/doesntexist/hello1.1:latest': exit status = exited with status 1 stderr = 2014/12/18 23:50:55 Error: image doesntexist/hello1.1 not found {noformat} If the docker image is not in the registry, the scheduler should give the user an error message. If docker pull failed because of networking issues, it should be retried. Mesos should give the scheduler enough information to be able to make that decision. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2166) PerfEventIsolatorTest.ROOT_CGROUPS_Sample requires 'perf' to be installed
[ https://issues.apache.org/jira/browse/MESOS-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2166: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) PerfEventIsolatorTest.ROOT_CGROUPS_Sample requires 'perf' to be installed -- Key: MESOS-2166 URL: https://issues.apache.org/jira/browse/MESOS-2166 Project: Mesos Issue Type: Bug Reporter: Cody Maloney Assignee: Isabel Jimenez Labels: mesosphere perf::valid() relies on the 'perf' command being installed, which isn't always the case. Configure should probably check that the perf command exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2157) Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints
[ https://issues.apache.org/jira/browse/MESOS-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596301#comment-14596301 ] Marco Massenzio commented on MESOS-2157: is this still being worked on? In other words, should I move it to Sprint 13? Or should we 'Stop progress'? Thanks! Add /master/slaves and /master/frameworks/{framework}/tasks/{task} endpoints Key: MESOS-2157 URL: https://issues.apache.org/jira/browse/MESOS-2157 Project: Mesos Issue Type: Task Components: master Reporter: Niklas Quarfot Nielsen Assignee: Alexander Rojas Priority: Trivial Labels: mesosphere, newbie master/state.json exports the entire state of the cluster and can, for large clusters, become massive (tens of megabytes of JSON). Often, a client only needs information about subsets of the entire state, for example all connected slaves, or information (registration info, tasks, etc.) belonging to a particular framework. We can partition state.json into many smaller endpoints, but for starters, being able to get slave information and task information per framework would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1815) Create a guide to becoming a committer
[ https://issues.apache.org/jira/browse/MESOS-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-1815: --- Sprint: Mesosphere Sprint 13 Create a guide to becoming a committer -- Key: MESOS-1815 URL: https://issues.apache.org/jira/browse/MESOS-1815 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Dominic Hamon Assignee: Bernd Mathiske Labels: mesosphere We have a committer's guide, but the process by which one becomes a committer is unclear. We should set some guidelines and a process by which we can grow contributors into committers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed
[ https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596098#comment-14596098 ] Marco Massenzio commented on MESOS-2873: Hey Alex - no worries: if you do decide to work on something, then it's Accepted by definition :) There is no special committee or super-power needed - we trust your judgement: if you think it's worth doing, then, by all means, mark it as Accepted (and then, when working on it, as In Progress and then Reviewable as appropriate). Thanks! style hook prevents valid markdown files from getting committed Key: MESOS-2873 URL: https://issues.apache.org/jira/browse/MESOS-2873 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Alexander Rojas Priority: Trivial Labels: mesosphere Fix For: 0.23.0 According to the original [markdown specification|http://daringfireball.net/projects/markdown/syntax#p] and to the most [recent standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two spaces at the end of a line create a hard line break (it breaks the line without starting a new paragraph), similar to the HTML {{<br/>}} tag. However, there's a hook in Mesos which prevents files with trailing whitespace from being committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2295) Implement the Call endpoint on Slave
[ https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2295: --- Story Points: 8 Implement the Call endpoint on Slave Key: MESOS-2295 URL: https://issues.apache.org/jira/browse/MESOS-2295 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed
[ https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596093#comment-14596093 ] haosdent commented on MESOS-2873: - I think open-reviewable should be acceptable. style hook prevents valid markdown files from getting committed Key: MESOS-2873 URL: https://issues.apache.org/jira/browse/MESOS-2873 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Alexander Rojas Priority: Trivial Labels: mesosphere Fix For: 0.23.0 According to the original [markdown specification|http://daringfireball.net/projects/markdown/syntax#p] and to the most [recent standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two spaces at the end of a line create a hard line break (it breaks the line without starting a new paragraph), similar to the HTML {{<br/>}} tag. However, there's a hook in Mesos which prevents files with trailing whitespace from being committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2855) Update operational guide to include growing from standalone to high availability
[ https://issues.apache.org/jira/browse/MESOS-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596270#comment-14596270 ] Vinod Kone commented on MESOS-2855: --- See next steps here: http://mesos.apache.org/documentation/latest/submitting-a-patch/ Once submitted, please paste the review url here and change the status of this ticket to Reviewable. Thanks Update operational guide to include growing from standalone to high availability Key: MESOS-2855 URL: https://issues.apache.org/jira/browse/MESOS-2855 Project: Mesos Issue Type: Documentation Reporter: Michael Schenck Assignee: Michael Schenck Labels: documentation The [Operational Guide|http://mesos.apache.org/documentation/latest/operational-guide/] covers increasing quorum size from {{--quorum=2}}, but does not cover how to move from a _standalone_ master to a high availability configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bernd Mathiske resolved MESOS-2858. --- Resolution: Fixed https://reviews.apache.org/r/35438/ FetcherCacheHttpTest.HttpMixed is flaky. Key: MESOS-2858 URL: https://issues.apache.org/jira/browse/MESOS-2858 Project: Mesos Issue Type: Bug Components: fetcher, test Reporter: Benjamin Mahler Assignee: Bernd Mathiske Labels: flaky-test, mesosphere From jenkins: {noformat} [ RUN ] FetcherCacheHttpTest.HttpMixed Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC' I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 2112ns I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the db in 392ns I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from a replica in EMPTY status I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to STARTING I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 590673ns I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to STARTING I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status I0611 00:40:28.214774 26061 master.cpp:363] Master 20150611-004028-1946161580-33349-26042 (658ddc752264) started on 172.17.0.116:33349 I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls= 
--allocation_interval=1secs --allocator=HierarchicalDRF --authenticate=true --authenticate_slaves=true --authenticators=crammd5 --credentials=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials --framework_sorter=drf --help=false --initialize_driver_logging=true --log_auto_initialize=true --logbufsecs=0 --logging_level=INFO --quiet=false --recovery_slave_removal_limit=100% --registry=replicated_log --registry_fetch_timeout=1mins --registry_store_timeout=25secs --registry_strict=true --root_submissions=true --slave_reregister_timeout=10mins --user_sorter=drf --version=false --webui_dir=/mesos/mesos-0.23.0/_inst/share/mesos/webui --work_dir=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master --zk_session_timeout=10secs I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing authenticated frameworks to register I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing authenticated slaves to register I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials' I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' authenticator I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from a replica in STARTING status I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical allocator process I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 374189ns I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to VOTING I0611 00:40:28.217052 26075 recover.cpp:580] Successfully joined the Paxos group I0611 00:40:28.217355 26063 master.cpp:1476] The 
newly elected leader is master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042 I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master! I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer I0611 00:40:28.219391 26067 replica.cpp:477] Replica received implicit promise request with proposal 1 I0611 00:40:28.219696 26067 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 276905ns I0611 00:40:28.219720 26067
[jira] [Commented] (MESOS-2295) Implement the Call endpoint on Slave
[ https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596305#comment-14596305 ] Marco Massenzio commented on MESOS-2295: I would really like to see this story broken down into smaller chunks of linked tasks. Implement the Call endpoint on Slave Key: MESOS-2295 URL: https://issues.apache.org/jira/browse/MESOS-2295 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2555) Document problem/solution of MESOS-2419 in documentation.
[ https://issues.apache.org/jira/browse/MESOS-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2555: --- Sprint: Mesosphere Sprint 13 (was: Mesosphere Sprint 12) Document problem/solution of MESOS-2419 in documentation. - Key: MESOS-2555 URL: https://issues.apache.org/jira/browse/MESOS-2555 Project: Mesos Issue Type: Documentation Reporter: Joerg Schad Assignee: Joerg Schad Priority: Critical Labels: mesosphere As the problem encountered in MESOS-2419 is a common problem with the default systemd configuration it would make sense to document this in the upgrade guide or somewhere else in the documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Brett updated MESOS-2903: -- Story Points: 3 (was: 2) Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern: {noformat} Try<bool> something = ::create(); if (something.isError()) { ++metrics.something_errors; return Failure("Failed to create something ..."); } else if (!icmpVethToEth0.get()) { ++metrics.adding_veth_icmp_filters_already_exist; return Failure("Something already exists"); } {noformat} These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an informational message, since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
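The change MESOS-2903 proposes is to treat "already exists" as success rather than failure, because the system is already in the requested state. A hedged sketch of that idempotent handling, with invented names (this is not the isolator code, which uses libprocess `Failure`s and real metric names):

```cpp
// Possible results of a creation attempt, abstracted from the ticket's
// Try<bool> pattern: true => created, false => already existed, error.
enum class CreateStatus { CREATED, ALREADY_EXISTS, ERROR };

struct Metrics {
  int errors = 0;
  int already_exist = 0;  // Now an informational counter, not a failure.
};

// Returns true when the desired end state holds after the call.
bool ensureCreated(CreateStatus status, Metrics* metrics) {
  switch (status) {
    case CreateStatus::CREATED:
      return true;  // Fresh creation succeeded.
    case CreateStatus::ALREADY_EXISTS:
      ++metrics->already_exist;  // Log/count it, but do not fail:
      return true;               // the requested state already holds.
    case CreateStatus::ERROR:
      ++metrics->errors;  // Genuine failures still propagate.
      return false;
  }
  return false;
}
```

This is what makes recovery after an orphaned state safe: re-running the setup converges to the requested state instead of taking the slave out of service.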
[jira] [Updated] (MESOS-2294) Implement the Events stream on master for Call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2294: -- Sprint: Twitter Mesos Q2 Sprint 6 Story Points: 8 Implement the Events stream on master for Call endpoint --- Key: MESOS-2294 URL: https://issues.apache.org/jira/browse/MESOS-2294 Project: Mesos Issue Type: Task Reporter: Vinod Kone Labels: twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2903: -- Sprint: Twitter Mesos Q2 Sprint 6 Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern: {noformat} Try<bool> something = ::create(); if (something.isError()) { ++metrics.something_errors; return Failure("Failed to create something ..."); } else if (!icmpVethToEth0.get()) { ++metrics.adding_veth_icmp_filters_already_exist; return Failure("Something already exists"); } {noformat} These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an informational message, since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2619) Document master-scheduler communication
[ https://issues.apache.org/jira/browse/MESOS-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596492#comment-14596492 ] Marco Massenzio commented on MESOS-2619: Not sure why this is under the {{HTTP API}} Epic - the reason it has the {{mesosphere}} label is (I'm guessing here) because [~cdoyle] reported it. No one is currently working on this, as far as I know. Document master-scheduler communication --- Key: MESOS-2619 URL: https://issues.apache.org/jira/browse/MESOS-2619 Project: Mesos Issue Type: Bug Components: documentation Affects Versions: 0.22.0 Reporter: Connor Doyle Labels: mesosphere New users often stumble on the networking requirements for communication between schedulers and the Mesos master. It's not explicitly stated anywhere that the master has to talk back to the scheduler. Also, some configuration options (like the LIBPROCESS_PORT environment variable) are under-documented. This problem is exacerbated as many new users start playing with Mesos and schedulers in unpredictable networking contexts (NAT, containers with bridged networking, etc.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master
[ https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596491#comment-14596491 ] Vinod Kone commented on MESOS-1988: --- [~anandmazumdar] Can you send that email today? Feel free to run it by me, if you need another pair of eyes. Scheduler driver should not generate TASK_LOST when disconnected from master Key: MESOS-1988 URL: https://issues.apache.org/jira/browse/MESOS-1988 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: mesosphere, twitter Currently, the driver replies to launchTasks() with TASK_LOST if it detects that it is disconnected from the master. After MESOS-1972 lands, this will be the only place where the driver generates TASK_LOST. See MESOS-1972 for more context. This fix is targeted for 0.22.0 to give frameworks time to implement reconciliation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2907) Slave: Create Basic Functionality to handle /call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2907: --- Sprint: Mesosphere Sprint 13 Slave: Create Basic Functionality to handle /call endpoint --- Key: MESOS-2907 URL: https://issues.apache.org/jira/browse/MESOS-2907 Project: Mesos Issue Type: Task Reporter: Anand Mazumdar Assignee: Anand Mazumdar Labels: HTTP, mesosphere This is the first basic step in providing the basic /call functionality: processing a POST /call and returning: 202 if all goes well; 401 if not authorized; and 403 if the request is malformed. Also, we might need to store some identifier that enables us to reject calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
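The status mapping stated in MESOS-2907 (202 on success, 401 if not authorized, 403 if malformed, and rejection when no prior SUBSCRIBE/RESUBSCRIBE was seen) can be sketched as a small dispatch function. The struct and function below are hypothetical, not slave code; the 403-for-malformed choice simply mirrors the ticket's wording.

```cpp
// Hypothetical, pre-parsed view of an incoming POST /call.
struct SlaveCall {
  bool authorized;        // Did authorization succeed?
  bool wellFormed;        // Did the request body parse/validate?
  bool isSubscribe;       // Is this a SUBSCRIBE/RESUBSCRIBE call?
  bool clientSubscribed;  // Was an identifier stored for this client earlier?
};

// Returns the HTTP status code the endpoint would answer with,
// following the mapping given in the ticket.
int handleCall(const SlaveCall& call) {
  if (!call.authorized) {
    return 401;  // Not authorized.
  }
  if (!call.wellFormed) {
    return 403;  // Malformed request, per the ticket's mapping.
  }
  if (!call.isSubscribe && !call.clientSubscribed) {
    return 403;  // No SUBSCRIBE/RESUBSCRIBE seen from this client yet.
  }
  return 202;  // All goes well: accepted.
}
```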
[jira] [Updated] (MESOS-2907) Slave: Create Basic Functionality to handle /call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2907: --- Story Points: 5 Slave: Create Basic Functionality to handle /call endpoint --- Key: MESOS-2907 URL: https://issues.apache.org/jira/browse/MESOS-2907 Project: Mesos Issue Type: Task Reporter: Anand Mazumdar Assignee: Anand Mazumdar Labels: HTTP, mesosphere This is the first basic step in providing the basic /call functionality: processing a POST /call and returning: 202 if all goes well; 401 if not authorized; and 403 if the request is malformed. Also, we might need to store some identifier that enables us to reject calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2907: --- Sprint: (was: Mesosphere Sprint 13) Slave : Create Basic Functionality to handle /call endpoint --- Key: MESOS-2907 URL: https://issues.apache.org/jira/browse/MESOS-2907 Project: Mesos Issue Type: Task Reporter: Anand Mazumdar Assignee: Anand Mazumdar Labels: HTTP, mesosphere This is the first step in providing the basic /call functionality: processing a POST /call and returning: 202 if all goes well; 401 if not authorized; and 403 if the request is malformed. Also, we might need to store some identifier which enables us to reject calls to /call if the client has not issued a SUBSCRIBE/RESUBSCRIBE Request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
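The status-code mapping the ticket describes can be sketched as a small decision function. This is a hedged illustration only: `CallRequest` and `handleCall` are hypothetical names invented here, not the Mesos implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of a request: each field stands for the outcome of
// one of the basic validations the ticket lists.
struct CallRequest {
  std::string method;  // HTTP method used by the client.
  bool authorized;     // Did authentication/authorization succeed?
  bool wellFormed;     // Did the body parse as a valid Call message?
  bool subscribed;     // Has this client issued SUBSCRIBE/RESUBSCRIBE?
};

// Map the validation outcome onto the status codes from the ticket:
// 202 if all goes well, 401 if not authorized, 403 if the request is
// malformed or the client never subscribed.
int handleCall(const CallRequest& request) {
  if (!request.authorized) {
    return 401;
  }
  if (request.method != "POST" || !request.wellFormed) {
    return 403;
  }
  if (!request.subscribed) {
    return 403;
  }
  return 202;  // Accepted: actual processing happens asynchronously.
}
```

Note that 202 (rather than 200) matches the asynchronous nature of /call: the response only acknowledges receipt, not completion.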
[jira] [Commented] (MESOS-2831) FetcherCacheTest.SimpleEviction is flaky
[ https://issues.apache.org/jira/browse/MESOS-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596030#comment-14596030 ] Bernd Mathiske commented on MESOS-2831: --- Yep. Should be fixed now. Thx. FetcherCacheTest.SimpleEviction is flaky Key: MESOS-2831 URL: https://issues.apache.org/jira/browse/MESOS-2831 Project: Mesos Issue Type: Bug Components: fetcher Affects Versions: 0.23.0 Reporter: Vinod Kone Assignee: Bernd Mathiske Labels: flaky-test, mesosphere Saw this when reviewbot was testing an unrelated review https://reviews.apache.org/r/35119/ {code} [ RUN ] FetcherCacheTest.SimpleEviction GMOCK WARNING: Uninteresting mock function call - returning directly. Function call: resourceOffers(0x5365320, @0x2b7bef9f1b20 { 128-byte object B0-C0 36-E6 7B-2B 00-00 00-00 00-00 00-00 00-00 20-75 00-18 7C-2B 00-00 C0-75 00-18 7C-2B 00-00 60-76 00-18 7C-2B 00-00 00-77 00-18 7C-2B 00-00 40-3A 00-18 7C-2B 00-00 04-00 00-00 04-00 00-00 04-00 00-00 7C-2B 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 0F-00 00-00 }) Stack trace: F0607 21:19:23.181392 4246 fetcher_cache_tests.cpp:354] CHECK_READY(offers): is PENDING Failed to wait for resource offers *** Check failure stack trace: *** @ 0x2b7be56c5972 google::LogMessage::Fail() @ 0x2b7be56c58be google::LogMessage::SendToLog() @ 0x2b7be56c52c0 google::LogMessage::Flush() @ 0x2b7be56c81d4 google::LogMessageFatal::~LogMessageFatal() @ 0x97d182 _CheckFatal::~_CheckFatal() @ 0xb58a28 mesos::internal::tests::FetcherCacheTest::launchTask() @ 0xb65b50 mesos::internal::tests::FetcherCacheTest_SimpleEviction_Test::TestBody() @ 0x11923b7 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x118d5b4 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x1175975 testing::Test::Run() @ 0x1176098 testing::TestInfo::Run() @ 0x1176620 testing::TestCase::Run() @ 0x117b2ea 
testing::internal::UnitTestImpl::RunAllTests() @ 0x1193229 testing::internal::HandleSehExceptionsInMethodIfSupported() @ 0x118e2a5 testing::internal::HandleExceptionsInMethodIfSupported() @ 0x117a1f6 testing::UnitTest::Run() @ 0xcc832b main @ 0x2b7be7d46ec5 (unknown) @ 0x872379 (unknown) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2905) JVM crashed when wrong master format specified
[ https://issues.apache.org/jira/browse/MESOS-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596033#comment-14596033 ] haosdent commented on MESOS-2905: - Hi [~zborisha], I think your problem is fixed in https://issues.apache.org/jira/browse/MESOS-2636 JVM crashed when wrong master format specified -- Key: MESOS-2905 URL: https://issues.apache.org/jira/browse/MESOS-2905 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.22.1 Environment: java version 1.8.0_45 - Oracle mesos version 0.22.1 OS ubuntu 15.04 Reporter: Borisa Zivkovic I am using Spark with Mesos... I reported the issue here https://issues.apache.org/jira/browse/SPARK-8524 but actually, after inspecting the core dump, it looks like it is a Mesos problem. Basically, if I specify an invalid mesos master URL it crashes the JVM... for example mesos://http://127.0.0.1:5050 and mesos://abc://127.0.0.1:5050 will crash the JVM. Looks like the problem is in line 245 here https://github.com/apache/mesos/blob/master/src/master/detector.cpp; probably additional checks should be done and a proper error reported instead of crashing the JVM. Here is the relevant part of the core dump: [Thread debugging using libthread_db enabled] Using host libthread_db library /lib/x86_64-linux-gnu/libthread_db.so.1. gdb where Core was generated by `/usr/lib/jvm/java-8-oracle/bin/java -cp /home/borisa/Programs/spark-1.4.0-bin-h'. Program terminated with signal SIGABRT, Aborted. #0 0x7f8a245ad267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 55../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) gdb where Undefined command: gdb. Try help. 
(gdb) where #0 0x7f8a245ad267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x7f8a245aeeca in __GI_abort () at abort.c:89 #2 0x7f8a23ebf6b5 in os::abort(bool) () from /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so #3 0x7f8a2405cda3 in VMError::report_and_die() () from /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so #4 0x7f8a23ec4bdf in JVM_handle_linux_signal () from /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so #5 0x7f8a23ebb493 in signalHandler(int, siginfo*, void*) () from /usr/lib/jvm/java-8-oracle/jre/lib/amd64/server/libjvm.so #6 signal handler called #7 __GI_freeaddrinfo (ai=0x998ef7a53c7f3000) at ../sysdeps/posix/getaddrinfo.c:2683 #8 0x7f89869b4b10 in getIP () at ../../../3rdparty/libprocess/3rdparty/stout/include/stout/net.hpp:203 #9 0x7f89869f2d9e in operator () at ../../../3rdparty/libprocess/src/pid.cpp:114 #10 0x7f89869f2768 in UPID () at ../../../3rdparty/libprocess/src/pid.cpp:43 #11 0x7f8986075108 in create () at ../../src/master/detector.cpp:245 #12 0x7f89862768c4 in start () at ../../src/sched/sched.cpp:1515 #13 0x7f8986a97418 in Java_org_apache_mesos_MesosSchedulerDriver_start () at ../../src/java/jni/org_apache_mesos_MesosSchedulerDriver.cpp:603 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
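As the reporter suggests, the fix is to validate the master string before it reaches UPID construction in detector.cpp. A rough sketch of such a check, with a hypothetical helper name that is not the actual Mesos code, assuming the documented master forms are a bare host:port, zk://..., or file://...:

```cpp
#include <cassert>
#include <string>

// Hedged sketch: reject obviously malformed master strings before handing
// them to the scheduler driver, instead of letting the native library
// crash the JVM while resolving a bogus host.
bool looksLikeValidMaster(const std::string& master) {
  if (master.compare(0, 5, "zk://") == 0) return true;
  if (master.compare(0, 7, "file://") == 0) return true;
  // Anything else containing "://" (e.g. "http://..." or "abc://...",
  // as in the report) is not a bare host:port and should be rejected
  // with a proper error instead of aborting.
  if (master.find("://") != std::string::npos) return false;
  // Require a host:port shape: a ':' with a numeric port after it.
  std::string::size_type colon = master.rfind(':');
  if (colon == std::string::npos || colon + 1 == master.size()) return false;
  return master.find_first_not_of("0123456789", colon + 1) ==
         std::string::npos;
}
```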
[jira] [Commented] (MESOS-2873) style hook prevents valid markdown files from getting committed
[ https://issues.apache.org/jira/browse/MESOS-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596041#comment-14596041 ] Alexander Rojas commented on MESOS-2873: With all due respect [~marco-mesos], I didn't set it as reviewable, since it wasn't accepted. But I already had a fix. Should I then sit down and cry until someone decides to accept it before I either open the ticket or publish the patch? style hook prevents valid markdown files from getting committed Key: MESOS-2873 URL: https://issues.apache.org/jira/browse/MESOS-2873 Project: Mesos Issue Type: Bug Reporter: Alexander Rojas Assignee: Alexander Rojas Priority: Trivial Labels: mesosphere Fix For: 0.23.0 According to the original [markdown specification|http://daringfireball.net/projects/markdown/syntax#p] and to the most [recent standardization|http://spec.commonmark.org/0.20/#hard-line-breaks] effort, two spaces at the end of a line create a hard line break (it breaks the line without starting a new paragraph), similar to the html code {{<br/>}}. However, there's a hook in mesos which prevents files with trailing whitespace from being committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
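The fix amounts to exempting Markdown files from the trailing-whitespace check, since two trailing spaces are the standard Markdown encoding of a hard line break. The decision the hook needs to make can be sketched like this (a hypothetical helper for illustration, not the actual Mesos hook script):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: trailing whitespace is an error everywhere except
// in Markdown files, where two trailing spaces encode a hard line break.
bool rejectTrailingWhitespace(const std::string& filename) {
  const std::string ext = ".md";
  if (filename.size() < ext.size()) {
    return true;  // Too short to end in ".md": keep the check enabled.
  }
  return filename.compare(filename.size() - ext.size(), ext.size(), ext) != 0;
}
```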
[jira] [Commented] (MESOS-2858) FetcherCacheHttpTest.HttpMixed is flaky.
[ https://issues.apache.org/jira/browse/MESOS-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596048#comment-14596048 ] Bernd Mathiske commented on MESOS-2858: --- Yes, this looks like the same problem that just got fixed by this: https://reviews.apache.org/r/35438/ FetcherCacheHttpTest.HttpMixed is flaky. Key: MESOS-2858 URL: https://issues.apache.org/jira/browse/MESOS-2858 Project: Mesos Issue Type: Bug Components: fetcher, test Reporter: Benjamin Mahler Assignee: Bernd Mathiske Labels: flaky-test, mesosphere From jenkins: {noformat} [ RUN ] FetcherCacheHttpTest.HttpMixed Using temporary directory '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC' I0611 00:40:28.208909 26042 leveldb.cpp:176] Opened db in 3.831173ms I0611 00:40:28.209951 26042 leveldb.cpp:183] Compacted db in 997319ns I0611 00:40:28.210011 26042 leveldb.cpp:198] Created db iterator in 23917ns I0611 00:40:28.210032 26042 leveldb.cpp:204] Seeked to beginning of db in 2112ns I0611 00:40:28.210043 26042 leveldb.cpp:273] Iterated through 0 keys in the db in 392ns I0611 00:40:28.210095 26042 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0611 00:40:28.210741 26067 recover.cpp:449] Starting replica recovery I0611 00:40:28.211144 26067 recover.cpp:475] Replica is in EMPTY status I0611 00:40:28.212210 26074 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0611 00:40:28.212728 26071 recover.cpp:195] Received a recover response from a replica in EMPTY status I0611 00:40:28.213260 26069 recover.cpp:566] Updating replica status to STARTING I0611 00:40:28.214066 26073 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 590673ns I0611 00:40:28.214095 26073 replica.cpp:323] Persisted replica status to STARTING I0611 00:40:28.214350 26073 recover.cpp:475] Replica is in STARTING status I0611 00:40:28.214774 26061 master.cpp:363] Master 20150611-004028-1946161580-33349-26042 (658ddc752264) 
started on 172.17.0.116:33349 I0611 00:40:28.214800 26061 master.cpp:365] Flags at startup: --acls= --allocation_interval=1secs --allocator=HierarchicalDRF --authenticate=true --authenticate_slaves=true --authenticators=crammd5 --credentials=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials --framework_sorter=drf --help=false --initialize_driver_logging=true --log_auto_initialize=true --logbufsecs=0 --logging_level=INFO --quiet=false --recovery_slave_removal_limit=100% --registry=replicated_log --registry_fetch_timeout=1mins --registry_store_timeout=25secs --registry_strict=true --root_submissions=true --slave_reregister_timeout=10mins --user_sorter=drf --version=false --webui_dir=/mesos/mesos-0.23.0/_inst/share/mesos/webui --work_dir=/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/master --zk_session_timeout=10secs I0611 00:40:28.215342 26061 master.cpp:410] Master only allowing authenticated frameworks to register I0611 00:40:28.215361 26061 master.cpp:415] Master only allowing authenticated slaves to register I0611 00:40:28.215397 26061 credentials.hpp:37] Loading credentials for authentication from '/tmp/FetcherCacheHttpTest_HttpMixed_qfpOOC/credentials' I0611 00:40:28.215589 26064 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0611 00:40:28.215770 26061 master.cpp:454] Using default 'crammd5' authenticator I0611 00:40:28.215934 26061 master.cpp:491] Authorization enabled I0611 00:40:28.215932 26062 recover.cpp:195] Received a recover response from a replica in STARTING status I0611 00:40:28.216256 26070 whitelist_watcher.cpp:79] No whitelist given I0611 00:40:28.216310 26066 hierarchical.hpp:309] Initialized hierarchical allocator process I0611 00:40:28.216352 26067 recover.cpp:566] Updating replica status to VOTING I0611 00:40:28.216909 26070 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 374189ns I0611 00:40:28.216931 26070 replica.cpp:323] Persisted replica status to VOTING I0611 00:40:28.217052 26075 
recover.cpp:580] Successfully joined the Paxos group I0611 00:40:28.217355 26063 master.cpp:1476] The newly elected leader is master@172.17.0.116:33349 with id 20150611-004028-1946161580-33349-26042 I0611 00:40:28.217512 26063 master.cpp:1489] Elected as the leading master! I0611 00:40:28.217540 26063 master.cpp:1259] Recovering from registrar I0611 00:40:28.217753 26070 registrar.cpp:313] Recovering registrar I0611 00:40:28.217396 26075 recover.cpp:464] Recover process terminated I0611 00:40:28.218341 26065 log.cpp:661] Attempting to start the writer I0611 00:40:28.219391 26067 replica.cpp:477] Replica received implicit promise request with proposal 1 I0611 00:40:28.219696 26067
[jira] [Commented] (MESOS-1815) Create a guide to becoming a committer
[ https://issues.apache.org/jira/browse/MESOS-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596051#comment-14596051 ] Bernd Mathiske commented on MESOS-1815: --- @marco Still in progress. Next step: post the checklist on the web site. Create a guide to becoming a committer -- Key: MESOS-1815 URL: https://issues.apache.org/jira/browse/MESOS-1815 Project: Mesos Issue Type: Documentation Components: documentation Reporter: Dominic Hamon Assignee: Bernd Mathiske Labels: mesosphere We have a committer's guide, but the process by which one becomes a committer is unclear. We should set some guidelines and a process by which we can grow contributors into committers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2501) Doxygen style for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2501: --- Sprint: Mesosphere Sprint 13 Doxygen style for libprocess Key: MESOS-2501 URL: https://issues.apache.org/jira/browse/MESOS-2501 Project: Mesos Issue Type: Documentation Components: libprocess Reporter: Bernd Mathiske Assignee: Joerg Schad Labels: mesosphere Original Estimate: 7m Remaining Estimate: 7m Create a description of the Doxygen style to use for libprocess documentation. It is expected that this will later also become the Doxygen style for stout and Mesos, but we are working on libprocess only for now. Possible outcome: a file named docs/doxygen-style.md We hope for much input and expect a lot of discussion! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2501) Doxygen style for libprocess
[ https://issues.apache.org/jira/browse/MESOS-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596312#comment-14596312 ] Marco Massenzio commented on MESOS-2501: The review for this was out 6 days ago and it has two Ship It's - can we please commit and resolve this story? Thanks! Doxygen style for libprocess Key: MESOS-2501 URL: https://issues.apache.org/jira/browse/MESOS-2501 Project: Mesos Issue Type: Documentation Components: libprocess Reporter: Bernd Mathiske Assignee: Joerg Schad Labels: mesosphere Original Estimate: 7m Remaining Estimate: 7m Create a description of the Doxygen style to use for libprocess documentation. It is expected that this will later also become the Doxygen style for stout and Mesos, but we are working on libprocess only for now. Possible outcome: a file named docs/doxygen-style.md We hope for much input and expect a lot of discussion! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2226: --- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12) HookTest.VerifySlaveLaunchExecutorHook is flaky --- Key: MESOS-2226 URL: https://issues.apache.org/jira/browse/MESOS-2226 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Kapil Arya Labels: flaky, flaky-test, mesosphere Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 recover.cpp:195] 
Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! 
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with
[jira] [Updated] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2199: --- Sprint: (was: Mesosphere Sprint 12) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2512) FetcherTest.ExtractNotExecutable is flaky
[ https://issues.apache.org/jira/browse/MESOS-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2512: --- Sprint: (was: Mesosphere Sprint 12) FetcherTest.ExtractNotExecutable is flaky - Key: MESOS-2512 URL: https://issues.apache.org/jira/browse/MESOS-2512 Project: Mesos Issue Type: Bug Affects Versions: 0.23.0 Reporter: Vinod Kone Assignee: Bernd Mathiske Labels: mesosphere Observed in our internal CI. {code} [ RUN ] FetcherTest.ExtractNotExecutable Using temporary directory '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' tar: Removing leading `/' from member names I0316 18:55:48.509306 14678 fetcher.cpp:155] Starting to fetch URIs for container: de1e5165-82b4-434b-9149-8667cf652c64, directory: /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn I0316 18:55:48.509845 14678 fetcher.cpp:238] Fetching URIs using command '/var/jenkins/workspace/mesos-fedora-20-gcc/src/mesos-fetcher' I0316 18:55:48.568611 15028 logging.cpp:177] Logging to STDERR I0316 18:55:48.574928 15028 fetcher.cpp:214] Fetching URI '/tmp/DIjmjV.tar.gz' I0316 18:55:48.575166 15028 fetcher.cpp:194] Copying resource from '/tmp/DIjmjV.tar.gz' to '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' tar: This does not look like a tar archive tar: Exiting with failure status due to previous errors Failed to extract /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz:Failed to extract: command tar -C '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' -xf '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz' exited with status: 512 tests/fetcher_tests.cpp:686: Failure (fetch).failure(): Failed to fetch URIs for container 'de1e5165-82b4-434b-9149-8667cf652c64'with exit status: 256 [ FAILED ] FetcherTest.ExtractNotExecutable (208 ms) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2205) Add user documentation for reservations
[ https://issues.apache.org/jira/browse/MESOS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2205: --- Sprint: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11 (was: Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 12) Add user documentation for reservations --- Key: MESOS-2205 URL: https://issues.apache.org/jira/browse/MESOS-2205 Project: Mesos Issue Type: Documentation Components: documentation, framework Reporter: Michael Park Assignee: Michael Park Priority: Critical Labels: mesosphere Add a user guide for reservations which describes basic usage of them, how ACLs are used to specify who can unreserve whose resources, and a few advanced use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2832) Enable configuring Mesos with environment variables without having them leak to tasks launched
[ https://issues.apache.org/jira/browse/MESOS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2832: --- Sprint: (was: Mesosphere Sprint 12) Enable configuring Mesos with environment variables without having them leak to tasks launched -- Key: MESOS-2832 URL: https://issues.apache.org/jira/browse/MESOS-2832 Project: Mesos Issue Type: Wish Reporter: Cody Maloney Assignee: Benjamin Hindman Priority: Critical Labels: mesosphere Currently if mesos is configured with environment variables (MESOS_MODULES), those show up in every task which is launched unless the executor explicitly cleans them up. If the task being launched happens to be something libprocess / mesos based, this can often prevent the task from starting up (A scheduler has issues loading a module intended for the slave). There are also cases where it would be nice to be able to change what the PATH is that tasks launch with (the host may have more in the path than tasks are supposed to / allowed to depend upon). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
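The leak described above can be avoided by filtering configuration variables out of the environment a task inherits. A hedged sketch of that idea, not the Mesos implementation, assuming Mesos configuration variables all carry the MESOS_ prefix:

```cpp
#include <cassert>
#include <map>
#include <string>

// Drop MESOS_-prefixed variables from the environment a launched task
// would inherit, so slave configuration (e.g. MESOS_MODULES) does not
// leak into tasks.
std::map<std::string, std::string> sanitizeTaskEnvironment(
    const std::map<std::string, std::string>& env) {
  std::map<std::string, std::string> result;
  for (const auto& entry : env) {
    if (entry.first.compare(0, 6, "MESOS_") != 0) {
      result.insert(entry);
    }
  }
  return result;
}
```

A real implementation would also want an allowlist so that variables the task legitimately needs (e.g. a curated PATH) can be set explicitly rather than inherited.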
[jira] [Updated] (MESOS-2784) Add constexpr to C++11 whitelist
[ https://issues.apache.org/jira/browse/MESOS-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2784: -- Sprint: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 Sprint 6 (was: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5) Add constexpr to C++11 whitelist Key: MESOS-2784 URL: https://issues.apache.org/jira/browse/MESOS-2784 Project: Mesos Issue Type: Improvement Components: documentation Reporter: Paul Brett Assignee: Paul Brett Labels: twitter constexpr is currently used to eliminate initialization dependency issues for non-POD objects. We should add it to the whitelist of acceptable C++11 features in the style guide. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
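The pattern the ticket refers to can be illustrated with a simplified hypothetical type (not an actual stout or Mesos class): a non-POD object with a constexpr constructor is constant-initialized at compile time, so it cannot participate in the static-initialization-order fiasco.

```cpp
#include <cassert>

// A non-POD class: it has a user-defined constructor. With constexpr,
// instances can nevertheless be initialized at compile time.
class Duration {
public:
  constexpr explicit Duration(long long ns) : ns_(ns) {}
  constexpr long long ns() const { return ns_; }

private:
  long long ns_;
};

// Constant-initialized before any dynamic initialization runs, so code in
// another translation unit's static initializers can never observe this
// object in an uninitialized state.
constexpr Duration MILLISECOND(1000000);
```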
[jira] [Updated] (MESOS-2794) Implement filesystem isolators
[ https://issues.apache.org/jira/browse/MESOS-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2794: -- Sprint: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 Sprint 6 (was: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5) Implement filesystem isolators -- Key: MESOS-2794 URL: https://issues.apache.org/jira/browse/MESOS-2794 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.1 Reporter: Ian Downes Assignee: Ian Downes Labels: twitter Move persistent volume support from Mesos containerizer to separate filesystem isolators, including support for container rootfs, where possible. Use symlinks for posix systems without container rootfs. Use bind mounts for Linux with/without container rootfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2798) Export statistics on unevictable memory
[ https://issues.apache.org/jira/browse/MESOS-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2798: -- Sprint: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 Sprint 6 (was: Twitter Q2 Sprint 3, Twitter Mesos Q2 Sprint 5) Export statistics on unevictable memory - Key: MESOS-2798 URL: https://issues.apache.org/jira/browse/MESOS-2798 Project: Mesos Issue Type: Improvement Reporter: Chi Zhang Assignee: Chi Zhang Labels: twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2853) Report per-container metrics from host egress filter
[ https://issues.apache.org/jira/browse/MESOS-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2853: -- Sprint: Twitter Mesos Q2 Sprint 5, Twitter Mesos Q2 Sprint 6 (was: Twitter Mesos Q2 Sprint 5) Report per-container metrics from host egress filter Key: MESOS-2853 URL: https://issues.apache.org/jira/browse/MESOS-2853 Project: Mesos Issue Type: Improvement Components: isolation Reporter: Paul Brett Assignee: Paul Brett Labels: twitter Export in statistics.json the fq_codel flow statistics for each container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2793) Add support for container rootfs to Mesos isolators
[ https://issues.apache.org/jira/browse/MESOS-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2793: -- Sprint: Twitter Mesos Q2 Sprint 6 Add support for container rootfs to Mesos isolators --- Key: MESOS-2793 URL: https://issues.apache.org/jira/browse/MESOS-2793 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.1 Reporter: Ian Downes Assignee: Ian Downes Labels: twitter Fix For: 0.23.0 Mesos containers can have a different rootfs to the host. Update Isolator interface to pass rootfs during Isolator::prepare(). Update Isolators where necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MESOS-2793) Add support for container rootfs to Mesos isolators
[ https://issues.apache.org/jira/browse/MESOS-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone resolved MESOS-2793. --- Resolution: Fixed Fix Version/s: 0.23.0 commit 610d4fffd0511d7ddce286ae987264cc5892f76c Author: Ian Downes idow...@twitter.com Date: Thu May 7 14:28:46 2015 -0700 Add container rootfs to Isolator::prepare(). Review: https://reviews.apache.org/r/34134 Add support for container rootfs to Mesos isolators --- Key: MESOS-2793 URL: https://issues.apache.org/jira/browse/MESOS-2793 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.1 Reporter: Ian Downes Assignee: Ian Downes Labels: twitter Fix For: 0.23.0 Mesos containers can have a different rootfs to the host. Update Isolator interface to pass rootfs during Isolator::prepare(). Update Isolators where necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-786) Update semantics of when framework registered()/reregistered() get called
[ https://issues.apache.org/jira/browse/MESOS-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-786: - Sprint: Twitter Mesos Q2 Sprint 6 Assignee: Vinod Kone Story Points: 3 Update semantics of when framework registered()/reregistered() get called - Key: MESOS-786 URL: https://issues.apache.org/jira/browse/MESOS-786 Project: Mesos Issue Type: Bug Reporter: Vinod Kone Assignee: Vinod Kone
Current semantics:
1) Framework connects w/ master very first time -> registered()
2) Framework reconnects w/ same master after a zk blip -> reregistered()
3) Framework reconnects w/ failed over master -> registered()
4) Failed over framework connects w/ same master -> registered()
5) Failed over framework connects w/ failed over master -> registered()
Updated semantics: Everything same except
3) Framework reconnects w/ failed over master -> reregistered()
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
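Under the updated semantics, the callback choice collapses to a single rule. A hedged sketch, with a hypothetical function name invented here:

```cpp
#include <cassert>
#include <cstring>

// Updated rule in one line: the only thing that matters is whether THIS
// driver instance has successfully registered before. A failed-over
// framework is a new driver instance, so it has not, and always gets
// registered(); any reconnection by an already-registered driver (same
// master or failed-over master) gets reregistered().
const char* callbackFor(bool thisDriverRegisteredBefore) {
  return thisDriverRegisteredBefore ? "reregistered" : "registered";
}
```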
[jira] [Commented] (MESOS-2907) Slave : Create Basic Functionality to handle /call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596402#comment-14596402 ] Vinod Kone commented on MESOS-2907: --- s/executor/client/ ? I'm assuming this is going to be useful for both scheduler <-> master and slave <-> executor? Or is this ticket only tracking the work to add it to the slave? Slave : Create Basic Functionality to handle /call endpoint --- Key: MESOS-2907 URL: https://issues.apache.org/jira/browse/MESOS-2907 Project: Mesos Issue Type: Task Reporter: Anand Mazumdar Assignee: Anand Mazumdar Labels: HTTP, mesosphere This is the first step in providing the basic /call functionality: processing a POST /call and returning: 202 if all goes well; 401 if not authorized; and 403 if the request is malformed. Also, we might need to store some identifier which enables us to reject calls to /call if the executor has not issued a SUBSCRIBE/RESUBSCRIBE Request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2226: --- Story Points: 3 (was: 5) HookTest.VerifySlaveLaunchExecutorHook is flaky --- Key: MESOS-2226 URL: https://issues.apache.org/jira/browse/MESOS-2226 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Kapil Arya Labels: flaky, flaky-test, mesosphere Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 recover.cpp:195] Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a 
replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! 
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0114 18:51:34.734076 4734 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 87441ns I0114 18:51:34.734441 4734 replica.cpp:679] Persisted action at 0 I0114 18:51:34.740272 4739 replica.cpp:511] Replica received write request for position 0 I0114 18:51:34.740910 4739 leveldb.cpp:438] Reading position from leveldb took 59846ns I0114 18:51:34.741672 4739 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 189259ns I0114 18:51:34.741919 4739 replica.cpp:679] Persisted action at 0 I0114 18:51:34.743000 4739 replica.cpp:658] Replica received
[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2226: --- Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11, Mesosphere Sprint 13 (was: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Q1 Sprint 9 - 5/15, Mesosphere Sprint 10, Mesosphere Sprint 11) HookTest.VerifySlaveLaunchExecutorHook is flaky --- Key: MESOS-2226 URL: https://issues.apache.org/jira/browse/MESOS-2226 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Kapil Arya Labels: flaky, flaky-test, mesosphere Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 recover.cpp:195] 
Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! 
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with
[jira] [Created] (MESOS-2914) Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.
Jie Yu created MESOS-2914: - Summary: Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery. Key: MESOS-2914 URL: https://issues.apache.org/jira/browse/MESOS-2914 Project: Mesos Issue Type: Bug Reporter: Jie Yu Otherwise, the icmp/arp filter on host eth0 might be removed as a result of _cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both known/unknown orphan containers. {noformat} I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 - 31607 I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d - 41020 I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 - 3302 I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 - 19805 I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters with ports [33792,34815] for container with pid 52304 I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports [33792,34816) for container with pid 52304 I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed cleanup for pid 52304 I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery complete I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.554932 16329 containerizer.cpp:449] Removing orphan container 
8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:46:51.555032 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.555629 16308 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 1.730304ms I0612 17:46:51.557507 16329 containerizer.cpp:449] Removing orphan container 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.557611 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:46:51.557896 16313 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 1.685248ms I0612 17:46:51.559412 16310 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.561564 16329 containerizer.cpp:449] Removing orphan container d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.562489 16315 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.562988 16313 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.563303 16310 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 2.076928ms I0612 17:46:51.566052 16308 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.566102 16313 slave.cpp:3911] Finished recovery W0612 17:46:51.566432 16323 disk.cpp:299] Ignoring cleanup for unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.566651 16317 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 2.12096ms I0612 17:46:51.566987 16313 slave.cpp:3944] Garbage collecting old slave 20150319-213133-2080910346-5050-57551-S3314 I0612 17:46:51.56 16318 cgroups.cpp:1420] 
Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 1.323008ms W0612 17:46:51.568042 16323 port_mapping.cpp:2544] Ignoring cleanup for unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.569522 16311 gc.cpp:56] Scheduling '/var/lib/mesos/slaves/20150319-213133-2080910346-5050-57551-S3314' for gc 6.9341503407days in the future W0612 17:46:51.569725 16329 disk.cpp:299] Ignoring cleanup for unknown container ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.570911 16325 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.573581 16316 port_mapping.cpp:2597] Removing IP packet filters with ports [35840,36863] for container with pid 31607 I0612 17:46:51.575127 16316 port_mapping.cpp:2616] Freed ephemeral ports [35840,36864) for
[jira] [Commented] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596670#comment-14596670 ] Jie Yu commented on MESOS-2903: --- In fact, my above comment is not true since known orphans will be stored in 'infos' and unknown orphans will be cleaned up in slave recovery. See MESOS-2914 for the real cause. Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an informational message since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2912) Provide a Python library for master detection
[ https://issues.apache.org/jira/browse/MESOS-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2912: -- Description: When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Python library to make this easy for frameworks. was: When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Java library to make this easy for frameworks. Provide a Python library for master detection - Key: MESOS-2912 URL: https://issues.apache.org/jira/browse/MESOS-2912 Project: Mesos Issue Type: Task Reporter: Vinod Kone When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Python library to make this easy for frameworks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2906) Slave : Synchronous Validation for Calls
[ https://issues.apache.org/jira/browse/MESOS-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2906: --- Sprint: (was: Mesosphere Sprint 13) Slave : Synchronous Validation for Calls Key: MESOS-2906 URL: https://issues.apache.org/jira/browse/MESOS-2906 Project: Mesos Issue Type: Task Reporter: Anand Mazumdar Assignee: Anand Mazumdar Labels: HTTP, mesosphere /call endpoint on the slave will return a 202 accepted code but has to do some basic validations before. In case of invalidation it will return a 4xx code. - We need to create the required infrastructure to validate the request and then process it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2295) Implement the Call endpoint on Slave
[ https://issues.apache.org/jira/browse/MESOS-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2295: --- Sprint: (was: Mesosphere Sprint 13) Implement the Call endpoint on Slave Key: MESOS-2295 URL: https://issues.apache.org/jira/browse/MESOS-2295 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2912) Provide a Python library for master detection
Vinod Kone created MESOS-2912: - Summary: Provide a Python library for master detection Key: MESOS-2912 URL: https://issues.apache.org/jira/browse/MESOS-2912 Project: Mesos Issue Type: Task Reporter: Vinod Kone When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Java library to make this easy for frameworks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2756) Update style guide: Avoid object slicing
[ https://issues.apache.org/jira/browse/MESOS-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van Remoortere updated MESOS-2756: Sprint: (was: Mesosphere Sprint 13) Update style guide: Avoid object slicing Key: MESOS-2756 URL: https://issues.apache.org/jira/browse/MESOS-2756 Project: Mesos Issue Type: Improvement Reporter: Joris Van Remoortere Assignee: Joris Van Remoortere Labels: c++ In order to improve the safety of our code base, let's augment the style guide to: Disallow public construction of base classes so that we can avoid the object slicing problem. This is a good pattern to follow in general as it prevents subtle semantic bugs like the following:
{code:title=ObjectSlicing.cpp|borderStyle=solid}
#include <stdio.h>
#include <vector>

class Base {
public:
  Base(int _v) : v(_v) {}
  virtual int get() const { return v; }

protected:
  int v;
};

class Derived : public Base {
public:
  Derived(int _v) : Base(_v) {}
  virtual int get() const { return v + 1; }
};

int main() {
  Base b(5);
  Derived d(5);

  std::vector<Base> vec;
  vec.push_back(b);
  vec.push_back(d);

  for (const auto v : vec) {
    printf("[%d]\n", v.get());
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-1575) master sets failover timeout to 0 when framework requests a high value
[ https://issues.apache.org/jira/browse/MESOS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio reassigned MESOS-1575: -- Assignee: (was: Timothy Chen) master sets failover timeout to 0 when framework requests a high value -- Key: MESOS-1575 URL: https://issues.apache.org/jira/browse/MESOS-1575 Project: Mesos Issue Type: Bug Reporter: Kevin Sweeney Labels: newbie, twitter In response to a registered RPC we observed the following behavior:
{noformat}
W0709 19:07:32.982997 11400 master.cpp:612] Using the default value for 'failover_timeout' because the input value is invalid: Argument out of the range that a Duration can represent due to int64_t's size limit
I0709 19:07:32.983008 11404 hierarchical_allocator_process.hpp:408] Deactivated framework 20140709-184342-119646400-5050-11380-0003
I0709 19:07:32.983013 11400 master.cpp:617] Giving framework 20140709-184342-119646400-5050-11380-0003 0ns to failover
I0709 19:07:32.983271 11404 master.cpp:2201] Framework failover timeout, removing framework 20140709-184342-119646400-5050-11380-0003
I0709 19:07:32.983294 11404 master.cpp:2688] Removing framework 20140709-184342-119646400-5050-11380-0003
I0709 19:07:32.983678 11404 hierarchical_allocator_process.hpp:363] Removed framework 20140709-184342-119646400-5050-11380-0003
{noformat}
This was using the following frameworkInfo.
{code}
FrameworkInfo frameworkInfo = FrameworkInfo.newBuilder()
    .setUser("test")
    .setName("jvm")
    .setFailoverTimeout(Long.MAX_VALUE)
    .build();
{code}
Instead of silently defaulting large values to 0 the master should refuse to process the request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2619) Document master-scheduler communication
[ https://issues.apache.org/jira/browse/MESOS-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596693#comment-14596693 ] Connor Doyle commented on MESOS-2619: - [~vinodkone] you're right; this refers to the existing driver, not the HTTP API. For one, there's no reference to Libprocess on the documentation landing page (http://mesos.apache.org/documentation/latest/). Adding something there would be helpful, if only to seed stuck users' search terms. An overview page could describe the asynchronous-protobuf-over-HTTP design. Perhaps such a page could also be linked from an FAQ/troubleshooting topic about framework connection problems. It's great that the HTTP API will simplify some of these use cases. The existing driver will probably still be around long enough that it's worth adding docs about it. [~marco-mesos] correct, I tagged it but am not currently working the issue. We were using the tag to simply indicate interest back then. Document master-scheduler communication --- Key: MESOS-2619 URL: https://issues.apache.org/jira/browse/MESOS-2619 Project: Mesos Issue Type: Bug Components: documentation Affects Versions: 0.22.0 Reporter: Connor Doyle Labels: mesosphere New users often stumble on the networking requirements for communication between schedulers and the Mesos master. It's not explicitly stated anywhere that the master has to talk back to the scheduler. Also, some configuration options (like the LIBPROCESS_PORT environment variable) are under-documented. This problem is exacerbated as many new users start playing with Mesos and schedulers in unpredictable networking contexts (NAT, containers with bridged networking, etc.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2911) Add an Event message handler to scheduler library
[ https://issues.apache.org/jira/browse/MESOS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2911: -- Assignee: Benjamin Mahler (was: Vinod Kone) Add an Event message handler to scheduler library - Key: MESOS-2911 URL: https://issues.apache.org/jira/browse/MESOS-2911 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Benjamin Mahler Adding this handler lets master send Event messages to the library. See MESOS-2909 for additional context. This ticket only tracks the installation of the handler and maybe handling of a single event for testing. Additional events handling will be captured in a different ticket(s). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2551) C++ Scheduler library should send Call messages to Master
[ https://issues.apache.org/jira/browse/MESOS-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2551: --- Sprint: Mesosphere Sprint 13 C++ Scheduler library should send Call messages to Master - Key: MESOS-2551 URL: https://issues.apache.org/jira/browse/MESOS-2551 Project: Mesos Issue Type: Story Reporter: Vinod Kone Assignee: Isabel Jimenez Labels: mesosphere Currently, the C++ library sends different messages to Master instead of a single Call message. To vet the new Call API it should send Call messages. Master should be updated to handle all types of Calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2298) Provide a Java library for master detection
[ https://issues.apache.org/jira/browse/MESOS-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kone updated MESOS-2298: -- Description: When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Java library to make this easy for frameworks. was:When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Ideally, Mesos provides master detection library/libraries in supported languages (java and python to start with) to make this easy for frameworks. Story Points: 5 Summary: Provide a Java library for master detection (was: Provide master detection library/libraries for pure schedulers) Provide a Java library for master detection --- Key: MESOS-2298 URL: https://issues.apache.org/jira/browse/MESOS-2298 Project: Mesos Issue Type: Task Reporter: Vinod Kone When schedulers start interacting with Mesos master via HTTP endpoints, they need a way to detect masters. Mesos should provide a master detection Java library to make this easy for frameworks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-2296: -- Summary: Implement the Events stream on slave for Call endpoint (was: Implement the Events endpoint on slave) Implement the Events stream on slave for Call endpoint -- Key: MESOS-2296 URL: https://issues.apache.org/jira/browse/MESOS-2296 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2913) Scheduler library should send Call messages to the master
Vinod Kone created MESOS-2913: - Summary: Scheduler library should send Call messages to the master Key: MESOS-2913 URL: https://issues.apache.org/jira/browse/MESOS-2913 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Vinod Kone To vet the new Call protobufs, it is prudent to have the scheduler driver (sched.cpp) send Call messages to the master (similar to what we are doing with the scheduler library). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2337) __init__.py not getting installed in $PREFIX/lib/pythonX.Y/site-packages/mesos
[ https://issues.apache.org/jira/browse/MESOS-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2337: --- Sprint: Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1, Mesosphere Sprint 13 (was: Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3, Mesosphere Q1 Sprint 7 - 4/17, Mesosphere Q2 Sprint 8 - 5/1) __init__.py not getting installed in $PREFIX/lib/pythonX.Y/site-packages/mesos -- Key: MESOS-2337 URL: https://issues.apache.org/jira/browse/MESOS-2337 Project: Mesos Issue Type: Bug Components: build, python api Reporter: Kapil Arya Assignee: Marco Massenzio Priority: Critical Labels: mesosphere When doing a {{make install}}, the src/python/native/src/mesos/__init__.py file is not getting installed in {{$PREFIX/lib/pythonX.Y/site-packages/mesos/}}. This makes it impossible to do the following import when {{PYTHONPATH}} is set to the {{site-packages}} directory.
{code}
import mesos.interface.mesos_pb2
{code}
The directories {{$PREFIX/lib/pythonX.Y/site-packages/mesos/interface, native}} do have their corresponding {{__init__.py}} files. Reproducing the bug:
{code}
../configure --prefix=$HOME/test-install
make install
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2914) Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.
[ https://issues.apache.org/jira/browse/MESOS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu reassigned MESOS-2914: - Assignee: Jie Yu Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery. --- Key: MESOS-2914 URL: https://issues.apache.org/jira/browse/MESOS-2914 Project: Mesos Issue Type: Bug Reporter: Jie Yu Assignee: Jie Yu Otherwise, the icmp/arp filter on host eth0 might be removed as a result of _cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both known/unknown orphan containers. {noformat} I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 - 31607 I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d - 41020 I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 - 3302 I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 - 19805 I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters with ports [33792,34815] for container with pid 52304 I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports [33792,34816) for container with pid 52304 I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed cleanup for pid 52304 I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery complete I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup 
[jira] [Resolved] (MESOS-2903) Network isolator should not fail when target state already exists
[ https://issues.apache.org/jira/browse/MESOS-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu resolved MESOS-2903. --- Resolution: Invalid Network isolator should not fail when target state already exists - Key: MESOS-2903 URL: https://issues.apache.org/jira/browse/MESOS-2903 Project: Mesos Issue Type: Bug Components: isolation Affects Versions: 0.23.0 Reporter: Paul Brett Assignee: Paul Brett Priority: Critical Network isolator has multiple instances of the following pattern:
{noformat}
Try<bool> something = ::create();
if (something.isError()) {
  ++metrics.something_errors;
  return Failure("Failed to create something ...");
} else if (!icmpVethToEth0.get()) {
  ++metrics.adding_veth_icmp_filters_already_exist;
  return Failure("Something already exists");
}
{noformat}
These failures have occurred in operation due to the failure to recover or delete an orphan, causing the slave to remain online but unable to create new resources. We should convert the second failure message in this pattern to an informational message, since the final state of the system is the state that we requested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
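The change proposed above, downgrading the "already exists" branch from a hard failure to an informational message, can be sketched as follows. Everything here is an illustrative stand-in (the `CreateResult` struct and `handleCreate` are hypothetical), not the isolator's actual code:

```cpp
#include <iostream>
#include <string>

// Illustrative stand-in for stout's Try<bool> (not the real type).
struct CreateResult {
  bool error;    // the ::create() call itself failed.
  bool created;  // false means the filter already existed.
};

// After the proposed change: a genuine error still fails, but
// "already exists" is logged and treated as success, since the final
// state of the system is the state that was requested.
std::string handleCreate(const CreateResult& result) {
  if (result.error) {
    return "Failure";
  } else if (!result.created) {
    std::cout << "Target state already exists; continuing" << std::endl;
  }
  return "Continue";
}
```

The key design point is that idempotent setup steps should compare the final state against the requested state rather than failing on a redundant request.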
[jira] [Commented] (MESOS-2904) Add slave metric to count container launch failures
[ https://issues.apache.org/jira/browse/MESOS-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596747#comment-14596747 ] Paul Brett commented on MESOS-2904: --- A fix without a test harness is out for review: https://reviews.apache.org/r/35738/ Add slave metric to count container launch failures --- Key: MESOS-2904 URL: https://issues.apache.org/jira/browse/MESOS-2904 Project: Mesos Issue Type: Bug Components: slave, statistics Reporter: Paul Brett Assignee: Paul Brett We have seen circumstances where a machine has been consistently unable to launch containers due to an inconsistent state (for example, unexpected network configuration). Adding a metric to track container launch failures will allow us to detect and alert on slaves in such a state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
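The metric described above can be sketched with a plain counter. The real slave publishes counters via libprocess (`process::metrics::Counter`); the struct, member name, and wrapper below are hypothetical stand-ins:

```cpp
#include <atomic>

// Hypothetical stand-in for the slave's metrics struct; Mesos's real
// counters are process::metrics::Counter instances exposed over HTTP.
struct SlaveMetrics {
  std::atomic<long> container_launch_errors{0};
};

// Bump the counter whenever a container launch fails, so monitoring
// can alert on slaves that are consistently unable to launch.
bool recordLaunch(SlaveMetrics& metrics, bool launchSucceeded) {
  if (!launchSucceeded) {
    ++metrics.container_launch_errors;
  }
  return launchSucceeded;
}
```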
[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596851#comment-14596851 ] Vinod Kone commented on MESOS-1807: --- There seems to be some confusion, so let me clarify. Mesos currently allows executors that use either 0 cpus or 0 memory. Since 0.21.0, it emits a warning, but continues to allow them. Note that this ticket is not resolved yet. The goal of this ticket is to disallow executors with 0 cpus or 0 memory. Any issues that marathon or chronos are seeing are the long-standing behavior of Mesos, and precisely why this ticket was created. This deprecation was announced in the CHANGELOG for 0.21.0. Not sure if there was a specific email sent to the dev list though. Disallow executors with cpu only or memory only resources - Key: MESOS-1807 URL: https://issues.apache.org/jira/browse/MESOS-1807 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Labels: newbie Currently the master allows executors to be launched with either only cpus or only memory, but we shouldn't allow that. This is because an executor is an actual unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set then it might starve other executors/tasks on the slave, violating isolation guarantees. Same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
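The proposed master-side validation can be sketched like this. The function name and the flat resource map are illustrative assumptions, not the master's actual validation code (which operates on `Resources` protobufs):

```cpp
#include <map>
#include <string>

// Reject executors that specify only cpus or only mem: an executor is
// a real unix process, so leaving either limit undefined would let it
// starve other executors/tasks and violate isolation guarantees.
bool validExecutorResources(const std::map<std::string, double>& resources)
{
  const auto cpus = resources.find("cpus");
  const auto mem = resources.find("mem");
  return cpus != resources.end() && cpus->second > 0.0 &&
         mem != resources.end() && mem->second > 0.0;
}
```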
[jira] [Commented] (MESOS-2862) mesos-fetcher won't fetch uris which begin with a space
[ https://issues.apache.org/jira/browse/MESOS-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596970#comment-14596970 ] Artem Harutyunyan commented on MESOS-2862: -- https://reviews.apache.org/r/35757/ https://reviews.apache.org/r/35755/ mesos-fetcher won't fetch uris which begin with a space - Key: MESOS-2862 URL: https://issues.apache.org/jira/browse/MESOS-2862 Project: Mesos Issue Type: Bug Components: fetcher Affects Versions: 0.22.1 Reporter: Cody Maloney Assignee: Artem Harutyunyan Priority: Minor Labels: mesosphere, newbie Discovered while running mesos with marathon on top. If I launch a marathon task with a URI which is ' http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz', mesos will log to stderr: {code} I0611 22:39:22.815636 35673 logging.cpp:177] Logging to STDERR I0611 22:39:25.643889 35673 fetcher.cpp:214] Fetching URI ' http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz' I0611 22:39:25.648111 35673 fetcher.cpp:94] Hadoop Client not available, skipping fetch with Hadoop Client Failed to fetch: http://apache.osuosl.org/mesos/0.22.1/mesos-0.22.1.tar.gz Failed to synchronize with slave (it's probably exited) {code} It would be nice if mesos trimmed leading whitespace before doing protocol detection so that simple mistakes are just fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
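Trimming before protocol detection, as requested above, can be sketched like this. The helper names are hypothetical; the real change would live in the fetcher's URI handling (src/launcher/fetcher.cpp):

```cpp
#include <string>

// Strip leading/trailing whitespace so a URI pasted with a stray
// leading space still matches the scheme check below.
std::string trim(const std::string& s) {
  const std::string whitespace = " \t\n\r";
  const std::string::size_type begin = s.find_first_not_of(whitespace);
  if (begin == std::string::npos) {
    return "";
  }
  const std::string::size_type end = s.find_last_not_of(whitespace);
  return s.substr(begin, end - begin + 1);
}

// Protocol detection after trimming: " http://..." is now recognized.
bool isHttpUri(const std::string& uri) {
  const std::string trimmed = trim(uri);
  return trimmed.rfind("http://", 0) == 0 ||
         trimmed.rfind("https://", 0) == 0;
}
```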
[jira] [Updated] (MESOS-2473) Failure to recover because of freezer timeout should not suggest removing meta data
[ https://issues.apache.org/jira/browse/MESOS-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2473: --- Target Version/s: 0.24.0 (was: 0.23.0) Failure to recover because of freezer timeout should not suggest removing meta data --- Key: MESOS-2473 URL: https://issues.apache.org/jira/browse/MESOS-2473 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.0 Reporter: Ian Downes Labels: twitter A more appropriate action should be suggested, e.g., manually kill the processes in cgroup xxx because the slave will still attempt to clean up orphans and hit the same code path. {noformat} I0310 23:04:23.961019 32342 slave.cpp:3321] Current usage 35.87%. Max allowed age: 3.789365411204225days Failed to perform recovery: Collect failed: Timed out after 1mins To remedy this do as follows: Step 1: rm -f /var/lib/mesos/meta/slaves/latest This ensures slave doesn't recover old live executors. Step 2: Restart the slave. Slave Exit Status: 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2884) Allow isolators to specify required namespaces
[ https://issues.apache.org/jira/browse/MESOS-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2884: -- Target Version/s: 0.23.0 Allow isolators to specify required namespaces -- Key: MESOS-2884 URL: https://issues.apache.org/jira/browse/MESOS-2884 Project: Mesos Issue Type: Task Components: isolation Reporter: Kapil Arya Assignee: Kapil Arya Labels: mesosphere Currently, the LinuxLauncher looks into SlaveFlags to compute the namespaces that should be enabled when launching the executor. This means that a custom Isolator module doesn't have any way to specify dependency on a set of namespaces. The proposed solution is to extend the Isolator interface to also export the namespaces dependency. This way the MesosContainerizer can directly query all loaded Isolators (inbuilt and custom modules) to compute the set of namespaces required by the executor. This set of namespaces is then passed on to the LinuxLauncher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya updated MESOS-2226: -- Assignee: Niklas Quarfot Nielsen (was: Kapil Arya) HookTest.VerifySlaveLaunchExecutorHook is flaky --- Key: MESOS-2226 URL: https://issues.apache.org/jira/browse/MESOS-2226 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Niklas Quarfot Nielsen Labels: flaky, flaky-test, mesosphere Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 recover.cpp:195] Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received 
a recover response from a replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! 
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with proposal 2 I0114 18:51:34.734076 4734 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 87441ns I0114 18:51:34.734441 4734 replica.cpp:679] Persisted action at 0 I0114 18:51:34.740272 4739 replica.cpp:511] Replica received write request for position 0 I0114 18:51:34.740910 4739 leveldb.cpp:438] Reading position from leveldb took 59846ns I0114 18:51:34.741672 4739 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 189259ns I0114 18:51:34.741919 4739 replica.cpp:679] Persisted action at 0 I0114 18:51:34.743000 4739
[jira] [Comment Edited] (MESOS-2618) Update C++ style guide on function definition / invocation formatting.
[ https://issues.apache.org/jira/browse/MESOS-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596770#comment-14596770 ] Michael Park edited comment on MESOS-2618 at 6/22/15 10:28 PM: --- I recently did a little formatting cleanup and received some feedback that relates to this ticket. In [r35635|https://reviews.apache.org/r/35635], it was noted that we prefer:
{code}
(1)
delay(flags.executor_shutdown_grace_period,
      self(),
      &Slave::shutdownExecutorTimeout,
      framework->id(),
      executor->id,
      executor->containerId);
{code}
over
{code}
(2)
delay(
    flags.executor_shutdown_grace_period,
    self(),
    &Slave::shutdownExecutorTimeout,
    framework->id(),
    executor->id,
    executor->containerId);
{code}
We also prefer:
{code}
(3)
containerizer->wait(containerId)
  .onAny(defer(self(),
               &Self::executorTerminated,
               frameworkId,
               executorId,
               lambda::_1));
{code}
over
{code}
(4)
containerizer->wait(containerId)
  .onAny(defer(
      self(),
      &Self::executorTerminated,
      frameworkId,
      executorId,
      lambda::_1));
{code}
Both of the preferred styles above are what {{clang-format}} produces. I think this goes to show that what {{clang-format}} produces is good in most cases. (Of course there are shortcomings, since it's a developing project, but I think the spectacularly bad ones are the ones we should be talking about, rather than cases like this where either style is just as readable.) For example, reasoning through and outlining the exact rules as to why we prefer (3) over (4) is non-trivial and not all that helpful. {{clang-format}} uses penalties for various undesired formatting styles and chooses a style which minimizes the total penalty (as per LaTeX). I think relying on that system would be more systematic and beneficial in terms of time-saving for all of us. was (Author: mcypark): I recently did a little formatting cleanup and received some feedback that relates to this ticket. In [r35635|https://reviews.apache.org/r/35635], it was noted that we prefer:
{code}
(1)
delay(flags.executor_shutdown_grace_period,
      self(),
      &Slave::shutdownExecutorTimeout,
      framework->id(),
      executor->id,
      executor->containerId);
{code}
over
{code}
(2)
delay(
    flags.executor_shutdown_grace_period,
    self(),
    &Slave::shutdownExecutorTimeout,
    framework->id(),
    executor->id,
    executor->containerId);
{code}
We also prefer:
{code}
(3)
containerizer->wait(containerId)
  .onAny(defer(self(),
               &Self::executorTerminated,
               frameworkId,
               executorId,
               lambda::_1));
{code}
over
{code}
(4)
containerizer->wait(containerId)
  .onAny(defer(
      self(),
      &Self::executorTerminated,
      frameworkId,
      executorId,
      lambda::_1));
{code}
Both of the preferred styles above are what {{clang-format}} produces. I think this goes to show that what {{clang-format}} produces is good in most cases. (Of course there are shortcomings, since it's a developing project, but I think the spectacularly bad ones are the ones we should be talking about, rather than cases like this where either way it's just as readable.) For example, reasoning through and outlining the exact rules as to why we prefer (3) over (4) is non-trivial and not all that helpful. {{clang-format}} uses penalties for various undesired formatting styles and chooses a style which minimizes the total penalty (as per LaTeX). I think relying on that system would be more systematic and beneficial in terms of time-saving for all of us. Update C++ style guide on function definition / invocation formatting. --- Key: MESOS-2618 URL: https://issues.apache.org/jira/browse/MESOS-2618 Project: Mesos Issue Type: Documentation Reporter: Till Toenshoff Priority: Minor Our style guide currently suggests two options for cases of function definitions / invocations that do not fit into a single line even when breaking after the opening argument bracket: Fixed leading indention (4 spaces):
{noformat}
// 4: OK.
allocator->resourcesRecovered(
    frameworkId,
    slaveId,
    resources,
    filters);
{noformat}
Variable leading indention:
{noformat}
// 3: In this case, 3 is OK.
foobar(someArgument,
       someOtherArgument,
       theLastArgument);
{noformat}
There is a counter-case mentioned for the latter:
{noformat}
// 3: Don't use in this case due to jaggedness.
allocator->resourcesRecovered(frameworkId, slaveId,
                              resources, filters);
{noformat}
The problem here seems to be that the counter-case might not be well defined on when it applies. We might want to consider: A. removing the variable leading option entirely, or B. defining the exact limits on when jaggedness applies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2618) Update C++ style guide on function definition / invocation formatting.
[ https://issues.apache.org/jira/browse/MESOS-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596770#comment-14596770 ] Michael Park commented on MESOS-2618: - I recently did a little formatting cleanup and received some feedback that relates to this ticket. In [r35635|https://reviews.apache.org/r/35635], it was noted that we prefer:
{code}
(1)
delay(flags.executor_shutdown_grace_period,
      self(),
      &Slave::shutdownExecutorTimeout,
      framework->id(),
      executor->id,
      executor->containerId);
{code}
over
{code}
(2)
delay(
    flags.executor_shutdown_grace_period,
    self(),
    &Slave::shutdownExecutorTimeout,
    framework->id(),
    executor->id,
    executor->containerId);
{code}
We also prefer:
{code}
(3)
containerizer->wait(containerId)
  .onAny(defer(self(),
               &Self::executorTerminated,
               frameworkId,
               executorId,
               lambda::_1));
{code}
over
{code}
(4)
containerizer->wait(containerId)
  .onAny(defer(
      self(),
      &Self::executorTerminated,
      frameworkId,
      executorId,
      lambda::_1));
{code}
Both of the preferred styles above are what {{clang-format}} produces. I think this goes to show that what {{clang-format}} produces is good in most cases. (Of course there are shortcomings, since it's a developing project, but I think the spectacularly bad ones are the ones we should be talking about, rather than cases like this where either way it's just as readable.) For example, reasoning through and outlining the exact rules as to why we prefer (3) over (4) is non-trivial and not all that helpful. {{clang-format}} uses penalties for various undesired formatting styles and chooses a style which minimizes the total penalty (as per LaTeX). I think relying on that system would be more systematic and beneficial in terms of time-saving for all of us. Update C++ style guide on function definition / invocation formatting. --- Key: MESOS-2618 URL: https://issues.apache.org/jira/browse/MESOS-2618 Project: Mesos Issue Type: Documentation Reporter: Till Toenshoff Priority: Minor Our style guide currently suggests two options for cases of function definitions / invocations that do not fit into a single line even when breaking after the opening argument bracket: Fixed leading indention (4 spaces):
{noformat}
// 4: OK.
allocator->resourcesRecovered(
    frameworkId,
    slaveId,
    resources,
    filters);
{noformat}
Variable leading indention:
{noformat}
// 3: In this case, 3 is OK.
foobar(someArgument,
       someOtherArgument,
       theLastArgument);
{noformat}
There is a counter-case mentioned for the latter:
{noformat}
// 3: Don't use in this case due to jaggedness.
allocator->resourcesRecovered(frameworkId, slaveId,
                              resources, filters);
{noformat}
The problem here seems to be that the counter-case might not be well defined on when it applies. We might want to consider: A. removing the variable leading option entirely, or B. defining the exact limits on when jaggedness applies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
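The penalty mechanism described in the comments above is configurable. A sketch of a {{.clang-format}} fragment follows; the option names are real clang-format options, but the values are illustrative assumptions, not Mesos's actual configuration:

```yaml
# Illustrative .clang-format fragment (values are NOT Mesos's settings).
BasedOnStyle: Google
ColumnLimit: 80
IndentWidth: 2
ContinuationIndentWidth: 4
# Relative penalties steer the formatter toward styles (1)/(3) above:
# breaking immediately after the open paren costs more than aligning
# arguments, and exceeding the column limit costs most of all.
PenaltyBreakBeforeFirstCallParameter: 100
PenaltyExcessCharacter: 1000000
```

The formatter picks whichever layout minimizes the total penalty, which is why hand-writing equivalent prose rules for every case is hard.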
[jira] [Assigned] (MESOS-2294) Implement the Events stream on master for Call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar reassigned MESOS-2294: - Assignee: Anand Mazumdar Implement the Events stream on master for Call endpoint --- Key: MESOS-2294 URL: https://issues.apache.org/jira/browse/MESOS-2294 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: twitter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1856) Support specifying libnl3 install location.
[ https://issues.apache.org/jira/browse/MESOS-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596913#comment-14596913 ] Marco Massenzio commented on MESOS-1856: Hey [~rji], it looks like we're running out of time to fix this for {{0.23}}: would you mind terribly writing up the workaround as you suggested, so we can make it part of the release notes / developer docs? Thanks! Support specifying libnl3 install location. --- Key: MESOS-1856 URL: https://issues.apache.org/jira/browse/MESOS-1856 Project: Mesos Issue Type: Task Affects Versions: 0.22.0, 0.22.1 Reporter: Jie Yu LIBNL_CFLAGS uses a hard-coded path in the configure script, instead of detecting the location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2915) Expose State API via new HTTP API
Tomás Senart created MESOS-2915: --- Summary: Expose State API via new HTTP API Key: MESOS-2915 URL: https://issues.apache.org/jira/browse/MESOS-2915 Project: Mesos Issue Type: Story Reporter: Tomás Senart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2914) Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery.
[ https://issues.apache.org/jira/browse/MESOS-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596823#comment-14596823 ] Jie Yu commented on MESOS-2914: --- https://reviews.apache.org/r/35749/ https://reviews.apache.org/r/35750/ Port mapping isolator should cleanup unknown orphan containers after all known orphan containers are recovered during recovery. --- Key: MESOS-2914 URL: https://issues.apache.org/jira/browse/MESOS-2914 Project: Mesos Issue Type: Bug Reporter: Jie Yu Assignee: Jie Yu Otherwise, the icmp/arp filter on host eth0 might be removed as a result of _cleanup if 'infos' is empty, causing subsequent '_cleanup' to fail on both known/unknown orphan containers. {noformat} I0612 17:46:51.518501 16308 containerizer.cpp:314] Recovering containerizer I0612 17:46:51.520612 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink ddcb8397-3552-44f9-bc99-b5b69aa72944 - 31607 I0612 17:46:51.521183 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink d8c48a4a-fdfb-47dd-b8d8-07188c21600d - 41020 I0612 17:46:51.521883 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 8953fc7f-9fca-4931-b0cb-2f4959ddee74 - 3302 I0612 17:46:51.522542 16308 port_mapping.cpp:1567] Discovered network namespace handle symlink 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 - 19805 I0612 17:46:51.523643 16308 port_mapping.cpp:2597] Removing IP packet filters with ports [33792,34815] for container with pid 52304 I0612 17:46:51.525063 16308 port_mapping.cpp:2616] Freed ephemeral ports [33792,34816) for container with pid 52304 I0612 17:46:51.547696 16308 port_mapping.cpp:2762] Successfully performed cleanup for pid 52304 I0612 17:46:51.550027 16308 port_mapping.cpp:1698] Network isolator recovery complete I0612 17:46:51.550946 16329 containerizer.cpp:449] Removing orphan container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.552686 16329 containerizer.cpp:449] Removing orphan container 
ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.552734 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.554932 16329 containerizer.cpp:449] Removing orphan container 8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:46:51.555032 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.555629 16308 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 1.730304ms I0612 17:46:51.557507 16329 containerizer.cpp:449] Removing orphan container 50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.557611 16309 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8953fc7f-9fca-4931-b0cb-2f4959ddee74 I0612 17:46:51.557896 16313 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 1.685248ms I0612 17:46:51.559412 16310 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.561564 16329 containerizer.cpp:449] Removing orphan container d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.562489 16315 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/50f9986f-ebbc-440d-86a7-9fa1a7c55a75 I0612 17:46:51.562988 16313 cgroups.cpp:2394] Thawing cgroup /sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 I0612 17:46:51.563303 16310 cgroups.cpp:1449] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/111ea69c-6184-4da1-a0e9-c34e8c6deb30 after 2.076928ms I0612 17:46:51.566052 16308 cgroups.cpp:2377] Freezing cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d I0612 17:46:51.566102 16313 slave.cpp:3911] Finished recovery W0612 17:46:51.566432 16323 disk.cpp:299] Ignoring cleanup for unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.566651 16317 cgroups.cpp:1449] Successfullly thawed cgroup 
/sys/fs/cgroup/freezer/mesos/ddcb8397-3552-44f9-bc99-b5b69aa72944 after 2.12096ms I0612 17:46:51.566987 16313 slave.cpp:3944] Garbage collecting old slave 20150319-213133-2080910346-5050-57551-S3314 I0612 17:46:51.56 16318 cgroups.cpp:1420] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/d8c48a4a-fdfb-47dd-b8d8-07188c21600d after 1.323008ms W0612 17:46:51.568042 16323 port_mapping.cpp:2544] Ignoring cleanup for unknown container 111ea69c-6184-4da1-a0e9-c34e8c6deb30 I0612 17:46:51.569522 16311 gc.cpp:56] Scheduling '/var/lib/mesos/slaves/20150319-213133-2080910346-5050-57551-S3314' for gc 6.9341503407days in the future W0612
[jira] [Commented] (MESOS-2726) Add support for enabling network namespace without enabling the network isolator
[ https://issues.apache.org/jira/browse/MESOS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596908#comment-14596908 ] Marco Massenzio commented on MESOS-2726: Please take a look and update the ticket. Add support for enabling network namespace without enabling the network isolator Key: MESOS-2726 URL: https://issues.apache.org/jira/browse/MESOS-2726 Project: Mesos Issue Type: Task Components: isolation Reporter: Niklas Quarfot Nielsen Assignee: Kapil Arya Following the discussion Kapil started, it is currently not possible to enable the linux network namespace for a container without enabling the network isolator (which requires certain kernel capabilities and dependencies). Following the pattern of enabling pid namespaces (--isolation=namespaces/pid), one possible solution could be to add another one for network, i.e. namespaces/network. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2726) Add support for enabling network namespace without enabling the network isolator
[ https://issues.apache.org/jira/browse/MESOS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2726: --- Assignee: Kapil Arya Add support for enabling network namespace without enabling the network isolator Key: MESOS-2726 URL: https://issues.apache.org/jira/browse/MESOS-2726 Project: Mesos Issue Type: Task Components: isolation Reporter: Niklas Quarfot Nielsen Assignee: Kapil Arya Following the discussion Kapil started, it is currently not possible to enable the linux network namespace for a container without enabling the network isolator (which requires certain kernel capabilities and dependencies). Following the pattern of enabling pid namespaces (--isolation=namespaces/pid), one possible solution could be to add another one for network, i.e. namespaces/network. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2473) Failure to recover because of freezer timeout should not suggest removing meta data
[ https://issues.apache.org/jira/browse/MESOS-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596906#comment-14596906 ] Marco Massenzio commented on MESOS-2473: Unfortunately, it appears we won't get to do this in time for 0.23 - nothing happened for a couple of months and it's not critical for release. Failure to recover because of freezer timeout should not suggest removing meta data --- Key: MESOS-2473 URL: https://issues.apache.org/jira/browse/MESOS-2473 Project: Mesos Issue Type: Improvement Components: isolation Affects Versions: 0.22.0 Reporter: Ian Downes Labels: twitter A more appropriate action should be suggested, e.g., manually kill the processes in cgroup xxx because the slave will still attempt to clean up orphans and hit the same code path. {noformat} I0310 23:04:23.961019 32342 slave.cpp:3321] Current usage 35.87%. Max allowed age: 3.789365411204225days Failed to perform recovery: Collect failed: Timed out after 1mins To remedy this do as follows: Step 1: rm -f /var/lib/mesos/meta/slaves/latest This ensures slave doesn't recover old live executors. Step 2: Restart the slave. Slave Exit Status: 1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky
[ https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596949#comment-14596949 ] Kapil Arya commented on MESOS-2226: --- Created a RR to handle the issue: https://reviews.apache.org/r/35756/ The current understanding is that the failure was due to the race introduced by the code not checking for the TASK_RUNNING status update message from the MockExecutor before stopping the scheduler driver. This caused the Executor to be terminated prematurely (before the tasks were launched), and thus the remove-executor hook was never called. The fix was to wait for the TASK_RUNNING status update and then wait for the shutdown() within MockExecutor. Only then do we wait for the future from the remove-executor hook. HookTest.VerifySlaveLaunchExecutorHook is flaky --- Key: MESOS-2226 URL: https://issues.apache.org/jira/browse/MESOS-2226 Project: Mesos Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Vinod Kone Assignee: Kapil Arya Labels: flaky, flaky-test, mesosphere Observed this on internal CI {code} [ RUN ] HookTest.VerifySlaveLaunchExecutorHook Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME' I0114 18:51:34.659353 4720 leveldb.cpp:176] Opened db in 1.255951ms I0114 18:51:34.662112 4720 leveldb.cpp:183] Compacted db in 596090ns I0114 18:51:34.662364 4720 leveldb.cpp:198] Created db iterator in 177877ns I0114 18:51:34.662719 4720 leveldb.cpp:204] Seeked to beginning of db in 19709ns I0114 18:51:34.663010 4720 leveldb.cpp:273] Iterated through 0 keys in the db in 18208ns I0114 18:51:34.663312 4720 replica.cpp:744] Replica recovered with log positions 0 - 0 with 1 holes and 0 unlearned I0114 18:51:34.664266 4735 recover.cpp:449] Starting replica recovery I0114 18:51:34.664908 4735 recover.cpp:475] Replica is in EMPTY status I0114 18:51:34.667842 4734 replica.cpp:641] Replica in EMPTY status received a broadcasted recover request I0114 18:51:34.669117 4735 
recover.cpp:195] Received a recover response from a replica in EMPTY status I0114 18:51:34.677913 4735 recover.cpp:566] Updating replica status to STARTING I0114 18:51:34.683157 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 137939ns I0114 18:51:34.683507 4735 replica.cpp:323] Persisted replica status to STARTING I0114 18:51:34.684013 4735 recover.cpp:475] Replica is in STARTING status I0114 18:51:34.685554 4738 replica.cpp:641] Replica in STARTING status received a broadcasted recover request I0114 18:51:34.696512 4736 recover.cpp:195] Received a recover response from a replica in STARTING status I0114 18:51:34.700552 4735 recover.cpp:566] Updating replica status to VOTING I0114 18:51:34.701128 4735 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 115624ns I0114 18:51:34.701478 4735 replica.cpp:323] Persisted replica status to VOTING I0114 18:51:34.701817 4735 recover.cpp:580] Successfully joined the Paxos group I0114 18:51:34.702569 4735 recover.cpp:464] Recover process terminated I0114 18:51:34.716439 4736 master.cpp:262] Master 20150114-185134-2272962752-57018-4720 (fedora-19) started on 192.168.122.135:57018 I0114 18:51:34.716913 4736 master.cpp:308] Master only allowing authenticated frameworks to register I0114 18:51:34.717136 4736 master.cpp:313] Master only allowing authenticated slaves to register I0114 18:51:34.717488 4736 credentials.hpp:36] Loading credentials for authentication from '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials' I0114 18:51:34.718077 4736 master.cpp:357] Authorization enabled I0114 18:51:34.719238 4738 whitelist_watcher.cpp:65] No whitelist given I0114 18:51:34.719755 4737 hierarchical_allocator_process.hpp:285] Initialized hierarchical allocator process I0114 18:51:34.722584 4736 master.cpp:1219] The newly elected leader is master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720 I0114 18:51:34.722865 4736 master.cpp:1232] Elected as the leading master! 
I0114 18:51:34.723310 4736 master.cpp:1050] Recovering from registrar I0114 18:51:34.723760 4734 registrar.cpp:313] Recovering registrar I0114 18:51:34.725229 4740 log.cpp:660] Attempting to start the writer I0114 18:51:34.727893 4739 replica.cpp:477] Replica received implicit promise request with proposal 1 I0114 18:51:34.728425 4739 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 114781ns I0114 18:51:34.728662 4739 replica.cpp:345] Persisted promised to 1 I0114 18:51:34.731271 4741 coordinator.cpp:230] Coordinator attemping to fill missing position I0114 18:51:34.733223 4734 replica.cpp:378] Replica received explicit promise request for position 0 with
[jira] [Commented] (MESOS-2909) Add version field to RegisterFrameworkMessage and ReregisterFrameworkMessage
[ https://issues.apache.org/jira/browse/MESOS-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596891#comment-14596891 ] Benjamin Mahler commented on MESOS-2909: I have a patch for this, but will hold off until we have the handlers fully implemented on the driver / library side (linked in blocking tickets, but more will follow). Adding it after the handlers are functional allows the master to assume that a present version means events can be sent. Add version field to RegisterFrameworkMessage and ReregisterFrameworkMessage Key: MESOS-2909 URL: https://issues.apache.org/jira/browse/MESOS-2909 Project: Mesos Issue Type: Task Reporter: Vinod Kone Assignee: Benjamin Mahler In the same way we added a 'version' field to RegisterSlaveMessage and ReregisterSlaveMessage, we should do it for the framework (re-)registration messages. This would help the master determine which version of the scheduler driver it is talking to. We want this so that the master can start sending Event messages to the scheduler driver (and scheduler library). In the long term, the master will send a streaming response to the libraries, but in the meantime we can test the Event protobufs by sending Event messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1988) Scheduler driver should not generate TASK_LOST when disconnected from master
[ https://issues.apache.org/jira/browse/MESOS-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-1988: -- Fix Version/s: 0.24.0 Scheduler driver should not generate TASK_LOST when disconnected from master Key: MESOS-1988 URL: https://issues.apache.org/jira/browse/MESOS-1988 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Assignee: Anand Mazumdar Labels: mesosphere, twitter Fix For: 0.24.0 Currently, the driver replies to launchTasks() with TASK_LOST if it detects that it is disconnected from the master. After MESOS-1972 lands, this will be the only place where the driver generates TASK_LOST. See MESOS-1972 for more context. This fix is targeted for 0.22.0 to give frameworks time to implement reconciliation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-1571) Signal escalation timeout is not configurable
[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-1571: --- Target Version/s: (was: 0.23.0) Signal escalation timeout is not configurable - Key: MESOS-1571 URL: https://issues.apache.org/jira/browse/MESOS-1571 Project: Mesos Issue Type: Bug Reporter: Niklas Quarfot Nielsen Labels: mesosphere Even though the executor shutdown grace period is set to a larger interval, the signal escalation timeout will still be 3 seconds. It should either be configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2725) undocumented how to enable pid namespace isolation and shared filesystem isolation
[ https://issues.apache.org/jira/browse/MESOS-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Massenzio updated MESOS-2725: --- Target Version/s: (was: 0.23.0) undocumented how to enable pid namespace isolation and shared filesystem isolation -- Key: MESOS-2725 URL: https://issues.apache.org/jira/browse/MESOS-2725 Project: Mesos Issue Type: Documentation Components: containerization, documentation Reporter: Adam Tulinius http://mesos.apache.org/documentation/latest/mesos-containerizer/ doesn't actually mention how to enable shared filesystem- and pid namespace isolation. I'll suggest adding something like: To enable the Shared Filesystem isolator, append filesystem/shared to the --isolation flag when starting the slave. .. and: To enable the Pid Namespace isolator, append namespaces/pid to the --isolation flag when starting the slave. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1807) Disallow executors with cpu only or memory only resources
[ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596807#comment-14596807 ] Elizabeth Lingg commented on MESOS-1807: For me, the issue is that Chronos and Marathon, for example, currently launch custom executors with 0 cpu AND 0 memory. While this needs to be fixed in Chronos and Marathon, a full announcement, warning, and deprecation cycle would be appropriate in my view. I do agree that custom executors need to specify both CPU and memory. Disallow executors with cpu only or memory only resources - Key: MESOS-1807 URL: https://issues.apache.org/jira/browse/MESOS-1807 Project: Mesos Issue Type: Improvement Reporter: Vinod Kone Labels: newbie Currently the master allows executors to be launched with either only cpus or only memory, but we shouldn't allow that. This is because an executor is an actual unix process that is launched by the slave. If an executor doesn't specify cpus, what should the cpu limits be for that executor when there are no tasks running on it? If no cpu limits are set, then it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2296) Implement the Events stream on slave for Call endpoint
[ https://issues.apache.org/jira/browse/MESOS-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-2296: -- Assignee: (was: Anand Mazumdar) Implement the Events stream on slave for Call endpoint -- Key: MESOS-2296 URL: https://issues.apache.org/jira/browse/MESOS-2296 Project: Mesos Issue Type: Task Reporter: Vinod Kone Labels: mesosphere -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (MESOS-2726) Add support for enabling network namespace without enabling the network isolator
[ https://issues.apache.org/jira/browse/MESOS-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kapil Arya closed MESOS-2726. - Resolution: Duplicate Add support for enabling network namespace without enabling the network isolator Key: MESOS-2726 URL: https://issues.apache.org/jira/browse/MESOS-2726 Project: Mesos Issue Type: Task Components: isolation Reporter: Niklas Quarfot Nielsen Assignee: Kapil Arya Following the discussion Kapil started: it is currently not possible to enable the linux network namespace for a container without enabling the network isolator (which requires certain kernel capabilities and dependencies). Following the pattern of enabling pid namespaces (--isolation=namespaces/pid), one possible solution could be to add an analogous option for the network namespace, i.e. namespaces/network. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-2199: --- Assignee: haosdent Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser --- Key: MESOS-2199 URL: https://issues.apache.org/jira/browse/MESOS-2199 Project: Mesos Issue Type: Bug Components: test Reporter: Ian Downes Assignee: haosdent Labels: mesosphere Appears that running the executor as {{nobody}} is not supported. [~nnielsen] can you take a look? Executor log: {noformat} [root@hostname build]# cat /tmp/SlaveTest_ROOT_RunTaskWithCommandInfoWithUser_cxF1dY/slaves/20141219-005206-2081170186-60487-11862-S0/frameworks/20141219-005206-2081170186-60 487-11862-/executors/1/runs/latest/std* sh: /home/idownes/workspace/mesos/build/src/mesos-executor: Permission denied {noformat} Test output: {noformat} [==] Running 1 test from 1 test case. [--] Global test environment set-up. [--] 1 test from SlaveTest [ RUN ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser ../../src/tests/slave_tests.cpp:680: Failure Value of: statusRunning.get().state() Actual: TASK_FAILED Expected: TASK_RUNNING ../../src/tests/slave_tests.cpp:682: Failure Failed to wait 10secs for statusFinished ../../src/tests/slave_tests.cpp:673: Failure Actual function call count doesn't match EXPECT_CALL(sched, statusUpdate(driver, _))... Expected: to be called twice Actual: called once - unsatisfied and active [ FAILED ] SlaveTest.ROOT_RunTaskWithCommandInfoWithUser (10641 ms) [--] 1 test from SlaveTest (10641 ms total) [--] Global test environment tear-down [==] 1 test from 1 test case ran. (10658 ms total) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595801#comment-14595801 ] haosdent commented on MESOS-2199: - The patch: https://reviews.apache.org/r/35728/diff -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595489#comment-14595489 ] haosdent commented on MESOS-2199: - Thank you for the explanation; let me try to add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2637) Consolidate 'foo', 'bar', ... string constants in test and example code
[ https://issues.apache.org/jira/browse/MESOS-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Williams updated MESOS-2637: -- Description: We are using 'foo', 'bar', ... string constants and pairs in src/tests/master_tests.cpp, src/tests/slave_tests.cpp, src/tests/hook_tests.cpp and src/examples/test_hook_module.cpp for label and hooks tests. These values should be stored in local variables to avoid the possibility of assignment getting out of sync with checking for that same value. (was: We are using 'foo', 'bar', ... string constants and pairs in src/tests/master_tests.cpp, src/tests/slave_tests.cpp, src/tests/hook_tests.cpp and src/examples/test_hook_module.cpp for label and hooks tests. We should consolidate them to make the call sites less prone to forgetting to update all call sites.) Consolidate 'foo', 'bar', ... string constants in test and example code --- Key: MESOS-2637 URL: https://issues.apache.org/jira/browse/MESOS-2637 Project: Mesos Issue Type: Bug Components: technical debt Reporter: Niklas Quarfot Nielsen Assignee: Colin Williams We are using 'foo', 'bar', ... string constants and pairs in src/tests/master_tests.cpp, src/tests/slave_tests.cpp, src/tests/hook_tests.cpp and src/examples/test_hook_module.cpp for label and hooks tests. These values should be stored in local variables to avoid the possibility of assignment getting out of sync with checking for that same value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595487#comment-14595487 ] Adam B commented on MESOS-2199: --- Nice detective work. We cannot require special ordering of tests. Each unit test should work in isolation, and they should all pass even with --gtest_shuffle enabled. We may indeed need some pre/post test steps in SetUp/TearDown methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595472#comment-14595472 ] haosdent commented on MESOS-2199: - But if you run SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser first and then SlaveTest.ROOT_RunTaskWithCommandInfoWithUser, the latter passes, because {code}build/src/.libs/lt-mesos-executor{code} has already been created by SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-2199: Comment: was deleted (was: I replace nobody to dbus, the test case could pass. But when I use nobody, it failed. ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595473#comment-14595473 ] haosdent commented on MESOS-2199: - So do we need some preparation steps in SlaveTest.ROOT_RunTaskWithCommandInfoWithUser, or should we just keep the current behavior? On Jenkins, I think it passes because SlaveTest.ROOT_RunTaskWithCommandInfoWithoutUser runs before SlaveTest.ROOT_RunTaskWithCommandInfoWithUser. [~adam-mesos] [~idownes] [~nnielsen] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2199) Failing test: SlaveTest.ROOT_RunTaskWithCommandInfoWithUser
[ https://issues.apache.org/jira/browse/MESOS-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595471#comment-14595471 ] haosdent commented on MESOS-2199: - When this test case runs, build/src/mesos-executor (a libtool wrapper script) checks whether build/src/.libs/lt-mesos-executor exists: {code} program=lt-'mesos-executor' progdir=$thisdir/.libs if test ! -f $progdir/$program || { file=`ls -1dt $progdir/$program $progdir/../$program 2>/dev/null | /bin/sed 1q`; \ test X$file != X$progdir/$program; }; then {code} When lt-mesos-executor does not exist, the wrapper tries to relink it: {code} relink_command=(... {code} But mesos-executor is run as a non-root user here, so the relink fails, which reproduces the Permission denied failure above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)