[jira] [Commented] (MESOS-3243) Replace NULL with nullptr

2016-03-07 Thread Tomasz Janiszewski (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184618#comment-15184618
 ] 

Tomasz Janiszewski commented on MESOS-3243:
---

I can work on this

> Replace NULL with nullptr
> -
>
> Key: MESOS-3243
> URL: https://issues.apache.org/jira/browse/MESOS-3243
> Project: Mesos
>  Issue Type: Bug
>Reporter: Michael Park
>
> As part of the C++ upgrade, it would be nice to move our use of {{NULL}} over 
> to use {{nullptr}}. I think it would be an interesting exercise to do this 
> with {{clang-modernize}} using the [nullptr 
> transform|http://clang.llvm.org/extra/UseNullptrTransform.html] (although 
> it's probably just as easy to use {{sed}}).
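
For illustration, here is a minimal, hypothetical before/after sketch of the purely mechanical change involved (the snippet is not from the Mesos tree):

{code}
#include <cstddef>

struct Task {};

Task* before()
{
  Task* task = NULL;     // C-style null pointer constant (macro from <cstddef>).
  return task;
}

Task* after()
{
  Task* task = nullptr;  // Type-safe C++11 null pointer literal.
  return task;
}
{code}

Either clang-modernize's nullptr transform or a careful {{sed}} pass would produce this kind of textual change across the code base.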



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4897) Update test cases to support PowerPC LE

2016-03-07 Thread Chen Zhiwei (JIRA)
Chen Zhiwei created MESOS-4897:
--

 Summary: Update test cases to support PowerPC LE
 Key: MESOS-4897
 URL: https://issues.apache.org/jira/browse/MESOS-4897
 Project: Mesos
  Issue Type: Improvement
Reporter: Chen Zhiwei
Assignee: Chen Zhiwei


Some Docker-related test cases fail on PowerPC LE, since the Docker image 
'alpine' cannot run on the PowerPC LE platform.

On the PowerPC LE platform, the test cases can use the Docker image 'ppc64le/busybox'.
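
As an illustration, a small self-contained sketch (a hypothetical helper, not the actual Mesos test code) of how a test could pick the image per architecture:

{code}
#include <string>

// Hypothetical helper: choose a Docker test image that exists for the
// build architecture, falling back to 'alpine' elsewhere.
std::string testDockerImage()
{
#if defined(__powerpc64__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
  return "ppc64le/busybox";  // 'alpine' is not available for ppc64le.
#else
  return "alpine";
#endif
}
{code}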



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4848) Agent Authn Research Spike

2016-03-07 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184458#comment-15184458
 ] 

Adam B commented on MESOS-4848:
---

Looks great! I had a couple of questions that I left as comments in the doc, 
most importantly about the integration of authenticator modules.

> Agent Authn Research Spike
> --
>
> Key: MESOS-4848
> URL: https://issues.apache.org/jira/browse/MESOS-4848
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Research the master authentication flags to see what changes will be 
> necessary for agent http authentication.
> Write up a 1-2 page summary/design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4739) libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor

2016-03-07 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184410#comment-15184410
 ] 

Neil Conway commented on MESOS-4739:


Stress is http://people.seas.harvard.edu/~apw/stress/ -- i.e., just a workload 
generator that consumes a lot of CPU.

> libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor
> -
>
> Key: MESOS-4739
> URL: https://issues.apache.org/jira/browse/MESOS-4739
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: Neil Conway
>  Labels: flaky-test, libprocess, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/1704/consoleFull
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconnectHTTPExecutor
> I0223 04:54:28.547051   786 leveldb.cpp:174] Opened db in 124.456584ms
> I0223 04:54:28.597709   786 leveldb.cpp:181] Compacted db in 50.603402ms
> I0223 04:54:28.597779   786 leveldb.cpp:196] Created db iterator in 22429ns
> I0223 04:54:28.597797   786 leveldb.cpp:202] Seeked to beginning of db in 
> 2279ns
> I0223 04:54:28.597810   786 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 265ns
> I0223 04:54:28.597859   786 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0223 04:54:28.598731   807 recover.cpp:447] Starting replica recovery
> I0223 04:54:28.599493   807 recover.cpp:473] Replica is in EMPTY status
> I0223 04:54:28.601400   815 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (9593)@172.17.0.2:44225
> I0223 04:54:28.601776   818 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0223 04:54:28.602247   809 recover.cpp:564] Updating replica status to 
> STARTING
> I0223 04:54:28.603353   811 master.cpp:376] Master 
> 81a295fc-fe1b-4ff8-9291-cd54f5c6f303 (5847d87ad902) started on 
> 172.17.0.2:44225
> I0223 04:54:28.603376   811 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/f6d1qA/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/f6d1qA/master" --zk_session_timeout="10secs"
> I0223 04:54:28.603906   811 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0223 04:54:28.603920   811 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0223 04:54:28.603930   811 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/f6d1qA/credentials'
> I0223 04:54:28.604317   811 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0223 04:54:28.604506   811 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0223 04:54:28.604635   811 master.cpp:571] Authorization enabled
> I0223 04:54:28.604918   808 whitelist_watcher.cpp:77] No whitelist given
> I0223 04:54:28.605023   819 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0223 04:54:28.608273   812 master.cpp:1712] The newly elected leader is 
> master@172.17.0.2:44225 with id 81a295fc-fe1b-4ff8-9291-cd54f5c6f303
> I0223 04:54:28.608314   812 master.cpp:1725] Elected as the leading master!
> I0223 04:54:28.608333   812 master.cpp:1470] Recovering from registrar
> I0223 04:54:28.608610   812 registrar.cpp:307] Recovering registrar
> I0223 04:54:28.631079   817 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 28.524027ms
> I0223 04:54:28.631156   817 replica.cpp:320] Persisted replica status to 
> STARTING
> I0223 04:54:28.631431   810 recover.cpp:473] Replica is in STARTING status
> I0223 04:54:28.632550   819 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (9595)@172.17.0.2:44225
> I0223 04:54:28.632968   816 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0223 04:54:28.633414   807 

[jira] [Commented] (MESOS-4739) libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor

2016-03-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184366#comment-15184366
 ] 

haosdent commented on MESOS-4739:
-

Hi [~neilc], what is {{stress --cpu 4}}? It seems gtest doesn't have a parameter 
like this.

> libprocess CHECK failure in SlaveRecoveryTest/0.ReconnectHTTPExecutor
> -
>
> Key: MESOS-4739
> URL: https://issues.apache.org/jira/browse/MESOS-4739
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: Neil Conway
>  Labels: flaky-test, libprocess, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/1704/consoleFull
> {code}
> [ RUN  ] SlaveRecoveryTest/0.ReconnectHTTPExecutor
> I0223 04:54:28.547051   786 leveldb.cpp:174] Opened db in 124.456584ms
> I0223 04:54:28.597709   786 leveldb.cpp:181] Compacted db in 50.603402ms
> I0223 04:54:28.597779   786 leveldb.cpp:196] Created db iterator in 22429ns
> I0223 04:54:28.597797   786 leveldb.cpp:202] Seeked to beginning of db in 
> 2279ns
> I0223 04:54:28.597810   786 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 265ns
> I0223 04:54:28.597859   786 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0223 04:54:28.598731   807 recover.cpp:447] Starting replica recovery
> I0223 04:54:28.599493   807 recover.cpp:473] Replica is in EMPTY status
> I0223 04:54:28.601400   815 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (9593)@172.17.0.2:44225
> I0223 04:54:28.601776   818 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0223 04:54:28.602247   809 recover.cpp:564] Updating replica status to 
> STARTING
> I0223 04:54:28.603353   811 master.cpp:376] Master 
> 81a295fc-fe1b-4ff8-9291-cd54f5c6f303 (5847d87ad902) started on 
> 172.17.0.2:44225
> I0223 04:54:28.603376   811 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/f6d1qA/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/f6d1qA/master" --zk_session_timeout="10secs"
> I0223 04:54:28.603906   811 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0223 04:54:28.603920   811 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0223 04:54:28.603930   811 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/f6d1qA/credentials'
> I0223 04:54:28.604317   811 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0223 04:54:28.604506   811 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0223 04:54:28.604635   811 master.cpp:571] Authorization enabled
> I0223 04:54:28.604918   808 whitelist_watcher.cpp:77] No whitelist given
> I0223 04:54:28.605023   819 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0223 04:54:28.608273   812 master.cpp:1712] The newly elected leader is 
> master@172.17.0.2:44225 with id 81a295fc-fe1b-4ff8-9291-cd54f5c6f303
> I0223 04:54:28.608314   812 master.cpp:1725] Elected as the leading master!
> I0223 04:54:28.608333   812 master.cpp:1470] Recovering from registrar
> I0223 04:54:28.608610   812 registrar.cpp:307] Recovering registrar
> I0223 04:54:28.631079   817 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 28.524027ms
> I0223 04:54:28.631156   817 replica.cpp:320] Persisted replica status to 
> STARTING
> I0223 04:54:28.631431   810 recover.cpp:473] Replica is in STARTING status
> I0223 04:54:28.632550   819 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (9595)@172.17.0.2:44225
> I0223 04:54:28.632968   816 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0223 04:54:28.633414   807 recover.cpp:564] Updating replica status to VOTING

[jira] [Commented] (MESOS-4800) SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky

2016-03-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184329#comment-15184329
 ] 

haosdent commented on MESOS-4800:
-

We have two approaches to fix this. One is to add a check in
{code}
void StatusUpdateManagerProcess::resume()
{
  LOG(INFO) << "Resuming sending status updates";
  paused = false;
{code}

so that we do not resume a StatusUpdateManagerProcess that is already running.

The other is to allow the test cases to receive the status update multiple times:
{code}
  EXPECT_CALL(sched, statusUpdate(_, _))
.WillOnce(FutureArg<1>());
{code}
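
For concreteness, a hedged sketch of both approaches (the member names and the {{&status}} argument are assumptions, not the actual patch):

{code}
// Approach 1: make resume() a no-op when updates are not paused, so a
// spurious second resume() does not resend the update again.
void StatusUpdateManagerProcess::resume()
{
  if (!paused) {
    return;  // Already sending status updates; nothing to resume.
  }

  LOG(INFO) << "Resuming sending status updates";
  paused = false;
  // ... continue resending any pending updates ...
}

// Approach 2: let the test tolerate a resent update by ignoring repeats.
EXPECT_CALL(sched, statusUpdate(_, _))
  .WillOnce(FutureArg<1>(&status))
  .WillRepeatedly(Return());  // Swallow a duplicate TASK_LOST, if any.
{code}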

> SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky
> --
>
> Key: MESOS-4800
> URL: https://issues.apache.org/jira/browse/MESOS-4800
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/1743/changes
> {code}
> [ RUN  ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
> I0229 02:11:01.321990  2124 leveldb.cpp:174] Opened db in 121.848194ms
> I0229 02:11:01.363880  2124 leveldb.cpp:181] Compacted db in 41.823665ms
> I0229 02:11:01.363965  2124 leveldb.cpp:196] Created db iterator in 27127ns
> I0229 02:11:01.363984  2124 leveldb.cpp:202] Seeked to beginning of db in 
> 3446ns
> I0229 02:11:01.363996  2124 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 332ns
> I0229 02:11:01.364050  2124 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0229 02:11:01.365196  2158 recover.cpp:447] Starting replica recovery
> I0229 02:11:01.365492  2158 recover.cpp:473] Replica is in EMPTY status
> I0229 02:11:01.366982  2151 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (9830)@172.17.0.3:36786
> I0229 02:11:01.367451  2149 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0229 02:11:01.368335  2149 recover.cpp:564] Updating replica status to 
> STARTING
> I0229 02:11:01.372730  2158 master.cpp:375] Master 
> d551df7b-0c69-4bc9-b113-eca605384c49 (3036a6611147) started on 
> 172.17.0.3:36786
> I0229 02:11:01.372764  2158 master.cpp:377] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/e9RAjp/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/e9RAjp/master" --zk_session_timeout="10secs"
> I0229 02:11:01.373164  2158 master.cpp:422] Master only allowing 
> authenticated frameworks to register
> I0229 02:11:01.373178  2158 master.cpp:427] Master only allowing 
> authenticated slaves to register
> I0229 02:11:01.373188  2158 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/e9RAjp/credentials'
> I0229 02:11:01.373612  2158 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0229 02:11:01.373793  2158 master.cpp:536] Using default 'basic' HTTP 
> authenticator
> I0229 02:11:01.373919  2158 master.cpp:570] Authorization enabled
> I0229 02:11:01.376322  2153 whitelist_watcher.cpp:77] No whitelist given
> I0229 02:11:01.376456  2158 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0229 02:11:01.378609  2144 master.cpp:1711] The newly elected leader is 
> master@172.17.0.3:36786 with id d551df7b-0c69-4bc9-b113-eca605384c49
> I0229 02:11:01.378674  2144 master.cpp:1724] Elected as the leading master!
> I0229 02:11:01.378700  2144 master.cpp:1469] Recovering from registrar
> I0229 02:11:01.378880  2154 registrar.cpp:307] Recovering registrar
> I0229 02:11:01.413949  2149 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 45.305096ms
> I0229 02:11:01.414049  2149 replica.cpp:320] Persisted replica status to 
> STARTING
> I0229 02:11:01.414481  2154 recover.cpp:473] Replica is in STARTING status
> I0229 02:11:01.416136  2154 replica.cpp:673] Replica in 

[jira] [Commented] (MESOS-4800) SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky

2016-03-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184318#comment-15184318
 ] 

haosdent commented on MESOS-4800:
-

The problem in the failing case is that {{TASK_LOST}} is received before 
{{StatusUpdateManagerProcess::pause()}}. Then, after resuming, it resends 
{{TASK_LOST}} twice.

{code}
// First time
I0229 02:11:02.602648  2154 status_update_manager.cpp:181] Resuming sending 
status updates
W0229 02:11:02.602721  2154 status_update_manager.cpp:188] Resending status 
update TASK_LOST (UUID: 9514b5e3-4a43-4593-b93f-d886e3791c84) for task 
6f4f1f8c-2649-4c70-9767-2ea122a79101 of framework 
d551df7b-0c69-4bc9-b113-eca605384c49-
I0229 02:11:02.602764  2154 status_update_manager.cpp:374] Forwarding update 
TASK_LOST (UUID: 9514b5e3-4a43-4593-b93f-d886e3791c84) for task 
6f4f1f8c-2649-4c70-9767-2ea122a79101 of framework 
d551df7b-0c69-4bc9-b113-eca605384c49- to the slave

// Second time.
I0229 02:11:02.602999  2154 status_update_manager.cpp:181] Resuming sending 
status updates
W0229 02:11:02.603032  2154 status_update_manager.cpp:188] Resending status 
update TASK_LOST (UUID: 9514b5e3-4a43-4593-b93f-d886e3791c84) for task 
6f4f1f8c-2649-4c70-9767-2ea122a79101 of framework 
d551df7b-0c69-4bc9-b113-eca605384c49-
I0229 02:11:02.603058  2154 status_update_manager.cpp:374] Forwarding update 
TASK_LOST (UUID: 9514b5e3-4a43-4593-b93f-d886e3791c84) for task 
6f4f1f8c-2649-4c70-9767-2ea122a79101 of framework 
d551df7b-0c69-4bc9-b113-eca605384c49- to the slave
{code}

> SlaveRecoveryTest/0.RecoverTerminatedExecutor is flaky
> --
>
> Key: MESOS-4800
> URL: https://issues.apache.org/jira/browse/MESOS-4800
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>  Labels: flaky, flaky-test, mesosphere
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)/1743/changes
> {code}
> [ RUN  ] SlaveRecoveryTest/0.RecoverTerminatedExecutor
> I0229 02:11:01.321990  2124 leveldb.cpp:174] Opened db in 121.848194ms
> I0229 02:11:01.363880  2124 leveldb.cpp:181] Compacted db in 41.823665ms
> I0229 02:11:01.363965  2124 leveldb.cpp:196] Created db iterator in 27127ns
> I0229 02:11:01.363984  2124 leveldb.cpp:202] Seeked to beginning of db in 
> 3446ns
> I0229 02:11:01.363996  2124 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 332ns
> I0229 02:11:01.364050  2124 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0229 02:11:01.365196  2158 recover.cpp:447] Starting replica recovery
> I0229 02:11:01.365492  2158 recover.cpp:473] Replica is in EMPTY status
> I0229 02:11:01.366982  2151 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (9830)@172.17.0.3:36786
> I0229 02:11:01.367451  2149 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0229 02:11:01.368335  2149 recover.cpp:564] Updating replica status to 
> STARTING
> I0229 02:11:01.372730  2158 master.cpp:375] Master 
> d551df7b-0c69-4bc9-b113-eca605384c49 (3036a6611147) started on 
> 172.17.0.3:36786
> I0229 02:11:01.372764  2158 master.cpp:377] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/e9RAjp/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/e9RAjp/master" --zk_session_timeout="10secs"
> I0229 02:11:01.373164  2158 master.cpp:422] Master only allowing 
> authenticated frameworks to register
> I0229 02:11:01.373178  2158 master.cpp:427] Master only allowing 
> authenticated slaves to register
> I0229 02:11:01.373188  2158 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/e9RAjp/credentials'
> I0229 02:11:01.373612  2158 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0229 02:11:01.373793  2158 master.cpp:536] Using default 

[jira] [Created] (MESOS-4896) Update isolators dynamically

2016-03-07 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4896:
--

 Summary: Update isolators dynamically
 Key: MESOS-4896
 URL: https://issues.apache.org/jira/browse/MESOS-4896
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu


Currently, when 'DOCKER' is used as an image provider but the docker/runtime 
isolator is not enabled, the agent exits with the following message: 
{code}
EXIT(1)
  << "Docker runtime isolator has to be specified if 'DOCKER' is included "
  << "in 'image_providers'. Please add 'docker/runtime' to '--isolation' "
  << "flags";
{code}

This causes trouble for operators because it requires manual intervention; it 
would be better for the agent to add this isolator dynamically based on the 
configured image providers, as sketched below.
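
A hedged, self-contained sketch of the idea (a hypothetical helper, not the actual agent flag handling):

{code}
#include <string>

// Derive the effective isolation list so that 'docker/runtime' is implied
// by the 'DOCKER' image provider instead of making the agent exit.
std::string effectiveIsolation(
    const std::string& isolation,       // e.g. "posix/cpu,posix/mem"
    const std::string& imageProviders)  // e.g. "DOCKER"
{
  const bool wantsDocker =
    imageProviders.find("DOCKER") != std::string::npos;

  const bool hasRuntime =
    isolation.find("docker/runtime") != std::string::npos;

  if (wantsDocker && !hasRuntime) {
    return isolation + ",docker/runtime";
  }

  return isolation;
}
{code}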




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4868) PersistentVolumeTests do not need to set up ACLs.

2016-03-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4868:
--
Shepherd: Adam B

> PersistentVolumeTests do not need to set up ACLs.
> -
>
> Key: MESOS-4868
> URL: https://issues.apache.org/jira/browse/MESOS-4868
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Joseph Wu
>Assignee: Yong Tang
>  Labels: mesosphere, newbie, test
>
> The {{PersistentVolumeTest}} s have a custom helper for setting up ACLs in 
> the {{master::Flags}}:
> {code}
> ACLs acls;
> hashset roles;
> foreach (const FrameworkInfo& framework, frameworks) {
>   mesos::ACL::RegisterFramework* acl = acls.add_register_frameworks();
>   acl->mutable_principals()->add_values(framework.principal());
>   acl->mutable_roles()->add_values(framework.role());
>   roles.insert(framework.role());
> }
> flags.acls = acls;
> flags.roles = strings::join(",", roles);
> {code}
> This is no longer necessary with implicit roles.
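
With implicit roles, a hedged sketch of the simplified setup (assuming the standard {{CreateMasterFlags()}} test helper) is simply:

{code}
// No custom ACL or role list is needed anymore; the default master flags
// from the test fixture are sufficient.
master::Flags flags = CreateMasterFlags();
{code}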



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4709) Enable compiler optimization by default

2016-03-07 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184256#comment-15184256
 ] 

Benjamin Mahler commented on MESOS-4709:


Linked in MESOS-1985 for some context on why this was changed originally.

> Enable compiler optimization by default
> ---
>
> Key: MESOS-4709
> URL: https://issues.apache.org/jira/browse/MESOS-4709
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: autoconf, configure, mesosphere
>
> At present, Mesos defaults to compiling with "-O0"; to enable compiler
> optimizations, the user needs to specify "--enable-optimize" when running 
> {{configure}}.
> We should change the default for the following reasons:
> (1) The autoconf default for CFLAGS/CXXFLAGS is "-O2 -g". Anecdotally,
> I think most software packages compile with a reasonable level of
> optimizations enabled by default.
> (2) I think we should make the default configure flags appropriate for
> end-users (rather than Mesos developers): developers will be familiar
> enough with Mesos to tune the configure flags according to their own
> preferences.
> (3) The performance consequences of not enabling compiler
> optimizations can be pretty severe: 5x in a benchmark I just ran, and
> we've seen between 2x and 30x (!) performance differences for some
> real-world workloads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4895) Add more test cases to CommandExecutorTest

2016-03-07 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4895:
--

 Summary: Add more test cases to CommandExecutorTest
 Key: MESOS-4895
 URL: https://issues.apache.org/jira/browse/MESOS-4895
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu


A new file, 
https://github.com/apache/mesos/blob/master/src/tests/command_executor_tests.cpp, 
was introduced for command executor test cases, but it only covers some cases of 
the task killing capability; it would be better to add more test cases to this 
file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4890) FetcherCacheTest.LocalUncachedExtract and FetcherCacheHttpTest.HttpMixed fail as root on OSX

2016-03-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184237#comment-15184237
 ] 

haosdent commented on MESOS-4890:
-

Hi [~greggomann], are the permissions on the {{/tmp}} directory correct on your 
machine? When I tried on OS X, these tests passed.

> FetcherCacheTest.LocalUncachedExtract and FetcherCacheHttpTest.HttpMixed fail 
> as root on OSX
> 
>
> Key: MESOS-4890
> URL: https://issues.apache.org/jira/browse/MESOS-4890
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.27.1
> Environment: OSX 10.10.5
>Reporter: Greg Mann
>  Labels: mesosphere, tests
>
> These two tests are failing as root on OSX due to the same error:
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> I0307 13:18:53.177228 1928930048 leveldb.cpp:174] Opened db in 1694us
> I0307 13:18:53.177587 1928930048 leveldb.cpp:181] Compacted db in 332us
> I0307 13:18:53.177618 1928930048 leveldb.cpp:196] Created db iterator in 15us
> I0307 13:18:53.177633 1928930048 leveldb.cpp:202] Seeked to beginning of db 
> in 8us
> I0307 13:18:53.177644 1928930048 leveldb.cpp:271] Iterated through 0 keys in 
> the db in 6us
> I0307 13:18:53.177690 1928930048 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0307 13:18:53.178393 218832896 recover.cpp:447] Starting replica recovery
> I0307 13:18:53.178628 218832896 recover.cpp:473] Replica is in EMPTY status
> I0307 13:18:53.179527 216686592 replica.cpp:673] Replica in EMPTY status 
> received a broadcasted recover request from (4)@127.0.0.1:49563
> I0307 13:18:53.179769 218832896 recover.cpp:193] Received a recover response 
> from a replica in EMPTY status
> I0307 13:18:53.179975 219906048 recover.cpp:564] Updating replica status to 
> STARTING
> I0307 13:18:53.180225 220442624 leveldb.cpp:304] Persisting metadata (8 
> bytes) to leveldb took 192us
> I0307 13:18:53.180249 220442624 replica.cpp:320] Persisted replica status to 
> STARTING
> I0307 13:18:53.180340 217223168 recover.cpp:473] Replica is in STARTING status
> I0307 13:18:53.180753 216686592 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (5)@127.0.0.1:49563
> I0307 13:18:53.180891 218832896 recover.cpp:193] Received a recover response 
> from a replica in STARTING status
> I0307 13:18:53.181082 216686592 recover.cpp:564] Updating replica status to 
> VOTING
> I0307 13:18:53.181246 218296320 leveldb.cpp:304] Persisting metadata (8 
> bytes) to leveldb took 100us
> I0307 13:18:53.181268 218296320 replica.cpp:320] Persisted replica status to 
> VOTING
> I0307 13:18:53.181325 217223168 recover.cpp:578] Successfully joined the 
> Paxos group
> I0307 13:18:53.181427 217223168 recover.cpp:462] Recover process terminated
> I0307 13:18:53.185133 218296320 master.cpp:375] Master 
> af5d4df4-703d-46f9-b5f7-95826f86abcd (localhost) started on 127.0.0.1:49563
> I0307 13:18:53.185169 218296320 master.cpp:377] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/private/tmp/BdBCVb/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/tmp/BdBCVb/master" --zk_session_timeout="10secs"
> W0307 13:18:53.185725 218296320 master.cpp:380]
> **
> Master bound to loopback interface! Cannot communicate with remote schedulers 
> or slaves. You might want to set '--ip' flag to a routable IP address.
> **
> I0307 13:18:53.185766 218296320 master.cpp:422] Master only allowing 
> authenticated frameworks to register
> I0307 13:18:53.185778 218296320 master.cpp:427] Master only allowing 
> authenticated slaves to register
> I0307 13:18:53.185784 218296320 credentials.hpp:35] Loading credentials for 
> authentication from '/private/tmp/BdBCVb/credentials'
> I0307 13:18:53.186089 218296320 master.cpp:467] Using default 'crammd5' 
> authenticator
> 

[jira] [Updated] (MESOS-4795) mesos agent not recovering after ZK init failure

2016-03-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-4795:

Description: 
Here's the sequence of events that happened:

-Agent running fine with 0.24.1
-Transient ZK issues, slave flapping with zookeeper_init failure
-ZK issue resolved
-Most agents stop flapping and function correctly
-Some agents continue flapping, but silent exit after printing the 
detector.cpp:481 log line.
-The agents that continue to flap repaired with manual removal of contents in 
mesos-slave's working dir

Here's the contents of the various log files on the agent:

The .INFO logfile for one of the restarts before mesos-slave process exited 
with no other error messages:
{code}
Log file created at: 2016/02/09 02:12:48
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
I0209 02:12:48.502403 97255 logging.cpp:172] INFO level logging started!
I0209 02:12:48.502938 97255 main.cpp:185] Build: 2015-09-30 16:12:07 by builds
I0209 02:12:48.502974 97255 main.cpp:187] Version: 0.24.1
I0209 02:12:48.503288 97255 containerizer.cpp:143] Using isolation: 
posix/cpu,posix/mem,filesystem/posix
I0209 02:12:48.507961 97255 main.cpp:272] Starting Mesos slave
I0209 02:12:48.509827 97296 slave.cpp:190] Slave started on 
1)@10.138.146.230:7101
I0209 02:12:48.510074 97296 slave.cpp:191] Flags at startup: 
--appc_store_dir="/tmp/mesos/store/appc" --attributes="region:us-east-1;" 
--authenticatee="" --cgroups_cpu_enable_pids_and_tids_count="false" 
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" 
--cgroups_limit_swap="false" --cgroups_root="mesos" 
--container_disk_watch_interval="15secs" --containerizers="mesos" "
I0209 02:12:48.511706 97296 slave.cpp:354] Slave resources: 
ports(*):[7150-7200]; mem(*):240135; cpus(*):32; disk(*):586104
I0209 02:12:48.512320 97296 slave.cpp:384] Slave hostname: 
I0209 02:12:48.512368 97296 slave.cpp:389] Slave checkpoint: true
I0209 02:12:48.516139 97299 group.cpp:331] Group process 
(group(1)@10.138.146.230:7101) connected to ZooKeeper
I0209 02:12:48.516216 97299 group.cpp:805] Syncing group operations: queue size 
(joins, cancels, datas) = (0, 0, 0)
I0209 02:12:48.516253 97299 group.cpp:403] Trying to create path 
'/titus/main/mesos' in ZooKeeper
I0209 02:12:48.520268 97275 detector.cpp:156] Detected a new leader: (id='209')
I0209 02:12:48.520803 97284 group.cpp:674] Trying to get 
'/titus/main/mesos/json.info_000209' in ZooKeeper
I0209 02:12:48.520874 97278 state.cpp:54] Recovering state from 
'/mnt/data/mesos/meta'
I0209 02:12:48.520961 97278 state.cpp:690] Failed to find resources file 
'/mnt/data/mesos/meta/resources/resources.info'
I0209 02:12:48.523680 97283 detector.cpp:481] A new leading master 
(UPID=master@10.230.95.110:7103) is detected
{code}

The .FATAL log file when the original transient ZK error occurred:
{code}
Log file created at: 2016/02/05 17:21:37
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
F0205 17:21:37.395644 53841 zookeeper.cpp:110] Failed to create ZooKeeper, 
zookeeper_init: No such file or directory [2]
{code}

The .ERROR log file:
{code}
Log file created at: 2016/02/05 17:21:37
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
F0205 17:21:37.395644 53841 zookeeper.cpp:110] Failed to create ZooKeeper, 
zookeeper_init: No such file or directory [2]
{code}
The .WARNING file had the same content. 

  was:
Here's the sequence of events that happened:

-Agent running fine with 0.24.1
-Transient ZK issues, slave flapping with zookeeper_init failure
-ZK issue resolved
-Most agents stop flapping and function correctly
-Some agents continue flapping, but silent exit after printing the 
detector.cpp:481 log line.
-The agents that continue to flap repaired with manual removal of contents in 
mesos-slave's working dir

Here's the contents of the various log files on the agent:

The .INFO logfile for one of the restarts before mesos-slave process exited 
with no other error messages:

Log file created at: 2016/02/09 02:12:48
Running on machine: titusagent-main-i-7697a9c5
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
I0209 02:12:48.502403 97255 logging.cpp:172] INFO level logging started!
I0209 02:12:48.502938 97255 main.cpp:185] Build: 2015-09-30 16:12:07 by builds
I0209 02:12:48.502974 97255 main.cpp:187] Version: 0.24.1
I0209 02:12:48.503288 97255 containerizer.cpp:143] Using isolation: 
posix/cpu,posix/mem,filesystem/posix
I0209 02:12:48.507961 97255 main.cpp:272] Starting Mesos slave
I0209 02:12:48.509827 97296 slave.cpp:190] Slave started on 
1)@10.138.146.230:7101
I0209 02:12:48.510074 97296 slave.cpp:191] Flags at startup: 
--appc_store_dir="/tmp/mesos/store/appc" --attributes="region:us-east-1;" 
--authenticatee="" 

[jira] [Commented] (MESOS-4189) Dynamic weights

2016-03-07 Thread Yongqiao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184205#comment-15184205
 ] 

Yongqiao Wang commented on MESOS-4189:
--

OK, thanks Adam. I will follow up on those tasks ASAP. Could you help review 
RRs #41681, #41790, and #43863? To reduce conflicts, let us commit them first; 
then it will be easier to do the following tasks on top of that code base.

> Dynamic weights
> ---
>
> Key: MESOS-4189
> URL: https://issues.apache.org/jira/browse/MESOS-4189
> Project: Mesos
>  Issue Type: Epic
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>
> Mesos currently uses a static list of weights that is configured at master 
> startup (via the --weights flag). This places limitations on changing the 
> resource allocation priority for a role's frameworks, since changing the set 
> of weights requires restarting all the masters. 
> This JIRA will add a new /weight endpoint to update/show the weight of a role 
> for authorized principals, and the non-default weights will be persisted in 
> the registry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4705) Slave failed to sample container with perf event

2016-03-07 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-4705:
---
Shepherd: Benjamin Mahler

Sorry for the delay, thanks for looking into this! I left some comments on the 
review.

> Slave failed to sample container with perf event
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
>
> When sampling a container with perf events on CentOS 7 with kernel 
> 3.10.0-123.el7.x86_64, the slave complained with the error below:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> It's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  for kernel versions below 3.12.
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens, as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.
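
For illustration, a small self-contained sketch (a hypothetical parser, not the actual src/linux/perf.cpp change) that tolerates the 6-token layout seen on this kernel:

{code}
#include <sstream>
#include <string>
#include <vector>

// Split one perf CSV output line on ','.
std::vector<std::string> tokenize(const std::string& line)
{
  std::vector<std::string> tokens;
  std::stringstream stream(line);
  std::string token;
  while (std::getline(stream, token, ',')) {
    tokens.push_back(token);
  }
  return tokens;
}

// On 3.10.0-123.el7.x86_64 a sample line is:
//   value,unit,event,cgroup,running,ratio   (6 fields)
// so a parser for this kernel can pick out the value and event by position
// rather than assuming the layout of other kernel versions.
bool parseSample(const std::string& line, std::string* value, std::string* event)
{
  const std::vector<std::string> tokens = tokenize(line);

  if (tokens.size() != 6) {
    return false;  // Unexpected number of fields.
  }

  *value = tokens[0];
  *event = tokens[2];
  return true;
}
{code}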



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4840) Remove internal usage of deprecated ShutdownFramework ACL

2016-03-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4840:
--
Summary: Remove internal usage of deprecated ShutdownFramework ACL  (was: 
Remove ShutdownFramework from the ACLs messages and references)

> Remove internal usage of deprecated ShutdownFramework ACL
> -
>
> Key: MESOS-4840
> URL: https://issues.apache.org/jira/browse/MESOS-4840
> Project: Mesos
>  Issue Type: Task
>  Components: master, security, technical debt
>Affects Versions: 0.28.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>Priority: Minor
>  Labels: deprecation, mesosphere
>
> {{ShutdownFramework}} acl was deprecated a couple of versions ago in favor of 
> the {{TeardownFramework}} message. Its deprecation cycle came with 0.27. That 
> means we should remove the message and its references in the code base.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4840) Remove ShutdownFramework from the ACLs messages and references

2016-03-07 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184144#comment-15184144
 ] 

Adam B commented on MESOS-4840:
---

Done. I'm running this through CI, then I'm ready to commit it.

> Remove ShutdownFramework from the ACLs messages and references
> --
>
> Key: MESOS-4840
> URL: https://issues.apache.org/jira/browse/MESOS-4840
> Project: Mesos
>  Issue Type: Task
>  Components: master, security, technical debt
>Affects Versions: 0.28.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>Priority: Minor
>  Labels: deprecation, mesosphere
>
> {{ShutdownFramework}} acl was deprecated a couple of versions ago in favor of 
> the {{TeardownFramework}} message. Its deprecation cycle came with 0.27. That 
> means we should remove the message and its references in the code base.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4840) Remove ShutdownFramework from the ACLs messages and references

2016-03-07 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184133#comment-15184133
 ] 

Vinod Kone commented on MESOS-4840:
---

Can you add it to the sprint, add a shepherd, and add story points?

> Remove ShutdownFramework from the ACLs messages and references
> --
>
> Key: MESOS-4840
> URL: https://issues.apache.org/jira/browse/MESOS-4840
> Project: Mesos
>  Issue Type: Task
>  Components: master, security, technical debt
>Affects Versions: 0.28.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>Priority: Minor
>  Labels: deprecation, mesosphere
>
> {{ShutdownFramework}} acl was deprecated a couple of versions ago in favor of 
> the {{TeardownFramework}} message. Its deprecation cycle came with 0.27. That 
> means we should remove the message and its references in the code base.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4893) Allow setting permissions and access control on persistent volumes

2016-03-07 Thread Anindya Sinha (JIRA)
Anindya Sinha created MESOS-4893:


 Summary: Allow setting permissions and access control on 
persistent volumes
 Key: MESOS-4893
 URL: https://issues.apache.org/jira/browse/MESOS-4893
 Project: Mesos
  Issue Type: Improvement
  Components: general
Reporter: Anindya Sinha
Assignee: Anindya Sinha


Currently, persistent volumes are exclusive, i.e. if a persistent volume is 
used by one task or executor, it cannot be used concurrently by another task or 
executor. 
With the introduction of shared volumes, persistent volumes can be used 
simultaneously by multiple tasks or executors. As a result, we need to support 
setting the ownership and permissions of a persistent volume at creation time, 
which the tasks then need to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4892) Support arithmetic operations for shared resources with consumer counts

2016-03-07 Thread Anindya Sinha (JIRA)
Anindya Sinha created MESOS-4892:


 Summary: Support arithmetic operations for shared resources with 
consumer counts
 Key: MESOS-4892
 URL: https://issues.apache.org/jira/browse/MESOS-4892
 Project: Mesos
  Issue Type: Improvement
  Components: general
Reporter: Anindya Sinha
Assignee: Anindya Sinha


With the introduction of shared resources, we need to add support for 
arithmetic operations on Resources that correctly handle shared resources. 
Shared resources need to be handled differently so as to account for 
incrementing/decrementing the consumer counts maintained by each Resources 
object.

Case 1:
Resources total += shared_resource;

If shared_resource already exists in total, the consumer count is incremented. 
If shared_resource does not exist in total, we start tracking consumers for 
this shared resource, initialized to 0 consumers.

Case 2:
Resources total -= shared_resource;

If shared_resource exists in total, the consumer count is decremented. However, 
the shared_resource is removed from total if its consumer count in total is 
already 0.
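
A minimal, self-contained sketch of the consumer-count bookkeeping described above (the type and member names are illustrative, not the actual Resources implementation):

{code}
#include <map>
#include <string>

struct SharedCounts
{
  // Shared resource identity -> number of consumers currently using it.
  std::map<std::string, int> consumers;

  // Case 1: 'total += shared' either bumps the consumer count, or starts
  // tracking the shared resource with 0 consumers.
  void add(const std::string& shared)
  {
    auto it = consumers.find(shared);
    if (it != consumers.end()) {
      ++it->second;
    } else {
      consumers[shared] = 0;
    }
  }

  // Case 2: 'total -= shared' decrements the consumer count, and removes
  // the entry entirely if the count was already 0.
  void subtract(const std::string& shared)
  {
    auto it = consumers.find(shared);
    if (it == consumers.end()) {
      return;  // Not tracked; nothing to do.
    }

    if (it->second == 0) {
      consumers.erase(it);
    } else {
      --it->second;
    }
  }
};
{code}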



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.

2016-03-07 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184017#comment-15184017
 ] 

Sargun Dhillon commented on MESOS-4891:
---

Can we also have a place to list all executor PIDs that are associated with 
those containers?

> Add a '/containers' endpoint to the agent to list all the active containers.
> 
>
> Key: MESOS-4891
> URL: https://issues.apache.org/jira/browse/MESOS-4891
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> This endpoint will be similar to /monitor/statistics.json endpoint, but it'll 
> also contain the 'container_status' about the container (see ContainerStatus 
> in mesos.proto). We'll eventually deprecate the /monitor/statistics.json 
> endpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4740) Improve master metrics/snapshot performace

2016-03-07 Thread Cong Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Wang updated MESOS-4740:
-
Description: 
[~drobinson] noticed retrieving metrics/snapshot statistics could be very 
inefficient.

{noformat}
[user@server ~]$ time curl -s localhost:5050/metrics/snapshot

real0m35.654s
user0m0.019s
sys 0m0.011s
{noformat}

MESOS-1287 introduces a timeout parameter for this query, but for 
metric-collectors like ours they are not aware of such URL-specific parameter, 
so we need:

1) We should always have a timeout and set some default value to it

2) Investigate why master metrics/snapshot could take such a long time to 
complete under load.


  was:
[~drobinson] noticed retrieving metrics/snapshot statistics could be very 
inefficient.

{noformat}
[user@server ~]$ time curl -s localhost:5050/metrics/snapshot

real0m35.654s
user0m0.019s
sys 0m0.011s
{noformat}

MESOS-1287 introduces a timeout parameter for this query, but for 
metric-collectors like ours they are not aware of such URL-specific parameter, 
so we need:

1) We should always have a timeout and set some default value to it

2) Investigate why metrics/snapshot could take such a long time to complete 
under load, since we don't use history for these statistics and the values are 
just some atomic read.



> Improve master metrics/snapshot performace
> --
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>
> [~drobinson] noticed retrieving metrics/snapshot statistics could be very 
> inefficient.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real  0m35.654s
> user  0m0.019s
> sys   0m0.011s
> {noformat}
> MESOS-1287 introduces a timeout parameter for this query, but for 
> metric-collectors like ours they are not aware of such URL-specific 
> parameter, so we need:
> 1) We should always have a timeout and set some default value to it
> 2) Investigate why master metrics/snapshot could take such a long time to 
> complete under load.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4740) Improve master metrics/snapshot performace

2016-03-07 Thread Cong Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Wang updated MESOS-4740:
-
Summary: Improve master metrics/snapshot performace  (was: Improve 
metrics/snapshot performace)

> Improve master metrics/snapshot performace
> --
>
> Key: MESOS-4740
> URL: https://issues.apache.org/jira/browse/MESOS-4740
> Project: Mesos
>  Issue Type: Task
>Reporter: Cong Wang
>Assignee: Cong Wang
>
> [~drobinson] noticed retrieving metrics/snapshot statistics could be very 
> inefficient.
> {noformat}
> [user@server ~]$ time curl -s localhost:5050/metrics/snapshot
> real  0m35.654s
> user  0m0.019s
> sys   0m0.011s
> {noformat}
> MESOS-1287 introduces a timeout parameter for this query, but for 
> metric-collectors like ours they are not aware of such URL-specific 
> parameter, so we need:
> 1) We should always have a timeout and set some default value to it
> 2) Investigate why metrics/snapshot could take such a long time to complete 
> under load, since we don't use history for these statistics and the values 
> are just some atomic read.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2840) MesosContainerizer support multiple image provisioners

2016-03-07 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2840:
--
Description: 
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]  < please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

-[[original 
document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
 requires permission]-

  was:
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]  < please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

[[original 
document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
 requires permission]


> MesosContainerizer support multiple image provisioners
> --
>
> Key: MESOS-2840
> URL: https://issues.apache.org/jira/browse/MESOS-2840
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization, docker
>Affects Versions: 0.23.0
>Reporter: Marco Massenzio
>Assignee: Timothy Chen
>  Labels: mesosphere, twitter
>
> We want to utilize the Appc integration interfaces to further make 
> MesosContainerizers to support multiple image formats.
> This allows our future work on isolators to support any container image 
> format.
> Design
> [open to public comments]  < please use this document!
> https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing
> -[[original 
> document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
>  requires permission]-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2840) MesosContainerizer support multiple image provisioners

2016-03-07 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2840:
--
Description: 
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]   please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

-[[original 
document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
 requires permission]-

  was:
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]  < please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

-[[original 
document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
 requires permission]-


> MesosContainerizer support multiple image provisioners
> --
>
> Key: MESOS-2840
> URL: https://issues.apache.org/jira/browse/MESOS-2840
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization, docker
>Affects Versions: 0.23.0
>Reporter: Marco Massenzio
>Assignee: Timothy Chen
>  Labels: mesosphere, twitter
>
> We want to utilize the Appc integration interfaces to further make 
> MesosContainerizers to support multiple image formats.
> This allows our future work on isolators to support any container image 
> format.
> Design
> [open to public comments]   please use this document!
> https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing
> -[[original 
> document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
>  requires permission]-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.

2016-03-07 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4891:
-

 Summary: Add a '/containers' endpoint to the agent to list all the 
active containers.
 Key: MESOS-4891
 URL: https://issues.apache.org/jira/browse/MESOS-4891
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu


This endpoint will be similar to /monitor/statistics.json endpoint, but it'll 
also contain the 'container_status' about the container (see ContainerStatus in 
mesos.proto). We'll eventually deprecate the /monitor/statistics.json endpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2840) MesosContainerizer support multiple image provisioners

2016-03-07 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2840:
--
Description: 
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]  < please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

[original document, requires permission]
-https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing-

  was:
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

[original document, requires permission]
https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing


> MesosContainerizer support multiple image provisioners
> --
>
> Key: MESOS-2840
> URL: https://issues.apache.org/jira/browse/MESOS-2840
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization, docker
>Affects Versions: 0.23.0
>Reporter: Marco Massenzio
>Assignee: Timothy Chen
>  Labels: mesosphere, twitter
>
> We want to utilize the Appc integration interfaces to further make 
> MesosContainerizers to support multiple image formats.
> This allows our future work on isolators to support any container image 
> format.
> Design
> [open to public comments]  < please use this document!
> https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing
> [original document, requires permission]
> -https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing-



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2840) MesosContainerizer support multiple image provisioners

2016-03-07 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2840:
--
Description: 
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]  < please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

[[original 
document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
 requires permission]

  was:
We want to utilize the Appc integration interfaces to further make 
MesosContainerizers to support multiple image formats.
This allows our future work on isolators to support any container image format.

Design
[open to public comments]  < please use this document!
https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing

[original document, requires permission]
-https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing-


> MesosContainerizer support multiple image provisioners
> --
>
> Key: MESOS-2840
> URL: https://issues.apache.org/jira/browse/MESOS-2840
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization, docker
>Affects Versions: 0.23.0
>Reporter: Marco Massenzio
>Assignee: Timothy Chen
>  Labels: mesosphere, twitter
>
> We want to utilize the Appc integration interfaces to further make 
> MesosContainerizers to support multiple image formats.
> This allows our future work on isolators to support any container image 
> format.
> Design
> [open to public comments]  < please use this document!
> https://docs.google.com/document/d/1oUpJNjJ0l51fxaYut21mKPwJUiAcAdgbdF7SAdAW2PA/edit?usp=sharing
> [[original 
> document|https://docs.google.com/a/twitter.com/document/d/1Fx5TS0LytV7u5MZExQS0-g-gScX2yKCKQg9UPFzhp6U/edit?usp=sharing]
>  requires permission]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4883) Add agent ID to agent state endpoint

2016-03-07 Thread Sargun Dhillon (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183895#comment-15183895
 ] 

Sargun Dhillon commented on MESOS-4883:
---

We have a tool here that looks at the agent state.json and assembles a complete 
cluster view based on the sum of the agent JSONs. This system is soft-state. If 
the last state we have in memory has a bunch of tasks associated with this 
slave, and this slave comes back and runs for a while (10m), and we don't see 
any tasks, we do not know to take those old tasks and remove them from the 
system.

> Add agent ID to agent state endpoint
> 
>
> Key: MESOS-4883
> URL: https://issues.apache.org/jira/browse/MESOS-4883
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> I would like to have the slave ID exposed on the slave's state.json endpoint 
> before any tasks are running on the slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4492) Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation

2016-03-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4492:
-
Shepherd: Jie Yu

> Add metrics for {RESERVE, UNRESERVE} and {CREATE, DESTROY} offer operation
> --
>
> Key: MESOS-4492
> URL: https://issues.apache.org/jira/browse/MESOS-4492
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Fan Du
>Assignee: Fan Du
>Priority: Minor
>
> This ticket aims to enable user or operator to inspect operation statistics 
> such as RESERVE, UNRESERVE, CREATE and DESTROY, current implementation only 
> supports LAUNCH.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4849) Add agent flags for HTTP authentication

2016-03-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4849:
-
Shepherd: Adam B

> Add agent flags for HTTP authentication
> ---
>
> Key: MESOS-4849
> URL: https://issues.apache.org/jira/browse/MESOS-4849
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Flags should be added to the agent to:
> 1. Enable HTTP authentication ({{--authenticate_http}})
> 2. Specify credentials ({{--http_credentials}})
> 3. Specify HTTP authenticators ({{--authenticators}})



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4848) Agent Authn Research Spike

2016-03-07 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4848:
-
Shepherd: Adam B

> Agent Authn Research Spike
> --
>
> Key: MESOS-4848
> URL: https://issues.apache.org/jira/browse/MESOS-4848
> Project: Mesos
>  Issue Type: Task
>  Components: security, slave
>Reporter: Adam B
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Research the master authentication flags to see what changes will be 
> necessary for agent http authentication.
> Write up a 1-2 page summary/design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3302) Scheduler API v1 improvements

2016-03-07 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3302:
--
Assignee: (was: Marco Massenzio)

> Scheduler API v1 improvements
> -
>
> Key: MESOS-3302
> URL: https://issues.apache.org/jira/browse/MESOS-3302
> Project: Mesos
>  Issue Type: Epic
>Reporter: Marco Massenzio
>  Labels: mesosphere, twitter
>
> This Epic covers all the refinements that we may want to build on top of the 
> {{HTTP API}} MVP epic (MESOS-2288) which was released initially with Mesos 
> {{0.24.0}}.
> The tasks/stories here cover the necessary work to bring the API v1 to what 
> we would regard as "Production-ready" state in preparation for the {{1.0.0}} 
> release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4890) FetcherCacheTest.LocalUncachedExtract and FetcherCacheHttpTest.HttpMixed fail as root on OSX

2016-03-07 Thread Greg Mann (JIRA)
Greg Mann created MESOS-4890:


 Summary: FetcherCacheTest.LocalUncachedExtract and 
FetcherCacheHttpTest.HttpMixed fail as root on OSX
 Key: MESOS-4890
 URL: https://issues.apache.org/jira/browse/MESOS-4890
 Project: Mesos
  Issue Type: Bug
  Components: tests
Affects Versions: 0.27.1
 Environment: OSX 10.10.5
Reporter: Greg Mann


These two tests are failing as root on OSX due to the same error:

{code}
[ RUN  ] FetcherCacheTest.LocalUncachedExtract
I0307 13:18:53.177228 1928930048 leveldb.cpp:174] Opened db in 1694us
I0307 13:18:53.177587 1928930048 leveldb.cpp:181] Compacted db in 332us
I0307 13:18:53.177618 1928930048 leveldb.cpp:196] Created db iterator in 15us
I0307 13:18:53.177633 1928930048 leveldb.cpp:202] Seeked to beginning of db in 
8us
I0307 13:18:53.177644 1928930048 leveldb.cpp:271] Iterated through 0 keys in 
the db in 6us
I0307 13:18:53.177690 1928930048 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0307 13:18:53.178393 218832896 recover.cpp:447] Starting replica recovery
I0307 13:18:53.178628 218832896 recover.cpp:473] Replica is in EMPTY status
I0307 13:18:53.179527 216686592 replica.cpp:673] Replica in EMPTY status 
received a broadcasted recover request from (4)@127.0.0.1:49563
I0307 13:18:53.179769 218832896 recover.cpp:193] Received a recover response 
from a replica in EMPTY status
I0307 13:18:53.179975 219906048 recover.cpp:564] Updating replica status to 
STARTING
I0307 13:18:53.180225 220442624 leveldb.cpp:304] Persisting metadata (8 bytes) 
to leveldb took 192us
I0307 13:18:53.180249 220442624 replica.cpp:320] Persisted replica status to 
STARTING
I0307 13:18:53.180340 217223168 recover.cpp:473] Replica is in STARTING status
I0307 13:18:53.180753 216686592 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from (5)@127.0.0.1:49563
I0307 13:18:53.180891 218832896 recover.cpp:193] Received a recover response 
from a replica in STARTING status
I0307 13:18:53.181082 216686592 recover.cpp:564] Updating replica status to 
VOTING
I0307 13:18:53.181246 218296320 leveldb.cpp:304] Persisting metadata (8 bytes) 
to leveldb took 100us
I0307 13:18:53.181268 218296320 replica.cpp:320] Persisted replica status to 
VOTING
I0307 13:18:53.181325 217223168 recover.cpp:578] Successfully joined the Paxos 
group
I0307 13:18:53.181427 217223168 recover.cpp:462] Recover process terminated
I0307 13:18:53.185133 218296320 master.cpp:375] Master 
af5d4df4-703d-46f9-b5f7-95826f86abcd (localhost) started on 127.0.0.1:49563
I0307 13:18:53.185169 218296320 master.cpp:377] Flags at startup: --acls="" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
--authenticators="crammd5" --authorizers="local" 
--credentials="/private/tmp/BdBCVb/credentials" --framework_sorter="drf" 
--help="false" --hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
--quiet="false" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="100secs" --registry_strict="true" 
--root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/private/tmp/BdBCVb/master" --zk_session_timeout="10secs"
W0307 13:18:53.185725 218296320 master.cpp:380]
**
Master bound to loopback interface! Cannot communicate with remote schedulers 
or slaves. You might want to set '--ip' flag to a routable IP address.
**
I0307 13:18:53.185766 218296320 master.cpp:422] Master only allowing 
authenticated frameworks to register
I0307 13:18:53.185778 218296320 master.cpp:427] Master only allowing 
authenticated slaves to register
I0307 13:18:53.185784 218296320 credentials.hpp:35] Loading credentials for 
authentication from '/private/tmp/BdBCVb/credentials'
I0307 13:18:53.186089 218296320 master.cpp:467] Using default 'crammd5' 
authenticator
I0307 13:18:53.186130 218296320 authenticator.cpp:518] Initializing server SASL
I0307 13:18:53.204093 218296320 master.cpp:536] Using default 'basic' HTTP 
authenticator
I0307 13:18:53.204290 218296320 master.cpp:570] Authorization enabled
I0307 13:18:53.207252 216686592 master.cpp:1711] The newly elected leader is 
master@127.0.0.1:49563 with id af5d4df4-703d-46f9-b5f7-95826f86abcd
I0307 13:18:53.207278 216686592 master.cpp:1724] Elected as the leading master!
I0307 13:18:53.207285 216686592 master.cpp:1469] Recovering from registrar

[jira] [Created] (MESOS-4889) Implement runtime isolator tests.

2016-03-07 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4889:
---

 Summary: Implement runtime isolator tests.
 Key: MESOS-4889
 URL: https://issues.apache.org/jira/browse/MESOS-4889
 Project: Mesos
  Issue Type: Task
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


There are different cases in the docker runtime isolator. Some special cases should be 
tested with their own test cases, to verify that the docker runtime isolator logic is 
correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4813) Implement base tests for unified container using local puller.

2016-03-07 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-4813:

Summary: Implement base tests for unified container using local puller.  
(was: Implement base tests for unified container using local registry.)

> Implement base tests for unified container using local puller.
> --
>
> Key: MESOS-4813
> URL: https://issues.apache.org/jira/browse/MESOS-4813
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: containerizer
>
> Using command line executor to test shell commands with local docker images.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4888) Default cmd is executed as an incorrect command.

2016-03-07 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4888:
---

 Summary: Default cmd is executed as an incorrect command.
 Key: MESOS-4888
 URL: https://issues.apache.org/jira/browse/MESOS-4888
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


When the mesos containerizer launches a container using a docker image that only 
contains a default Cmd, the executable command is assembled in an incorrect sequence. 
For example:

If the image's default entrypoint is null, its cmd is "sh", and the user defines 
shell=false, no value, and arguments as [-c, echo 'hello world'], then the executable 
command is `[sh, -c, echo 'hello world', sh]`, which is incorrect. It should be 
`[sh, sh, -c, echo 'hello world']` instead.

This problem is only exposed for the case: sh=0, value=0, argv=1, entrypoint=0, 
cmd=1. 
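
For illustration, a minimal sketch of the expected assembly for this case (a 
hypothetical helper, not the actual containerizer code): with no entrypoint, the 
image's cmd[0] is the executable and should also be repeated as argv[0], ahead of the 
user-supplied arguments.

{code}
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: assemble the exec-style command for the case
// entrypoint=0, cmd=1, shell=0, value=0, argv=1 described above.
std::vector<std::string> assembleCommand(
    const std::vector<std::string>& imageCmd,       // e.g. {"sh"}
    const std::vector<std::string>& userArguments)  // e.g. {"-c", "echo 'hello world'"}
{
  std::vector<std::string> command;
  command.push_back(imageCmd.front());  // executable: "sh"
  command.push_back(imageCmd.front());  // argv[0]:    "sh"
  command.insert(command.end(), userArguments.begin(), userArguments.end());
  return command;  // {"sh", "sh", "-c", "echo 'hello world'"} -- the expected order
}

int main()
{
  for (const std::string& part : assembleCommand({"sh"}, {"-c", "echo 'hello world'"})) {
    std::cout << part << std::endl;
  }
  return 0;
}
{code}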



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4126) Construct the error string in `MethodNotAllowed`.

2016-03-07 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4126:
---
Shepherd: Alexander Rukletsov

> Construct the error string in `MethodNotAllowed`.
> -
>
> Key: MESOS-4126
> URL: https://issues.apache.org/jira/browse/MESOS-4126
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Jacob Janco
>  Labels: http, mesosphere, newbie++
>
> Consider constructing the error string in {{MethodNotAllowed}} rather than at 
> the invocation site. Currently we want all error messages to follow the same 
> pattern, so instead of writing
> {code}
> return MethodNotAllowed({"POST"}, "Expecting 'POST', received '" + 
> request.method + "'");
> {code}
> we can write something like
> {code}
> MethodNotAllowed({"POST"}, request.method)`
> {code}
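
For reference, a sketch of one possible shape of such a helper (the names and the exact 
message format are assumptions, not the final API):

{code}
#include <initializer_list>
#include <iostream>
#include <string>

// Sketch only: a response type that builds the error body itself from the
// allowed methods and the method actually received.
struct MethodNotAllowed
{
  MethodNotAllowed(
      const std::initializer_list<std::string>& allowedMethods,
      const std::string& receivedMethod)
  {
    std::string allowed;
    for (const std::string& method : allowedMethods) {
      allowed += (allowed.empty() ? "" : ", ") + method;
    }

    body = "Expecting one of { " + allowed + " }, received '" +
           receivedMethod + "'";
  }

  std::string body;
};

int main()
{
  // The invocation site then shrinks to: MethodNotAllowed({"POST"}, request.method).
  std::cout << MethodNotAllowed({"POST"}, "GET").body << std::endl;
  return 0;
}
{code}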



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4883) Add agent ID to agent state endpoint

2016-03-07 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183676#comment-15183676
 ] 

Vinod Kone commented on MESOS-4883:
---

Can you provide more context/motivation?

> Add agent ID to agent state endpoint
> 
>
> Key: MESOS-4883
> URL: https://issues.apache.org/jira/browse/MESOS-4883
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Sargun Dhillon
>Priority: Minor
>  Labels: mesosphere
>
> I would like to have the slave ID exposed on the slave's state.json endpoint 
> before any tasks are running on the slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4887) Design doc for Slave/Agent rename

2016-03-07 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-4887:
-

 Summary: Design doc for Slave/Agent rename
 Key: MESOS-4887
 URL: https://issues.apache.org/jira/browse/MESOS-4887
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Diana Arroyo


Design doc: 

https://docs.google.com/document/d/1P8_4wdk29I6NoVTjbFkRl05-tfxV9PY4WLoRNvExupM/edit#heading=h.9g7fqjh6652v



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4886) Support mesos containerizer force_pull_image option.

2016-03-07 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-4886:
---

 Summary: Support mesos containerizer force_pull_image option.
 Key: MESOS-4886
 URL: https://issues.apache.org/jira/browse/MESOS-4886
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Gilbert Song
Assignee: Gilbert Song


Currently, for the unified containerizer, images that are already cached by the 
metadata manager cannot be updated. The user has to delete the corresponding images in 
the store if an update is needed. We should support a `force_pull_image` option for 
the unified containerizer, to allow overriding a cached image if it exists.
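
A trivial sketch of the intended decision (hypothetical helper name, not an existing 
API):

{code}
// Hypothetical helper: whether the store should pull (or re-pull) an image.
bool shouldPull(bool cachedByMetadataManager, bool forcePullImage)
{
  // Pull if the image is not cached yet, or if the user explicitly asked for a
  // refresh via force_pull_image.
  return !cachedByMetadataManager || forcePullImage;
}
{code}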



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4885) Unzip should force overwrite

2016-03-07 Thread Tomasz Janiszewski (JIRA)
Tomasz Janiszewski created MESOS-4885:
-

 Summary: Unzip should force overwrite
 Key: MESOS-4885
 URL: https://issues.apache.org/jira/browse/MESOS-4885
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Reporter: Tomasz Janiszewski
Priority: Trivial


Consider a situation where a zip file is malformed and contains duplicated files. 
When the fetcher downloads such a malformed zip file (e.g., dist zips generated by 
gradle can have duplicated files in the libs dir) and tries to uncompress it, the 
deployment hangs in the staging phase because unzip prompts whether the file should be 
replaced. unzip should either overwrite the file or fail with an error.
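
For reference, unzip's `-o` flag overwrites existing files without prompting (and `-n` 
never overwrites); a sketch of how the extraction command could include it (an 
assumption, not the current fetcher code):

{code}
#include <string>

// Sketch only: build the extraction command with `-o` so unzip overwrites
// duplicate entries instead of prompting and hanging the fetch; `-d` picks
// the target directory.
std::string unzipCommand(const std::string& archive, const std::string& directory)
{
  return "unzip -o '" + archive + "' -d '" + directory + "'";
}
{code}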



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-07 Thread Robert Brockbank (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183425#comment-15183425
 ] 

Robert Brockbank edited comment on MESOS-4370 at 3/7/16 6:41 PM:
-

I think getting better support for the --net option is a separate issue.

At the moment, service discovery/DNS does not work with Docker 1.10 because of 
the relocation of the IP field in the inspection data.  This issue *does* 
resolve that and I don't think we should be holding off on getting this into a 
release.  It is not necessary to wait until we have improved --net support.

As it stands today, specifying an additional --net option does work as a 
mechanism for using user-defined networks, and people are using it.  I agree 
that the UX isn't ideal and we should aim to improve that, but DNS is actually 
broken and we do have a simple fix for that which could go in.


was (Author: robbrockb...@gmail.com):
I think getting better support for the --net option is a separate issue.

At the moment, service discovery/DNS does not work with Docker 1.10 because of 
the relocation of the IP field in the inspection data.  This issue *does* 
resolve that and I don't think we should be holding off on getting this into a 
release.  It is not necessary to wait until we have improved --net support.

As it stands the --net option does work today as a mechanism for using 
user-defined networks, and people are using it.  I agree that the UX isn't 
ideal and we should aim to improve that, but DNS is actually broken and we do 
have a simple fix for that which could go in.

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> containers IP from the netinfo interface.
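
For illustration, a rough sketch of the fallback logic (assuming stout's JSON helpers; 
`containerIP` is a hypothetical name and this is not the patch under review):

{code}
#include <string>

#include <stout/json.hpp>
#include <stout/option.hpp>
#include <stout/result.hpp>

// Prefer the deprecated top-level IPAddress, then fall back to the per-network
// entries under NetworkSettings.Networks. `inspect` is the parsed output of
// `docker inspect`.
Option<std::string> containerIP(const JSON::Object& inspect)
{
  // Old location; empty for containers attached via newer network plugins.
  Result<JSON::String> ip =
    inspect.find<JSON::String>("NetworkSettings.IPAddress");

  if (ip.isSome() && !ip.get().value.empty()) {
    return ip.get().value;
  }

  // New location: one object per attached network, each with its own IPAddress.
  Result<JSON::Object> networks =
    inspect.find<JSON::Object>("NetworkSettings.Networks");

  if (networks.isSome()) {
    for (const auto& network : networks.get().values) {
      if (network.second.is<JSON::Object>()) {
        Result<JSON::String> address =
          network.second.as<JSON::Object>().find<JSON::String>("IPAddress");

        if (address.isSome() && !address.get().value.empty()) {
          return address.get().value;
        }
      }
    }
  }

  return None();
}
{code}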



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4884) Ensure task_status timestamp is monotonic

2016-03-07 Thread Sargun Dhillon (JIRA)
Sargun Dhillon created MESOS-4884:
-

 Summary: Ensure task_status timestamp is monotonic
 Key: MESOS-4884
 URL: https://issues.apache.org/jira/browse/MESOS-4884
 Project: Mesos
  Issue Type: Improvement
Reporter: Sargun Dhillon
Priority: Critical


In state.json each task status has a timestamp associated with it. From my 
understanding, the timestamp is when the task status update was generated. The slave 
guarantees that the list is sorted and that the first item of the list is the newest 
status. This becomes a problem if someone is independently consuming the task status 
updates -- without the slave's ordering logic, we cannot determine the current state 
of the task. 

There exists a timestamp on the task. I would like the executor (API) to ensure 
that this timestamp is strictly monotonic. 
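
As a rough illustration (assuming timestamps are doubles in seconds, as in 
TaskStatus.timestamp; this is a sketch, not a proposal for the exact API), the update 
generator could clamp each new timestamp against the previous one:

{code}
#include <algorithm>
#include <cmath>
#include <limits>

// Sketch only: hand out timestamps so that each value is strictly greater than
// the previous one for the same task.
class MonotonicTimestamp
{
public:
  double next(double now)
  {
    // Smallest double strictly greater than `last`, in case `now` has not
    // advanced since the previous update.
    double bumped = std::nextafter(last, std::numeric_limits<double>::max());
    last = std::max(now, bumped);
    return last;
  }

private:
  double last = 0.0;
};
{code}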



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-07 Thread Dan Osborne (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183423#comment-15183423
 ] 

Dan Osborne commented on MESOS-4370:


I don't believe this is entirely true. Regardless of whether the launched container 
uses a user-defined network or regular docker networking, the place where Mesos 
expects to find the IP has been moved in the docker api. Your fix addresses that, and 
restores Mesos' ability to get the IP. 

Though RA42516 also concerns networking, I don't think it should prevent this from 
getting merged, as this will restore DNS and service discovery in Mesos for Docker 
1.10+.

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> containers IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4883) Add agent ID to agent state endpoint

2016-03-07 Thread Sargun Dhillon (JIRA)
Sargun Dhillon created MESOS-4883:
-

 Summary: Add agent ID to agent state endpoint
 Key: MESOS-4883
 URL: https://issues.apache.org/jira/browse/MESOS-4883
 Project: Mesos
  Issue Type: Improvement
Reporter: Sargun Dhillon
Priority: Minor


I would like to have the slave ID exposed on the slave's state.json endpoint before 
any tasks are running on the slave. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-07 Thread Travis Hegner (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183395#comment-15183395
 ] 

Travis Hegner commented on MESOS-4370:
--

Thank you [~robbrockb...@gmail.com] for your testing and interest in this patch. I've 
discovered that this patch only works out of pure luck, given the way docker 
interprets multiple "--net" parameters. I have been stalling this patch because it 
will have to be re-worked to account for official user-defined network support in 
mesos, via https://reviews.apache.org/r/42516/.

I'd be happy to get a working fix merged in myself, but would prefer it be 
based on the patch linked above.

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> containers IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183358#comment-15183358
 ] 

haosdent commented on MESOS-4370:
-

+1 for this. Docker 1.10.0 was released more than a month ago.

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> containers IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-07 Thread Robert Brockbank (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183319#comment-15183319
 ] 

Robert Brockbank edited comment on MESOS-4370 at 3/7/16 5:34 PM:
-

Is it possible to get this fix into the latest release (0.28)? At the moment MesosDNS 
is not working when using the Docker Containerizer with Docker 1.10.1. We've tested a 
patch containing this fix, and IP discovery and MesosDNS both then work as expected.

Really keen to get this in a patch as soon as possible.


was (Author: robbrockb...@gmail.com):
Is it possible to get this fix in the latest patch (0.28?).  At the moment 
MesosDNS is not working when using the Docker Containerizer with Docker 1.10.1 
w(with user defined networks).  We've tested a patch containing this fix and IP 
discovery and MesosDNS both then work as expected.

Really keen to get this in a patch as soon as possible.

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> containers IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-03-07 Thread Robert Brockbank (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183319#comment-15183319
 ] 

Robert Brockbank commented on MESOS-4370:
-

Is it possible to get this fix into the latest release (0.28)? At the moment MesosDNS 
is not working when using the Docker Containerizer with Docker 1.10.1 (with 
user-defined networks). We've tested a patch containing this fix, and IP discovery and 
MesosDNS both then work as expected.

Really keen to get this in a patch as soon as possible.

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> containers IP from the netinfo interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4279) Graceful restart of docker task

2016-03-07 Thread Martin Bydzovsky (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183205#comment-15183205
 ] 

Martin Bydzovsky commented on MESOS-4279:
-

Are you sure [~qianzhang] you tried exactly {{vagrant up}} and then restarted the app 
(via the marathon api/ui)? I have now started digging, adding custom logs in the mesos 
codebase and recompiling it over and over. And to me, the code looks like it has never 
worked.

https://github.com/apache/mesos/blob/0.26.0/src/docker/executor.cpp#L219 - Immediately 
after calling docker->stop (with the correct value btw - as I've inspected) you set 
{{killed=true}}, and then in the {{reaped}} method (which gets called immediately) you 
check the {{killed}} flag and send a wrong TASK_KILLED status update: 
https://github.com/apache/mesos/blob/0.26.0/src/docker/executor.cpp#L281. 

Finally, 
https://github.com/apache/mesos/blob/0.26.0/src/docker/executor.cpp#L308 stops the 
whole driver - I'm not sure yet what that really means - but if that's the parent 
process of the docker executor, then it will kill the {{docker run}} process in a 
cascade.

> Graceful restart of docker task
> ---
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
>Reporter: Martin Bydzovsky
>Assignee: Qian Zhang
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I 
> came across the following issue:
> (it was already discussed on 
> https://github.com/mesosphere/marathon/issues/2876 and the guys from mesosphere 
> got to the point that it's probably a docker containerizer problem...)
> To sum it up:
> When I deploy a simple python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result - the task receives sigterm and 
> dies peacefully (within my script-specified 2-second period)
> But when I wrap this python script in a docker:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the appropriate application via Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>   type: "DOCKER",
>   docker: {
>   image: "bydga/marathon-test-api"
>   },
>   forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> The task during restart (issued from marathon) dies immediately without 
> having a chance to do any cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4818) Add end to end testing for Appc images.

2016-03-07 Thread Jojy Varghese (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jojy Varghese updated MESOS-4818:
-
Sprint: Mesosphere Sprint 30

> Add end to end testing for Appc images.
> ---
>
> Key: MESOS-4818
> URL: https://issues.apache.org/jira/browse/MESOS-4818
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jojy Varghese
>Assignee: Jojy Varghese
>  Labels: mesosphere, unified-containerizer-mvp
>
> Add tests that covers integration test of the Appc provisioner feature with 
> mesos containerizer.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3815) docker executor not works when SSL enable

2016-03-07 Thread Kevin Cox (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182920#comment-15182920
 ] 

Kevin Cox commented on MESOS-3815:
--

It's also worth noting that this affects non-docker executors as well. 

> docker executor not works when SSL enable
> -
>
> Key: MESOS-3815
> URL: https://issues.apache.org/jira/browse/MESOS-3815
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>  Labels: docker, encryption, mesosphere, security, ssl
>
> Because the docker executor does not pass SSL-related environment variables, 
> mesos-docker-executor cannot work normally when SSL is enabled. More details 
> can be found in http://search-hadoop.com/m/0Vlr6DsslDSvVs72



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4279) Graceful restart of docker task

2016-03-07 Thread AoJ (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182864#comment-15182864
 ] 

AoJ commented on MESOS-4279:


hi [~qianzhang],

I ran into the same problem. I have a clean ubuntu 14.04. Do you have any idea why 
this is happening and where the problem might be?

I tried to use the attached vagrantfile uploaded by [~bydga] and it didn't work either.

Tomas

> Graceful restart of docker task
> ---
>
> Key: MESOS-4279
> URL: https://issues.apache.org/jira/browse/MESOS-4279
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0
>Reporter: Martin Bydzovsky
>Assignee: Qian Zhang
>
> I'm implementing graceful restarts of our mesos-marathon-docker setup and I 
> came across the following issue:
> (it was already discussed on 
> https://github.com/mesosphere/marathon/issues/2876 and the guys from mesosphere 
> got to the point that it's probably a docker containerizer problem...)
> To sum it up:
> When I deploy a simple python script to all mesos-slaves:
> {code}
> #!/usr/bin/python
> from time import sleep
> import signal
> import sys
> import datetime
> def sigterm_handler(_signo, _stack_frame):
>     print "got %i" % _signo
>     print datetime.datetime.now().time()
>     sys.stdout.flush()
>     sleep(2)
>     print datetime.datetime.now().time()
>     print "ending"
>     sys.stdout.flush()
>     sys.exit(0)
> signal.signal(signal.SIGTERM, sigterm_handler)
> signal.signal(signal.SIGINT, sigterm_handler)
> try:
>     print "Hello"
>     i = 0
>     while True:
>         i += 1
>         print datetime.datetime.now().time()
>         print "Iteration #%i" % i
>         sys.stdout.flush()
>         sleep(1)
> finally:
>     print "Goodbye"
> {code}
> and I run it through Marathon like
> {code:javascript}
> data = {
>   args: ["/tmp/script.py"],
>   instances: 1,
>   cpus: 0.1,
>   mem: 256,
>   id: "marathon-test-api"
> }
> {code}
> During the app restart I get the expected result - the task receives sigterm and 
> dies peacefully (within my script-specified 2-second period)
> But when I wrap this python script in a docker:
> {code}
> FROM node:4.2
> RUN mkdir /app
> ADD . /app
> WORKDIR /app
> ENTRYPOINT []
> {code}
> and run the appropriate application via Marathon:
> {code:javascript}
> data = {
>   args: ["./script.py"],
>   container: {
>   type: "DOCKER",
>   docker: {
>   image: "bydga/marathon-test-api"
>   },
>   forcePullImage: yes
>   },
>   cpus: 0.1,
>   mem: 256,
>   instances: 1,
>   id: "marathon-test-api"
> }
> {code}
> The task during restart (issued from marathon) dies immediately without 
> having a chance to do any cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3072) Unify initialization of modularized components

2016-03-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3072:
--
Fix Version/s: 0.27.0

> Unify initialization of modularized components
> --
>
> Key: MESOS-3072
> URL: https://issues.apache.org/jira/browse/MESOS-3072
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Affects Versions: 0.22.0, 0.22.1, 0.23.0
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>  Labels: mesosphere
> Fix For: 0.27.0
>
>
> h1.Introduction
> As it stands right now, default implementations of modularized components are 
> required to have a non-parametrized {{create()}} static method. This allows 
> writing tests which can cover default implementations, and modules based on 
> these default implementations, in a uniform way.
> For example, with the interface {{Foo}}:
> {code}
> class Foo {
> public:
>   virtual ~Foo() {}
>   virtual Future<int> hello() = 0;
> protected:
>   Foo() {}
> };
> {code}
> With a default implementation:
> {code}
> class LocalFoo : public Foo {
> public:
>   static Try<Foo*> create() {
>     return new LocalFoo;
>   }
>   virtual Future<int> hello() {
>     return 1;
>   }
> };
> {code}
> This allows creating typed tests which look as follows:
> {code}
> typedef ::testing::Types<LocalFoo, tests::Module<Foo, TestLocalFoo>>
>   FooTestTypes;
> TYPED_TEST_CASE(FooTest, FooTestTypes);
> TYPED_TEST(FooTest, ATest)
> {
>   Try<Foo*> foo = TypeParam::create();
>   ASSERT_SOME(foo);
>   AWAIT_CHECK_EQUAL(foo.get()->hello(), 1);
> }
> {code}
> The test will be applied to each of the types in the template parameters of 
> {{FooTestTypes}}. This allows testing different implementations of an 
> interface. In our code, it tests default implementations and a module which 
> uses the same default implementation.
> The class {{tests::Module}} needs a little 
> explanation, it is a wrapper around {{ModuleManager}} which allows the tests 
> to encode information about the requested module in the type itself instead 
> of passing a string to the factory method. The wrapper around create, the 
> real important method looks as follows:
> {code}
> template <typename T, ModuleID N>
> static Try<T*> test::Module<T, N>::create()
> {
>   Try<std::string> moduleName = getModuleName(N);
>   if (moduleName.isError()) {
>     return Error(moduleName.error());
>   }
>   return mesos::modules::ModuleManager::create<T>(moduleName.get());
> }
> {code}
> h1.The Problem
> Consider the following implementation of {{Foo}}:
> {code}
> class ParameterFoo : public Foo {
> public:
>   static Try<Foo*> create(int i) {
>     return new ParameterFoo(i);
>   }
>   ParameterFoo(int i) : i_(i) {}
>   virtual Future<int> hello() {
>     return i_;
>   }
> private:
>   int i_;
> };
> {code}
> As it can be seen, this implementation cannot be used as a default 
> implementation since its create API does not match the one of 
> {{test::Module<>}}: {{create()}} has a different signature for both types. It 
> is still a common situation to require initialization parameters for objects, 
> however this constraint (keeping both interfaces alike) forces default 
> implementations of modularized components to have default constructors, 
> therefore the tests are forcing the design of the interfaces.
> Implementations which are supposed to be used as modules only, i.e. non-default 
> implementations, are allowed to have constructor parameters, since their factory 
> method's function is to decode the parameters and call the appropriate 
> constructor; its actual signature is:
> {code}
> template <typename T>
> T* Module<T>::create(const Parameters& params);
> {code}
> where parameters is just an array of key-value string pairs whose 
> interpretation is left to the specific module. Sadly, this call is wrapped by 
> {{ModuleManager}} which only allows module parameters to be passed from the 
> command line and does not offer a programmatic way to feed construction 
> parameters to modules.
> h1.The Ugly Workaround
> With the requirement of a default constructor and a parameter-free 
> {{create()}} factory function, a common pattern (see 
> [Authenticator|https://github.com/apache/mesos/blob/9d4ac11ed757aa5869da440dfe5343a61b07199a/include/mesos/authentication/authenticator.hpp])
>  has been introduced to feed construction parameters into default 
> implementations; this leads to adding an {{initialize()}} call to the public 
> interface, which makes {{Foo}} become:
> {code}
> class Foo {
> public:
>   virtual ~Foo() {}
>   virtual Try<Nothing> initialize(const Option<int>& i) = 0;
>   virtual Future<int> hello() = 0;
> protected:
>   Foo() {}
> };
> {code}
> {{ParameterFoo}} will thus look as follows:
> {code}
> class ParameterFoo : public Foo {
> public:
>   static Try<Foo*> create() {
>     return new ParameterFoo;
>   }
>   ParameterFoo() : i_(None()) {}
>   virtual Try 

[jira] [Commented] (MESOS-4772) TaskInfo/ExecutorInfo should include owner information

2016-03-07 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182781#comment-15182781
 ] 

Adam B commented on MESOS-4772:
---

2b. Mesos authenticates the user accessing Mesos http endpoints, which may or 
may not be the same user accessing the framework's http UI to request that the 
framework launch a task on behalf of the user. Mesos authenticates the 
framework prior to its registration, but has no way (unless the framework tells 
it) to know which user launches a particular task.
4. Only an individual framework can authenticate and authorize users of its own 
UI. Mesos cannot intercept at this point, especially not without the 
framework's assistance. This ticket is about enabling frameworks to provide 
this information to Mesos on task launch, so that Mesos can later make 
authorization decisions based on this information (separate tickets).
5. `FrameworkInfo.user` is not necessarily related to user of the framework's 
UI (or Mesos' UI). It is the linux user which the framework's tasks will run as 
(see `RunTask` ACL) by default, if no `CommandInfo.user` is specified for the 
task/executor. Consider that Alice and Bob may both want to use the Hadoop 
framework to run tasks as the `hadoop` user.


> TaskInfo/ExecutorInfo should include owner information
> --
>
> Key: MESOS-4772
> URL: https://issues.apache.org/jira/browse/MESOS-4772
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Adam B
>Assignee: Jan Schlicht
>  Labels: authorization, mesosphere, ownership, security
>
> We need a way to assign fine-grained ownership to tasks/executors so that 
> multi-user frameworks can tell Mesos to associate the task with a user 
> identity (rather than just the framework principal+role). Then, when an HTTP 
> user requests to view the task's sandbox contents, or kill the task, or list 
> all tasks, the authorizer can determine whether to allow/deny/filter the 
> request based on finer-grained, user-level ownership.
> Some systems may want TaskInfo.owner to represent a group rather than an 
> individual user. That's fine as long as the framework sets the field to the 
> group ID in such a way that a group-aware authorizer can interpret it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4882) Enabled mesos-execute treat command as executable value and arguments.

2016-03-07 Thread Guangya Liu (JIRA)
Guangya Liu created MESOS-4882:
--

 Summary: Enabled mesos-execute treat command as executable value 
and arguments.
 Key: MESOS-4882
 URL: https://issues.apache.org/jira/browse/MESOS-4882
 Project: Mesos
  Issue Type: Bug
Reporter: Guangya Liu
Assignee: Guangya Liu



The CommandInfo supports two kinds of command:
{code}
// There are two ways to specify the command:
  // 1) If 'shell == true', the command will be launched via shell
  //    (i.e., /bin/sh -c 'value'). The 'value' specified will be
  //    treated as the shell command. The 'arguments' will be ignored.
  // 2) If 'shell == false', the command will be launched by passing
  //    arguments to an executable. The 'value' specified will be
  //    treated as the filename of the executable. The 'arguments'
  //    will be treated as the arguments to the executable. This is
  //    similar to how POSIX exec families launch processes (i.e.,
  //    execlp(value, arguments(0), arguments(1), ...)).
{code}

mesos-execute cannot handle 2) now; enabling 2) can help some unit tests with 
isolators.
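
As an illustration of form 2), a small sketch assuming the generated C++ protobuf API 
for CommandInfo (the example values are arbitrary):

{code}
#include <iostream>

#include <mesos/mesos.pb.h>

int main()
{
  // Launch "/bin/ls -l /tmp" exec-style: no shell, 'value' is the executable,
  // 'arguments' are passed through as argv (arguments[0] is argv[0]).
  mesos::CommandInfo command;
  command.set_shell(false);
  command.set_value("/bin/ls");
  command.add_arguments("ls");
  command.add_arguments("-l");
  command.add_arguments("/tmp");

  std::cout << command.DebugString() << std::endl;
  return 0;
}
{code}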





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)