[jira] [Assigned] (MESOS-3273) EventCall Test Framework is flaky

2016-02-06 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-3273:
-

Assignee: Anand Mazumdar  (was: Vinod Kone)

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>  Labels: flaky-test, mesosphere, tech-debt
> Attachments: asan.log
>
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status
> I0813 19:55:17.181970 26126 master.cpp:378] Master 
> 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 
> 172.17.2.10:60249
> I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: 
> --acls="permissive: false
> register_frameworks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   roles {
> type: SOME
> values: "*"
>   }
> }
> run_tasks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   users {
> type: SOME
> values: "mesos"
>   }
> }
> " --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" 
> --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" 
> --zk_session_timeout="10secs"
> I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated 
> frameworks to register
> I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated 
> slaves to register
> I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials'
> W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials 
> file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. 
> It is recommended that your credentials file is NOT accessible by others.
> I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0813 19:55:17.184306 26126 master.cpp:469] Using default 'crammd5' 
> authenticator
> I0813 19:55:17.184661 26126 authenticator.cpp:512] 

[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky

2016-02-06 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-3273:
--
Shepherd: Vinod Kone  (was: Anand Mazumdar)


[jira] [Commented] (MESOS-3558) Make the CommandExecutor use the Executor Library speaking HTTP

2016-02-06 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135699#comment-15135699
 ] 

Qian Zhang commented on MESOS-3558:
---

[~anandmazumdar], so we'd like to create an {{HttpCommandExecutor}} that calls 
the executor HTTP API, and introduce a new agent option (e.g., 
{{--[no-]http_executor}}, defaulting to false to keep backward compatibility), right?

> Make the CommandExecutor use the Executor Library speaking HTTP
> ---
>
> Key: MESOS-3558
> URL: https://issues.apache.org/jira/browse/MESOS-3558
> Project: Mesos
>  Issue Type: Task
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> Instead of using the {{MesosExecutorDriver}} , we should make the 
> {{CommandExecutor}} in {{src/launcher/executor.cpp}} use the new Executor 
> HTTP Library that we create in {{MESOS-3550}}. 
> This would act as a good validation of the {{HTTP API}} implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4614) SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky

2016-02-06 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-4614:
--
Shepherd: Vinod Kone
  Sprint: Mesosphere Sprint 28
Story Points: 3

Patch: https://reviews.apache.org/r/43285/

> SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky
> 
>
> Key: MESOS-4614
> URL: https://issues.apache.org/jira/browse/MESOS-4614
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, slave, tests
>Affects Versions: 0.27.0
> Environment: CentOS 7, gcc, libevent & SSL enabled
>Reporter: Greg Mann
>Assignee: Anand Mazumdar
>  Labels: flaky-test, mesosphere
>
> Just saw this failure on the ASF CI:
> {code}
> [ RUN  ] SlaveRecoveryTest/0.CleanupHTTPExecutor
> I0206 00:22:44.791671  2824 leveldb.cpp:174] Opened db in 2.539372ms
> I0206 00:22:44.792459  2824 leveldb.cpp:181] Compacted db in 740473ns
> I0206 00:22:44.792510  2824 leveldb.cpp:196] Created db iterator in 24164ns
> I0206 00:22:44.792532  2824 leveldb.cpp:202] Seeked to beginning of db in 
> 1831ns
> I0206 00:22:44.792548  2824 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 342ns
> I0206 00:22:44.792605  2824 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0206 00:22:44.793256  2847 recover.cpp:447] Starting replica recovery
> I0206 00:22:44.793480  2847 recover.cpp:473] Replica is in EMPTY status
> I0206 00:22:44.794538  2847 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (9472)@172.17.0.2:43484
> I0206 00:22:44.795040  2848 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0206 00:22:44.795644  2848 recover.cpp:564] Updating replica status to 
> STARTING
> I0206 00:22:44.796519  2850 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 752810ns
> I0206 00:22:44.796545  2850 replica.cpp:320] Persisted replica status to 
> STARTING
> I0206 00:22:44.796725  2848 recover.cpp:473] Replica is in STARTING status
> I0206 00:22:44.797828  2857 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (9473)@172.17.0.2:43484
> I0206 00:22:44.798355  2850 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0206 00:22:44.799193  2850 recover.cpp:564] Updating replica status to VOTING
> I0206 00:22:44.799583  2855 master.cpp:376] Master 
> 0b206a40-a9c3-4d44-a5bd-8032d60a32ca (6632562f1ade) started on 
> 172.17.0.2:43484
> I0206 00:22:44.799609  2855 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/n2FxQV/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/n2FxQV/master" --zk_session_timeout="10secs"
> I0206 00:22:44.71  2855 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0206 00:22:44.89  2855 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0206 00:22:44.800020  2855 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/n2FxQV/credentials'
> I0206 00:22:44.800245  2850 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 679345ns
> I0206 00:22:44.800370  2850 replica.cpp:320] Persisted replica status to 
> VOTING
> I0206 00:22:44.800397  2855 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0206 00:22:44.800693  2855 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0206 00:22:44.800815  2855 master.cpp:571] Authorization enabled
> I0206 00:22:44.801216  2850 recover.cpp:578] Successfully joined the Paxos 
> group
> I0206 00:22:44.801604  2850 recover.cpp:462] Recover process terminated
> I0206 00:22:44.801759  2856 whitelist_watcher.cpp:77] No whitelist given
> I0206 00:22:44.801725  2847 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0206 00:22:44.803982  2855 master.cpp:1712] The newly elected leader is 
> master@172.17.0.2:43484 with id 

[jira] [Assigned] (MESOS-4612) Update to Zookeeper 3.4.7

2016-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-4612:
---

Assignee: haosdent

> Update to Zookeeper 3.4.7
> -
>
> Key: MESOS-4612
> URL: https://issues.apache.org/jira/browse/MESOS-4612
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Cody Maloney
>Assignee: haosdent
>  Labels: mesosphere, tech-debt
>
> See: http://zookeeper.apache.org/doc/r3.4.7/releasenotes.html for 
> improvements / bug fixes





[jira] [Created] (MESOS-4616) Support specifying a preferred host with a Resource Request

2016-02-06 Thread Jagadish (JIRA)
Jagadish created MESOS-4616:
---

 Summary: Support specifying a preferred host with a Resource 
Request
 Key: MESOS-4616
 URL: https://issues.apache.org/jira/browse/MESOS-4616
 Project: Mesos
  Issue Type: Story
Reporter: Jagadish


When stateful services like Apache Samza or Kafka must be restarted under 
Mesos, the framework must have a way of specifying a preferred host with the 
request.

More background:
I work on Apache Samza, a distributed stream processing framework. Currently 
Samza supports only YARN as a resource manager (there have been requests to 
run Samza on Mesos). A cluster (roughly 200 nodes) runs many Samza jobs (about 
3500). Each Samza job has its own framework that requests resources 
(containers) for the job to run. Each such container uses GBs of local state. 
When such a container (resource) is started on a different host by the 
framework, the local state must be re-bootstrapped, which results in a long 
bootstrap time that is essentially downtime.

The same is true for Apache Kafka, a distributed pub-sub logging system. When 
a Kafka broker must be restarted by the framework, it should ideally be 
restarted on the same host; otherwise, each broker has to re-bootstrap 
several GBs of logs from its peers before it can start to service requests.

I'm sure many stateful services have similar requirements.

My framework can update/filter resource offers, but with many frameworks 
(thousands) on several nodes, I'm concerned about the wait time for a 
particular available resource to be offered to the framework that needs it.





[jira] [Created] (MESOS-4617) CMake failed because APPC_SPEC missing

2016-02-06 Thread haosdent (JIRA)
haosdent created MESOS-4617:
---

 Summary: CMake failed because APPC_SPEC missing
 Key: MESOS-4617
 URL: https://issues.apache.org/jira/browse/MESOS-4617
 Project: Mesos
  Issue Type: Bug
Reporter: haosdent
Assignee: haosdent


Because the Appc spec proto is missing from the CMake build, building Mesos 
with cmake fails.
{code}
/Users/haosdent/workspace/cpp/mesos/include/mesos/appc/spec.hpp:25:10: fatal 
error: 'mesos/appc/spec.pb.h' file not found
#include <mesos/appc/spec.pb.h>
 ^
1 error generated.
make[2]: *** 
[src/CMakeFiles/mesos-0.28.0.dir/slave/containerizer/mesos/provisioner/appc/paths.cpp.o]
 Error 1
make[2]: *** Waiting for unfinished jobs
In file included from 
/Users/haosdent/workspace/cpp/mesos/src/slave/containerizer/mesos/provisioner/appc/cache.cpp:19:
/Users/haosdent/workspace/cpp/mesos/include/mesos/appc/spec.hpp:25:10: fatal 
error: 'mesos/appc/spec.pb.h' file not found
#include <mesos/appc/spec.pb.h>
 ^
1 error generated.
make[2]: *** 
[src/CMakeFiles/mesos-0.28.0.dir/slave/containerizer/mesos/provisioner/appc/cache.cpp.o]
 Error 1
In file included from 
/Users/haosdent/workspace/cpp/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:29:
/Users/haosdent/workspace/cpp/mesos/include/mesos/appc/spec.hpp:25:10: fatal 
error: 'mesos/appc/spec.pb.h' file not found
#include <mesos/appc/spec.pb.h>
 ^
{code}
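A likely direction for the fix (a sketch only; Mesos's actual CMake helper 
macros and proto paths may differ) is to run the Appc spec proto through 
protoc before compiling the sources that include the generated header, e.g. 
with the stock FindProtobuf module:

```cmake
# Sketch: generate mesos/appc/spec.pb.{h,cc} from the proto file so
# that sources including the header can build (paths are assumptions).
find_package(Protobuf REQUIRED)
protobuf_generate_cpp(APPC_PROTO_SRCS APPC_PROTO_HDRS
  include/mesos/appc/spec.proto)
```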





[jira] [Assigned] (MESOS-4614) SlaveRecoveryTest/0.CleanupHTTPExecutor is flaky

2016-02-06 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-4614:
-

Assignee: Anand Mazumdar


[jira] [Commented] (MESOS-4039) PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails

2016-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135878#comment-15135878
 ] 

haosdent commented on MESOS-4039:
-

Sorry, I forgot to post it.
{code}
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from PerfEventIsolatorTest
[ RUN  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
I0207 00:58:32.392724 16501 perf_event.cpp:71] Creating PerfEvent isolator
I0207 00:58:32.440187 16501 perf_event.cpp:109] PerfEvent isolator will profile 
for 250ms every 500ms for events: { cycles, task-clock }
I0207 00:58:32.443006 16521 perf_event.cpp:217] Preparing perf event cgroup for 
239d30bb-f7a1-413b-9d99-0914149d5899
E0207 00:58:33.224544 16518 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
E0207 00:58:33.727793 16516 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
E0207 00:58:34.230981 16517 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
E0207 00:58:34.734318 16520 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
E0207 00:58:35.237889 16517 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
E0207 00:58:35.742452 16522 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
E0207 00:58:36.246068 16515 perf_event.cpp:408] Failed to get perf sample: 
Failed to parse perf sample: Failed to parse perf sample line ',,cycles,mesos/239d30bb-f7a1-413b-9d99-0914149d5899': Unexpected number 
of fields
../../src/tests/containerizer/isolator_tests.cpp:1083: Failure
Expected: (statistics1.get().perf().timestamp()) != 
(statistics2.perf().timestamp()), actual: 1.45478e+09 vs 1.45478e+09
../../src/tests/containerizer/isolator_tests.cpp:1085: Failure
Value of: statistics2.perf().has_cycles()
  Actual: false
Expected: true
../../src/tests/containerizer/isolator_tests.cpp:1088: Failure
Value of: statistics2.perf().has_task_clock()
  Actual: false
Expected: true
[  FAILED  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (4069 ms)
[--] 1 test from PerfEventIsolatorTest (4069 ms total)

[--] Global test environment tear-down
../../src/tests/environment.cpp:732: Failure
Failed
Tests completed with child processes remaining:
-+- 16501 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests 
--gtest_filter=PerfEventIsolatorTest.ROOT_CGROUPS_Sample --verbose
 |-+- 16580 /home/haosdent/mesos/build/src/.libs/lt-mesos-tests 
--gtest_filter=PerfEventIsolatorTest.ROOT_CGROUPS_Sample --verbose
 | \-+- 16582 perf stat --all-cpus --field-separator , --log-fd 1 --event 
cycles --cgroup mesos/239d30bb-f7a1-413b-9d99-0914149d5899 --event task-clock 
--cgroup mesos/239d30bb-f7a1-413b-9d99-0914149d5899 -- sleep 0.25
 |   \--- 16584 sleep 0.25
 \--- 16581 ()
[==] 1 test from 1 test case ran. (4095 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
{code}

I use
{code}
sudo GLOG_v=1 ./bin/mesos-tests.sh 
--gtest_filter="PerfEventIsolatorTest.ROOT_CGROUPS_Sample" --verbose
{code}
to test. As you can see, there are two problems: one is that we could not 
handle the perf event format on CentOS 7.1 (kernel 3.10.0), and the other is 
that we didn't wait for the child processes to exit.

> PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails
> ---
>
> Key: MESOS-4039
> URL: https://issues.apache.org/jira/browse/MESOS-4039
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Jan Schlicht
>  Labels: mesosphere, test-fail
>
> PerfEventIsolatorTest.ROOT_CGROUPS_Sample fails on CentOS 6.6:
> {code}
> [--] 1 test from PerfEventIsolatorTest
> [ RUN  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample
> ../../src/tests/containerizer/isolator_tests.cpp:848: Failure
> isolator: Perf is not supported
> [  FAILED  ] PerfEventIsolatorTest.ROOT_CGROUPS_Sample (79 ms)
> [--] 1 test from PerfEventIsolatorTest (79 ms total)
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (86 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  

[jira] [Commented] (MESOS-4611) Passing a lambda to dispatch() always matches the template returning void

2016-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135887#comment-15135887
 ] 

haosdent commented on MESOS-4611:
-

I think we could use
{code}
Future<Nothing> initialized = dispatch(pid, [] () -> Future<Nothing> {
    return Nothing();
  });
{code}?

> Passing a lambda to dispatch() always matches the template returning void
> -
>
> Key: MESOS-4611
> URL: https://issues.apache.org/jira/browse/MESOS-4611
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Kevin Klues
>  Labels: dispatch, libprocess, mesosphere
>
> The following idiom does not currently compile:
> {code}
>   Future<Nothing> initialized = dispatch(pid, [] () -> Nothing {
> return Nothing();
>   });
> {code}
> This seems non-intuitive because the following template exists for dispatch:
> {code}
> template <typename R>
> Future<R> dispatch(const UPID& pid, const std::function<R()>& f)
> {
>   std::shared_ptr<Promise<R>> promise(new Promise<R>());
>
>   std::shared_ptr<std::function<void(ProcessBase*)>> f_(
>       new std::function<void(ProcessBase*)>(
>           [=](ProcessBase*) {
>             promise->set(f());
>           }));
>
>   internal::dispatch(pid, f_);
>
>   return promise->future();
> }
> {code}
> However, lambdas cannot be implicitly cast to the corresponding 
> std::function<R()> type.
> To make this work, you have to explicitly type the lambda before passing it 
> to dispatch.
> {code}
>   std::function<Nothing()> f = []() { return Nothing(); };
>   Future<Nothing> initialized = dispatch(pid, f);
> {code}
> We should add template support to allow lambdas to be passed to dispatch() 
> without explicit typing. 





[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135906#comment-15135906
 ] 

haosdent commented on MESOS-1802:
-

Yes. [~neilc], do you have the stdout and stderr logs?

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>  Labels: flaky, mesosphere
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0916 22:56:14.039836 21047 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 144215ns
> I0916 22:56:14.039850 21047 replica.cpp:676] Persisted action at 0
> I0916 22:56:14.040243 21047 replica.cpp:508] Replica received write request 
> for position 0
> I0916 22:56:14.040267 21047 

[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

2016-02-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136165#comment-15136165
 ] 

Deshi Xiao commented on MESOS-1806:
---

I went through the etcd v3 overview, and its feature set is very impressive. I 
think we need to think deeply about replacing ZooKeeper with etcd; its features 
are more elegant than ZooKeeper's embedded ones.

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Task
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4616) Support specifying a preferred host with a Resource Request

2016-02-06 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136111#comment-15136111
 ] 

Guangya Liu commented on MESOS-4616:


[~jagadish1...@gmail.com] There is a persistent volume example here: 
https://github.com/apache/mesos/blob/master/src/examples/persistent_volume_framework.cpp
You can take a look to see if it helps.

> Support specifying a preferred host with a Resource Request
> ---
>
> Key: MESOS-4616
> URL: https://issues.apache.org/jira/browse/MESOS-4616
> Project: Mesos
>  Issue Type: Story
>Reporter: Jagadish
>
> When stateful services like Apache Samza or Kafka must be restarted under 
> Mesos, the framework must have a way of specifying a preferred host with the 
> request.
> More background:
> I work on Apache Samza, a distributed stream processing framework. Currently 
> Samza supports only YARN as a resource manager (there have been requests to 
> run Samza with Mesos). A cluster (roughly 200 nodes) runs many Samza jobs 
> (about 3500). Each Samza job has its own framework that requests resources 
> (containers) for the job to run. Each such container uses GBs of local state. 
> When such a container (resource) is started on a different host by the 
> framework, the local state must be re-bootstrapped (this results in a long 
> bootstrap time, which is essentially downtime).
> The same is true for Apache Kafka, a distributed pub-sub logging system. 
> When a Kafka broker must be restarted by the framework, it should ideally be 
> restarted on the same host (otherwise, each broker has to re-bootstrap 
> several GBs of logs from its peers before it can start to service requests).
> I'm sure many stateful services have similar requirements.
> My framework can update/filter resource offers, but with many frameworks 
> (thousands) on several nodes, I'm concerned about the wait time for a 
> particular available resource to be offered to the framework that needs it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules

2016-02-06 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136149#comment-15136149
 ] 

Klaus Ma edited comment on MESOS-4610 at 2/7/16 5:22 AM:
-

[~mcavage], this proposal makes sense to me :). Would you help summarize the 
implementation details before the RR? The code diff is really huge.

BTW, I think you can drop an email to dev@ to find a shepherd for this JIRA.


was (Author: klaus1982):
[~mcavage], this proposal makes sense to me :). Would you help summarize the 
implementation details before the RR? The code diff is really huge.

> MasterContender/MasterDetector should be loadable as modules
> 
>
> Key: MESOS-4610
> URL: https://issues.apache.org/jira/browse/MESOS-4610
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Mark Cavage
>
> Currently mesos depends on Zookeeper for leader election and notification to 
> slaves, although there is a C++ hierarchy in the code to support alternatives 
> (e.g., unit tests use an in-memory implementation). From an operational 
> perspective, many organizations/users do not want to take a dependency on 
> Zookeeper, and use an alternative solution to implementing leader election. 
> Our organization in particular, very much wants this, and as a reference 
> there have been several requests from the community (see referenced tickets) 
> to replace with etcd/consul/etc.
> This ticket will serve as the work effort to modularize the 
> MasterContender/MasterDetector APIs such that integrators can build a 
> pluggable solution of their choice; this ticket will not fold in any 
> implementations such as etcd et al., but simply move this hierarchy to be 
> fully pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4610) MasterContender/MasterDetector should be loadable as modules

2016-02-06 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136149#comment-15136149
 ] 

Klaus Ma commented on MESOS-4610:
-

[~mcavage], this proposal makes sense to me :). Would you help summarize the 
implementation details before the RR? The code diff is really huge.

> MasterContender/MasterDetector should be loadable as modules
> 
>
> Key: MESOS-4610
> URL: https://issues.apache.org/jira/browse/MESOS-4610
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Mark Cavage
>
> Currently mesos depends on Zookeeper for leader election and notification to 
> slaves, although there is a C++ hierarchy in the code to support alternatives 
> (e.g., unit tests use an in-memory implementation). From an operational 
> perspective, many organizations/users do not want to take a dependency on 
> Zookeeper, and use an alternative solution to implementing leader election. 
> Our organization in particular, very much wants this, and as a reference 
> there have been several requests from the community (see referenced tickets) 
> to replace with etcd/consul/etc.
> This ticket will serve as the work effort to modularize the 
> MasterContender/MasterDetector APIs such that integrators can build a 
> pluggable solution of their choice; this ticket will not fold in any 
> implementations such as etcd et al., but simply move this hierarchy to be 
> fully pluggable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4572) docker executor should support preconfiguration and postconfiguration steps

2016-02-06 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136096#comment-15136096
 ] 

Klaus Ma commented on MESOS-4572:
-

It depends on the executor implementation: {{TaskInfo::data}} can include 
binary data, so a framework implementation can pass preconfig/postconfig info 
to its executor and handle it there.

> docker executor should support preconfiguration and postconfiguration steps
> ---
>
> Key: MESOS-4572
> URL: https://issues.apache.org/jira/browse/MESOS-4572
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: popsuper1982
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> There should be a way to execute some scripts before and after the Docker 
> container is created.
> Preconfiguration: for example, check the environment (IP addresses, 
> hostnames, bridges, volume directories) of the host where the container will 
> run, form the ENV for the container, and then run it with those environment 
> variables.
> Postconfiguration: for example, use pipework to configure the container, get 
> the allocated ports, and report them to another container to configure the 
> relationships between applications in different containers.
> Another postconfiguration example: after the container is shut down, clean 
> up some data in the volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

2016-02-06 Thread Deshi Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136164#comment-15136164
 ] 

Deshi Xiao commented on MESOS-1806:
---

I went through the etcd v3 overview, and its feature set is very impressive. I 
think we need to think deeply about replacing ZooKeeper with etcd; its features 
are more elegant than ZooKeeper's embedded ones.

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Task
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-1806) Substituting etcd for Zookeeper

2016-02-06 Thread Deshi Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deshi Xiao updated MESOS-1806:
--
Comment: was deleted

(was: I went through the etcd v3 overview, and its feature set is very 
impressive. I think we need to think deeply about replacing ZooKeeper with 
etcd; its features are more elegant than ZooKeeper's embedded ones.)

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Task
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4616) Support specifying a preferred host with a Resource Request

2016-02-06 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136085#comment-15136085
 ] 

Klaus Ma commented on MESOS-4616:
-

This is the same requirement as Dynamic Reservation [1] & Persistent Volume 
[2]. Would you try those two features?

[1] http://mesos.apache.org/documentation/latest/reservation/
[2] http://mesos.apache.org/documentation/latest/persistent-volume/

> Support specifying a preferred host with a Resource Request
> ---
>
> Key: MESOS-4616
> URL: https://issues.apache.org/jira/browse/MESOS-4616
> Project: Mesos
>  Issue Type: Story
>Reporter: Jagadish
>
> When stateful services like Apache Samza or Kafka must be restarted under 
> Mesos, the framework must have a way of specifying a preferred host with the 
> request.
> More background:
> I work on Apache Samza, a distributed stream processing framework. Currently 
> Samza supports only YARN as a resource manager (there have been requests to 
> run Samza with Mesos). A cluster (roughly 200 nodes) runs many Samza jobs 
> (about 3500). Each Samza job has its own framework that requests resources 
> (containers) for the job to run. Each such container uses GBs of local state. 
> When such a container (resource) is started on a different host by the 
> framework, the local state must be re-bootstrapped (this results in a long 
> bootstrap time, which is essentially downtime).
> The same is true for Apache Kafka, a distributed pub-sub logging system. 
> When a Kafka broker must be restarted by the framework, it should ideally be 
> restarted on the same host (otherwise, each broker has to re-bootstrap 
> several GBs of logs from its peers before it can start to service requests).
> I'm sure many stateful services have similar requirements.
> My framework can update/filter resource offers, but with many frameworks 
> (thousands) on several nodes, I'm concerned about the wait time for a 
> particular available resource to be offered to the framework that needs it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4582) state.json serving duplicate "active" fields

2016-02-06 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136137#comment-15136137
 ] 

Marco Massenzio commented on MESOS-4582:


Sounds good to me!
I'd suggest documenting the behavior somewhere with a reference to the 
appropriate standards document, so that people won't make the same mistake I did.

> state.json serving duplicate "active" fields
> 
>
> Key: MESOS-4582
> URL: https://issues.apache.org/jira/browse/MESOS-4582
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Michael Gummelt
>Assignee: Michael Park
>Priority: Blocker
> Attachments: error.json
>
>
> state.json is serving duplicate "active" fields in frameworks.  See the 
> framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" In the attached file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4613) Mesos when used with --log_dir generates hundreds of thousands of log files per day

2016-02-06 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136148#comment-15136148
 ] 

Klaus Ma commented on MESOS-4613:
-

It makes sense to improve the fetcher's logging; one option is to have the 
fetcher log only error messages to {{stderr}} and let the slave record them in 
the slave's own log, since the fetcher's logic is simple and the error messages 
should be enough for us.

> Mesos when used with --log_dir generates hundreds of thousands of log files 
> per day
> ---
>
> Key: MESOS-4613
> URL: https://issues.apache.org/jira/browse/MESOS-4613
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Lukas Loesche
>
> We're using mesos with --log_dir=/var/log/mesos
> Lately, in addition to the mesos-master and mesos-slave logs, mesos-fetcher 
> logs have also been written into this directory.
> It seems that every process generates a new log file with a unique file name 
> containing the date and PID. For mesos-master and mesos-slave this makes 
> sense; for mesos-fetcher, not so much.
> On a moderately busy agent it's currently generating 200k log files per day. 
> On our particular system this causes logrotate to segfault, and standard 
> tools like 'rm mesos-fetcher*' won't work because there are too many files 
> for the shell to expand the command.
> I also noticed that a lot of the created files are zero bytes, so for now 
> we're running a cron job every minute
> {noformat}
> find /var/log/mesos -size 0 -name 'mesos-fetcher*' -delete
> {noformat}
> as a workaround.
> Anyway, it would be nice if there were an option to make mesos-fetcher 
> write to a single log file instead of creating thousands of individual 
> files.
> Or, if that's easier to implement, an option to write only the master and 
> slave logs but not the fetcher logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4572) docker executor should support preconfiguration and postconfiguration steps

2016-02-06 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136099#comment-15136099
 ] 

Klaus Ma commented on MESOS-4572:
-

Currently, Marathon uses the DockerExecutor/CommandExecutor provided by Mesos 
to start services. It would be better to let Marathon build its own executors 
for this kind of requirement.

> docker executor should support preconfiguration and postconfiguration steps
> ---
>
> Key: MESOS-4572
> URL: https://issues.apache.org/jira/browse/MESOS-4572
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: popsuper1982
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> There should be a way to execute some scripts before and after the Docker 
> container is created.
> Preconfiguration: for example, check the environment (IP addresses, 
> hostnames, bridges, volume directories) of the host where the container will 
> run, form the ENV for the container, and then run it with those environment 
> variables.
> Postconfiguration: for example, use pipework to configure the container, get 
> the allocated ports, and report them to another container to configure the 
> relationships between applications in different containers.
> Another postconfiguration example: after the container is shut down, clean 
> up some data in the volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4591) `/reserve` endpoint allows reservations for any role

2016-02-06 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135713#comment-15135713
 ] 

Guangya Liu commented on MESOS-4591:


[~greggomann], what is the problem if the /reserve endpoint allows reservations 
for any role? I think the /create endpoint also allows creating persistent 
volumes for any role.

> `/reserve` endpoint allows reservations for any role
> 
>
> Key: MESOS-4591
> URL: https://issues.apache.org/jira/browse/MESOS-4591
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Greg Mann
>  Labels: mesosphere, reservations
>
> When frameworks reserve resources, the validation of the operation ensures 
> that the {{role}} of the reservation matches the {{role}} of the framework. 
> For the case of the {{/reserve}} operator endpoint, however, the operator has 
> no role to validate, so this check isn't performed.
> This means that if an ACL exists which authorizes a framework's principal to 
> reserve resources, that same principal can be used to reserve resources for 
> _any_ role through the operator endpoint.
> We should restrict reservations made through the operator endpoint to 
> specified roles. A few possibilities:
> * The {{object}} of the {{reserve_resources}} ACL could be changed from 
> {{resources}} to {{roles}}
> * A second ACL could be added for authorization of {{reserve}} operations, 
> with an {{object}} of {{role}}
> * Our conception of the {{resources}} object in the {{reserve_resources}} ACL 
> could be expanded to include role information, i.e., 
> {{disk(role1);mem(role1)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4572) docker executor should support preconfiguration and postconfiguration steps

2016-02-06 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135766#comment-15135766
 ] 

Guangya Liu commented on MESOS-4572:


[~LiuChao], did your private implementation update the Marathon code? How does 
LINKER_CONF_PARAMS work via env?

In my understanding, this work should be done by the framework, not Mesos 
itself. What do you say? Thanks.

> docker executor should support preconfiguration and postconfiguration steps
> ---
>
> Key: MESOS-4572
> URL: https://issues.apache.org/jira/browse/MESOS-4572
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: popsuper1982
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> There should be a way to execute some scripts before and after the Docker 
> container is created.
> Preconfiguration: for example, check the environment (IP addresses, 
> hostnames, bridges, volume directories) of the host where the container will 
> run, form the ENV for the container, and then run it with those environment 
> variables.
> Postconfiguration: for example, use pipework to configure the container, get 
> the allocated ports, and report them to another container to configure the 
> relationships between applications in different containers.
> Another postconfiguration example: after the container is shut down, clean 
> up some data in the volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)