[jira] [Created] (MESOS-8679) If the first taskkill stuck in the default executor, all other killtasks will be ignored.

2018-03-15 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-8679:
---

 Summary: If the first taskkill stuck in the default executor, all 
other killtasks will be ignored.
 Key: MESOS-8679
 URL: https://issues.apache.org/jira/browse/MESOS-8679
 Project: Mesos
  Issue Type: Bug
  Components: executor
Reporter: Gilbert Song


If the first kill task request gets stuck in the default executor, all 
subsequent kill task requests are ignored. This can leave a particular task 
unkillable, stuck in TASK_KILLING forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8651) Potential memory leaks in the `volume/sandbox_path` isolator

2018-03-15 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401204#comment-16401204
 ] 

Jason Lai commented on MESOS-8651:
--

[~gilbert]: We have a patch available at https://reviews.apache.org/r/66104/. 
Thanks for the code review!
cc [~zhitao], [~jieyu]

> Potential memory leaks in the `volume/sandbox_path` isolator
> 
>
> Key: MESOS-8651
> URL: https://issues.apache.org/jira/browse/MESOS-8651
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jason Lai
>Assignee: Jason Lai
>Priority: Major
>  Labels: easyfix, patch
>
> The {{sandboxes}} hashmap object of 
> {{mesos::internal::slave::VolumeSandboxPathIsolatorProcess}} risks a memory 
> leak.
> It [adds the sandbox path upon each container 
> launch|https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/isolators/volume/sandbox_path.cpp#L119-L122]
>  but does not remove the sandbox path after cleaning up the container. Since the 
> life cycle of an isolator is tied to that of {{MesosContainerizer}}, more and 
> more sandbox paths accumulate in the {{sandboxes}} hashmap as Mesos containers 
> keep being launched, which will likely exhaust the Mesos agent's memory 
> eventually.





[jira] [Created] (MESOS-8678) Bump gRPC bundle to 1.10.0

2018-03-15 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-8678:
--

 Summary: Bump gRPC bundle to 1.10.0
 Key: MESOS-8678
 URL: https://issues.apache.org/jira/browse/MESOS-8678
 Project: Mesos
  Issue Type: Task
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao


Bump gRPC to 1.10.0 for better CMake support and new header paths.





[jira] [Assigned] (MESOS-2976) Allow create multi master in test cases

2018-03-15 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-2976:
--

Assignee: (was: haosdent)

> Allow create multi master in test cases
> ---
>
> Key: MESOS-2976
> URL: https://issues.apache.org/jira/browse/MESOS-2976
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, master, test
>Reporter: haosdent
>Priority: Minor
>
> Master uses "master" as its fixed pid.id. This makes it impossible to start 
> multiple masters in the same process at the same time. Also, libprocess 
> currently only allows binding to one port per process. Some test scenarios 
> need to start multiple masters at the same time.





[jira] [Assigned] (MESOS-8677) FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS

2018-03-15 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas reassigned MESOS-8677:
--

Assignee: Alexander Rojas

> FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS
> 
>
> Key: MESOS-8677
> URL: https://issues.apache.org/jira/browse/MESOS-8677
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: macOS 10.13.3 with LLVM 6.0.0 as well as with Apple LLVM 
> version 9.0.0 (clang-900.0.39.2)
>Reporter: Jan Schlicht
>Assignee: Alexander Rojas
>Priority: Major
>
> Here's a {{GLOG_v=1}} run of the test:
> {noformat}
> [ RUN  ] FaultToleranceTest.ReregisterCompletedFrameworks
> I0314 14:30:11.240077 2290090816 cluster.cpp:172] Creating default 'local' 
> authorizer
> I0314 14:30:11.241261 55140352 master.cpp:463] Master 
> 025f775d-9c75-43f6-9ee6-079a605fbf01 (Jenkinss-Mac-mini.local) started on 
> 10.0.49.4:54648
> I0314 14:30:11.241287 55140352 master.cpp:465] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials"
>  --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --require_agent_domain="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/master"
>  --zk_session_timeout="10secs"
> I0314 14:30:11.241439 55140352 master.cpp:514] Master only allowing 
> authenticated frameworks to register
> I0314 14:30:11.241447 55140352 master.cpp:520] Master only allowing 
> authenticated agents to register
> I0314 14:30:11.241452 55140352 master.cpp:526] Master only allowing 
> authenticated HTTP frameworks to register
> I0314 14:30:11.241461 55140352 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials'
> I0314 14:30:11.241678 55140352 master.cpp:570] Using default 'crammd5' 
> authenticator
> I0314 14:30:11.241739 55140352 http.cpp:957] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0314 14:30:11.241824 55140352 http.cpp:957] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0314 14:30:11.241873 55140352 http.cpp:957] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0314 14:30:11.241919 55140352 master.cpp:649] Authorization enabled
> I0314 14:30:11.242066 52457472 whitelist_watcher.cpp:77] No whitelist given
> I0314 14:30:11.242079 51920896 hierarchical.cpp:175] Initialized hierarchical 
> allocator process
> I0314 14:30:11.243557 52994048 master.cpp:2119] Elected as the leading master!
> I0314 14:30:11.243574 52994048 master.cpp:1678] Recovering from registrar
> I0314 14:30:11.243640 51920896 registrar.cpp:347] Recovering registrar
> I0314 14:30:11.243852 52457472 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 190976ns
> I0314 14:30:11.243928 52457472 registrar.cpp:495] Applied 1 operations in 
> 28606ns; attempting to update the registry
> I0314 14:30:11.244163 52457472 registrar.cpp:552] Successfully updated the 
> registry in 194816ns
> I0314 14:30:11.244222 52457472 registrar.cpp:424] Successfully recovered 
> registrar
> I0314 14:30:11.244408 54067200 master.cpp:1792] Recovered 0 agents from the 
> registry (155B); allowing 10mins for agents to reregister
> I0314 14:30:11.23 52994048 hierarchical.cpp:213] Skipping recovery of 
> hierarchical allocator: nothing to recover
> W0314 14:30:11.247259 2290090816 process.cpp:2805] Attempted to spawn already 
> running process files@10.0.49.4:54648
> I0314 14:30:11.247681 2290090816 cluster.c

[jira] [Commented] (MESOS-8677) FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS

2018-03-15 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400420#comment-16400420
 ] 

Alexander Rojas commented on MESOS-8677:


The issue seems to be the use of a 
[{{hashset}}|https://github.com/apache/mesos/blob/master/src/common/http.cpp#L871]
 to ensure only one entry per action while constructing the 
{{ObjectApprovers}}. We were relying on different iterations over the 
{{hashset}} returning its contents in the same order, but this does not hold on 
macOS (nor can we ever rely on it), particularly after the {{hashset}} has been 
copied.

Replacing the {{hashset}} with a {{set}}, which does guarantee a deterministic 
(sorted) iteration order, solves the issue.

> FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS
> 
>
> Key: MESOS-8677
> URL: https://issues.apache.org/jira/browse/MESOS-8677
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: macOS 10.13.3 with LLVM 6.0.0 as well as with Apple LLVM 
> version 9.0.0 (clang-900.0.39.2)
>Reporter: Jan Schlicht
>Priority: Major
>

[jira] [Created] (MESOS-8677) FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS

2018-03-15 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8677:
---

 Summary: FaulToleranceTest.ReregisterCompletedFrameworks crashes 
on macOS
 Key: MESOS-8677
 URL: https://issues.apache.org/jira/browse/MESOS-8677
 Project: Mesos
  Issue Type: Bug
  Components: test
 Environment: macOS 10.13.3 with LLVM 6.0.0 as well as with Apple LLVM 
version 9.0.0 (clang-900.0.39.2)
Reporter: Jan Schlicht


Here's a {{GLOG_v=1}} run of the test:
{noformat}
[ RUN  ] FaultToleranceTest.ReregisterCompletedFrameworks
I0314 14:30:11.240077 2290090816 cluster.cpp:172] Creating default 'local' 
authorizer
I0314 14:30:11.241261 55140352 master.cpp:463] Master 
025f775d-9c75-43f6-9ee6-079a605fbf01 (Jenkinss-Mac-mini.local) started on 
10.0.49.4:54648
I0314 14:30:11.241287 55140352 master.cpp:465] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" 
--credentials="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials"
 --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--require_agent_domain="false" --root_submissions="true" --user_sorter="drf" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/master"
 --zk_session_timeout="10secs"
I0314 14:30:11.241439 55140352 master.cpp:514] Master only allowing 
authenticated frameworks to register
I0314 14:30:11.241447 55140352 master.cpp:520] Master only allowing 
authenticated agents to register
I0314 14:30:11.241452 55140352 master.cpp:526] Master only allowing 
authenticated HTTP frameworks to register
I0314 14:30:11.241461 55140352 credentials.hpp:37] Loading credentials for 
authentication from 
'/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials'
I0314 14:30:11.241678 55140352 master.cpp:570] Using default 'crammd5' 
authenticator
I0314 14:30:11.241739 55140352 http.cpp:957] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0314 14:30:11.241824 55140352 http.cpp:957] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0314 14:30:11.241873 55140352 http.cpp:957] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0314 14:30:11.241919 55140352 master.cpp:649] Authorization enabled
I0314 14:30:11.242066 52457472 whitelist_watcher.cpp:77] No whitelist given
I0314 14:30:11.242079 51920896 hierarchical.cpp:175] Initialized hierarchical 
allocator process
I0314 14:30:11.243557 52994048 master.cpp:2119] Elected as the leading master!
I0314 14:30:11.243574 52994048 master.cpp:1678] Recovering from registrar
I0314 14:30:11.243640 51920896 registrar.cpp:347] Recovering registrar
I0314 14:30:11.243852 52457472 registrar.cpp:391] Successfully fetched the 
registry (0B) in 190976ns
I0314 14:30:11.243928 52457472 registrar.cpp:495] Applied 1 operations in 
28606ns; attempting to update the registry
I0314 14:30:11.244163 52457472 registrar.cpp:552] Successfully updated the 
registry in 194816ns
I0314 14:30:11.244222 52457472 registrar.cpp:424] Successfully recovered 
registrar
I0314 14:30:11.244408 54067200 master.cpp:1792] Recovered 0 agents from the 
registry (155B); allowing 10mins for agents to reregister
I0314 14:30:11.23 52994048 hierarchical.cpp:213] Skipping recovery of 
hierarchical allocator: nothing to recover
W0314 14:30:11.247259 2290090816 process.cpp:2805] Attempted to spawn already 
running process files@10.0.49.4:54648
I0314 14:30:11.247681 2290090816 cluster.cpp:460] Creating default 'local' 
authorizer
I0314 14:30:11.248837 55676928 slave.cpp:265] Mesos agent started on 
(50)@10.0.49.4:54648
I0314 14:30:11.248865 55676928 slave.cpp:266] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/FaultToleranceTest_ReregisterCompletedFrameworks_UqvwBG/store/appc"
 --authenticate_http_executors="true" --