[jira] [Created] (MESOS-8679) If the first taskkill stuck in the default executor, all other killtasks will be ignored.
Gilbert Song created MESOS-8679:
---
Summary: If the first taskkill stuck in the default executor, all other killtasks will be ignored.
Key: MESOS-8679
URL: https://issues.apache.org/jira/browse/MESOS-8679
Project: Mesos
Issue Type: Bug
Components: executor
Reporter: Gilbert Song

If the first task kill gets stuck in the default executor, all subsequent kill requests are ignored. This can leave a particular task unkillable, stuck in the killing state (task_killing) forever.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8651) Potential memory leaks in the `volume/sandbox_path` isolator
[ https://issues.apache.org/jira/browse/MESOS-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401204#comment-16401204 ]

Jason Lai commented on MESOS-8651:
--
[~gilbert]: We have a patch available at https://reviews.apache.org/r/66104/. Thanks for the code review!

cc [~zhitao], [~jieyu]

> Potential memory leaks in the `volume/sandbox_path` isolator
>
> Key: MESOS-8651
> URL: https://issues.apache.org/jira/browse/MESOS-8651
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Reporter: Jason Lai
> Assignee: Jason Lai
> Priority: Major
> Labels: easyfix, patch
>
> The {{sandboxes}} hashmap object of {{mesos::internal::slave::VolumeSandboxPathIsolatorProcess}} risks leaking memory.
> It [adds the sandbox path upon each container launch|https://github.com/apache/mesos/blob/1.5.x/src/slave/containerizer/mesos/isolators/volume/sandbox_path.cpp#L119-L122] and does not remove the sandbox path after cleaning up the container. Because the life cycle of an isolator is tied to that of {{MesosContainerizer}}, more and more sandbox paths are added to the {{sandboxes}} hashmap object as Mesos containers keep being launched, which will likely exhaust the Mesos agent's memory eventually.
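The leak pattern described in MESOS-8651 can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the actual {{VolumeSandboxPathIsolatorProcess}}; the names {{SandboxPathTracker}}, {{prepare}}, {{cleanup}} and {{tracked}} are assumptions made for illustration, and the real patch (https://reviews.apache.org/r/66104/) may differ. It only shows that a map filled on launch must be erased from on cleanup if it is to stay bounded by the number of live containers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an isolator that remembers one sandbox path per
// container. This is NOT the Mesos class, only a model of the pattern.
class SandboxPathTracker {
public:
  // Called when a container is launched: record its sandbox path.
  void prepare(const std::string& containerId, const std::string& sandbox) {
    sandboxes_[containerId] = sandbox;
  }

  // Called when a container is cleaned up: drop its entry. Leaving this
  // erase out is the leak: the map would grow with every launch for as
  // long as the owning object lives.
  void cleanup(const std::string& containerId) {
    sandboxes_.erase(containerId);
  }

  // Number of containers currently tracked.
  std::size_t tracked() const { return sandboxes_.size(); }

private:
  std::unordered_map<std::string, std::string> sandboxes_;
};
```

With the erase in place, launching and cleaning up any number of containers leaves {{tracked()}} at zero; erasing an unknown container ID is a harmless no-op.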
[jira] [Created] (MESOS-8678) Bump gRPC bundle to 1.10.0
Chun-Hung Hsiao created MESOS-8678:
--
Summary: Bump gRPC bundle to 1.10.0
Key: MESOS-8678
URL: https://issues.apache.org/jira/browse/MESOS-8678
Project: Mesos
Issue Type: Task
Reporter: Chun-Hung Hsiao
Assignee: Chun-Hung Hsiao

Bump gRPC to 1.10.0 for better CMake support and new header paths.
[jira] [Assigned] (MESOS-2976) Allow create multi master in test cases
[ https://issues.apache.org/jira/browse/MESOS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Rukletsov reassigned MESOS-2976:
--
Assignee: (was: haosdent)

> Allow create multi master in test cases
> ---
>
> Key: MESOS-2976
> URL: https://issues.apache.org/jira/browse/MESOS-2976
> Project: Mesos
> Issue Type: Task
> Components: libprocess, master, test
> Reporter: haosdent
> Priority: Minor
>
> The master uses "master" as its fixed pid.id, which makes it impossible to start multiple masters in the same process at the same time. In addition, libprocess currently allows binding to only one port per process. Some test scenarios need multiple masters running at the same time.
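To make the pid.id constraint in MESOS-2976 concrete, here is a simplified model, not libprocess itself. {{ProcessTable}}, {{spawn}} and {{generate}} are hypothetical names chosen for this sketch. It assumes that actor IDs must be unique within one OS process, which is what the issue description implies, and shows how deriving IDs from a prefix would sidestep the clash. It does not address the single-port limitation also mentioned in the issue.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical, simplified model of a per-process actor table keyed by ID.
class ProcessTable {
public:
  // Registers an actor under the given ID. Returns false if the ID is
  // already in use, modelling why a second actor with the fixed ID
  // "master" cannot be started in the same process.
  bool spawn(const std::string& id) { return ids_.insert(id).second; }

  // Derives a fresh ID from a prefix: "master(1)", "master(2)", ...
  std::string generate(const std::string& prefix) {
    return prefix + "(" + std::to_string(++counter_) + ")";
  }

private:
  std::set<std::string> ids_;
  std::size_t counter_ = 0;
};
```

In this model, spawning "master" twice fails the second time, while spawning two IDs obtained from {{generate("master")}} succeeds for both.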
[jira] [Assigned] (MESOS-8677) FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS
[ https://issues.apache.org/jira/browse/MESOS-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rojas reassigned MESOS-8677: -- Assignee: Alexander Rojas > FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS > > > Key: MESOS-8677 > URL: https://issues.apache.org/jira/browse/MESOS-8677 > Project: Mesos > Issue Type: Bug > Components: test > Environment: macOS 10.13.3 with LLVM 6.0.0 as well as with Apple LLVM > version 9.0.0 (clang-900.0.39.2) >Reporter: Jan Schlicht >Assignee: Alexander Rojas >Priority: Major > > Here's a {{GLOG_v=1}} run of the test: > {noformat} > [ RUN ] FaultToleranceTest.ReregisterCompletedFrameworks > I0314 14:30:11.240077 2290090816 cluster.cpp:172] Creating default 'local' > authorizer > I0314 14:30:11.241261 55140352 master.cpp:463] Master > 025f775d-9c75-43f6-9ee6-079a605fbf01 (Jenkinss-Mac-mini.local) started on > 10.0.49.4:54648 > I0314 14:30:11.241287 55140352 master.cpp:465] Flags at startup: --acls="" > --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" > --allocation_interval="1secs" --allocator="HierarchicalDRF" > --authenticate_agents="true" --authenticate_frameworks="true" > --authenticate_http_frameworks="true" --authenticate_http_readonly="true" > --authenticate_http_readwrite="true" --authenticators="crammd5" > --authorizers="local" > --credentials="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials" > --filter_gpu_resources="true" --framework_sorter="drf" --help="false" > --hostname_lookup="true" --http_authenticators="basic" > --http_framework_authenticators="basic" --initialize_driver_logging="true" > --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" > --max_agent_ping_timeouts="5" --max_completed_frameworks="50" > --max_completed_tasks_per_framework="1000" > --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" > --recovery_agent_removal_limit="100%" --registry="in_memory" > 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" > --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" > --registry_store_timeout="100secs" --registry_strict="false" > --require_agent_domain="false" --root_submissions="true" --user_sorter="drf" > --version="false" --webui_dir="/usr/local/share/mesos/webui" > --work_dir="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/master" > --zk_session_timeout="10secs" > I0314 14:30:11.241439 55140352 master.cpp:514] Master only allowing > authenticated frameworks to register > I0314 14:30:11.241447 55140352 master.cpp:520] Master only allowing > authenticated agents to register > I0314 14:30:11.241452 55140352 master.cpp:526] Master only allowing > authenticated HTTP frameworks to register > I0314 14:30:11.241461 55140352 credentials.hpp:37] Loading credentials for > authentication from > '/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials' > I0314 14:30:11.241678 55140352 master.cpp:570] Using default 'crammd5' > authenticator > I0314 14:30:11.241739 55140352 http.cpp:957] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readonly' > I0314 14:30:11.241824 55140352 http.cpp:957] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-readwrite' > I0314 14:30:11.241873 55140352 http.cpp:957] Creating default 'basic' HTTP > authenticator for realm 'mesos-master-scheduler' > I0314 14:30:11.241919 55140352 master.cpp:649] Authorization enabled > I0314 14:30:11.242066 52457472 whitelist_watcher.cpp:77] No whitelist given > I0314 14:30:11.242079 51920896 hierarchical.cpp:175] Initialized hierarchical > allocator process > I0314 14:30:11.243557 52994048 master.cpp:2119] Elected as the leading master! 
> I0314 14:30:11.243574 52994048 master.cpp:1678] Recovering from registrar > I0314 14:30:11.243640 51920896 registrar.cpp:347] Recovering registrar > I0314 14:30:11.243852 52457472 registrar.cpp:391] Successfully fetched the > registry (0B) in 190976ns > I0314 14:30:11.243928 52457472 registrar.cpp:495] Applied 1 operations in > 28606ns; attempting to update the registry > I0314 14:30:11.244163 52457472 registrar.cpp:552] Successfully updated the > registry in 194816ns > I0314 14:30:11.244222 52457472 registrar.cpp:424] Successfully recovered > registrar > I0314 14:30:11.244408 54067200 master.cpp:1792] Recovered 0 agents from the > registry (155B); allowing 10mins for agents to reregister > I0314 14:30:11.23 52994048 hierarchical.cpp:213] Skipping recovery of > hierarchical allocator: nothing to recover > W0314 14:30:11.247259 2290090816 process.cpp:2805] Attempted to spawn already > running process files@10.0.49.4:54648 > I0314 14:30:11.247681 2290090816 cluster.c
[jira] [Commented] (MESOS-8677) FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS
[ https://issues.apache.org/jira/browse/MESOS-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400420#comment-16400420 ]

Alexander Rojas commented on MESOS-8677:
--
The issue seems to be the use of a [{{hashset}}|https://github.com/apache/mesos/blob/master/src/common/http.cpp#L871] to ensure only one entry per action while constructing the {{ObjectApprovers}}. We were relying on different iterations yielding the {{hashset}} contents in the same order, but that did not hold on macOS (nor can we ever rely on it), particularly after the {{hashset}} was copied. Replacing the {{hashset}} with a {{set}}, which does have a strong order guarantee, solves the issue.

> FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS
>
> Key: MESOS-8677
> URL: https://issues.apache.org/jira/browse/MESOS-8677
> Project: Mesos
> Issue Type: Bug
> Components: test
> Environment: macOS 10.13.3 with LLVM 6.0.0 as well as with Apple LLVM version 9.0.0 (clang-900.0.39.2)
> Reporter: Jan Schlicht
> Priority: Major
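The ordering problem described in the comment on MESOS-8677 can be shown with standard containers. This is a sketch under assumptions, not the Mesos code: it assumes that {{hashset}} behaves like {{std::unordered_set}} with respect to iteration order, and the helper names {{orderedUnique}} and {{hashedUnique}} are made up for illustration. Both helpers deduplicate a list of action names, but only the {{std::set}} version returns them in an order callers can rely on.

```cpp
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Deduplicates the actions and returns them in sorted order. std::set
// iterates in key order, so the result is the same on every platform,
// on every run, and after copying the set.
std::vector<std::string> orderedUnique(const std::vector<std::string>& actions) {
  std::set<std::string> unique(actions.begin(), actions.end());
  return std::vector<std::string>(unique.begin(), unique.end());
}

// Deduplicates via a hash set. The resulting elements are the same, but
// their order is unspecified: it can differ between standard library
// implementations and between a set and its copy, so code must not
// depend on it.
std::vector<std::string> hashedUnique(const std::vector<std::string>& actions) {
  std::unordered_set<std::string> unique(actions.begin(), actions.end());
  return std::vector<std::string>(unique.begin(), unique.end());
}
```

Code that needs to pair up elements by position, as the comment suggests was happening with the {{ObjectApprovers}}, can use the ordered variant; the hashed variant is only safe when the order is irrelevant.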
[jira] [Created] (MESOS-8677) FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS
Jan Schlicht created MESOS-8677: --- Summary: FaulToleranceTest.ReregisterCompletedFrameworks crashes on macOS Key: MESOS-8677 URL: https://issues.apache.org/jira/browse/MESOS-8677 Project: Mesos Issue Type: Bug Components: test Environment: macOS 10.13.3 with LLVM 6.0.0 as well as with Apple LLVM version 9.0.0 (clang-900.0.39.2) Reporter: Jan Schlicht Here's a {{GLOG_v=1}} run of the test: {noformat} [ RUN ] FaultToleranceTest.ReregisterCompletedFrameworks I0314 14:30:11.240077 2290090816 cluster.cpp:172] Creating default 'local' authorizer I0314 14:30:11.241261 55140352 master.cpp:463] Master 025f775d-9c75-43f6-9ee6-079a605fbf01 (Jenkinss-Mac-mini.local) started on 10.0.49.4:54648 I0314 14:30:11.241287 55140352 master.cpp:465] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --root_submissions="true" --user_sorter="drf" 
--version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/master" --zk_session_timeout="10secs" I0314 14:30:11.241439 55140352 master.cpp:514] Master only allowing authenticated frameworks to register I0314 14:30:11.241447 55140352 master.cpp:520] Master only allowing authenticated agents to register I0314 14:30:11.241452 55140352 master.cpp:526] Master only allowing authenticated HTTP frameworks to register I0314 14:30:11.241461 55140352 credentials.hpp:37] Loading credentials for authentication from '/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/ZyMWb1/credentials' I0314 14:30:11.241678 55140352 master.cpp:570] Using default 'crammd5' authenticator I0314 14:30:11.241739 55140352 http.cpp:957] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0314 14:30:11.241824 55140352 http.cpp:957] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0314 14:30:11.241873 55140352 http.cpp:957] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0314 14:30:11.241919 55140352 master.cpp:649] Authorization enabled I0314 14:30:11.242066 52457472 whitelist_watcher.cpp:77] No whitelist given I0314 14:30:11.242079 51920896 hierarchical.cpp:175] Initialized hierarchical allocator process I0314 14:30:11.243557 52994048 master.cpp:2119] Elected as the leading master! 
I0314 14:30:11.243574 52994048 master.cpp:1678] Recovering from registrar I0314 14:30:11.243640 51920896 registrar.cpp:347] Recovering registrar I0314 14:30:11.243852 52457472 registrar.cpp:391] Successfully fetched the registry (0B) in 190976ns I0314 14:30:11.243928 52457472 registrar.cpp:495] Applied 1 operations in 28606ns; attempting to update the registry I0314 14:30:11.244163 52457472 registrar.cpp:552] Successfully updated the registry in 194816ns I0314 14:30:11.244222 52457472 registrar.cpp:424] Successfully recovered registrar I0314 14:30:11.244408 54067200 master.cpp:1792] Recovered 0 agents from the registry (155B); allowing 10mins for agents to reregister I0314 14:30:11.23 52994048 hierarchical.cpp:213] Skipping recovery of hierarchical allocator: nothing to recover W0314 14:30:11.247259 2290090816 process.cpp:2805] Attempted to spawn already running process files@10.0.49.4:54648 I0314 14:30:11.247681 2290090816 cluster.cpp:460] Creating default 'local' authorizer I0314 14:30:11.248837 55676928 slave.cpp:265] Mesos agent started on (50)@10.0.49.4:54648 I0314 14:30:11.248865 55676928 slave.cpp:266] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://"; --appc_store_dir="/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/FaultToleranceTest_ReregisterCompletedFrameworks_UqvwBG/store/appc" --authenticate_http_executors="true" --