[jira] [Created] (MESOS-9785) Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers.
Chun-Hung Hsiao created MESOS-9785: -- Summary: Frameworks recovered from reregistered agents are not reported to master `/api/v1` subscribers. Key: MESOS-9785 URL: https://issues.apache.org/jira/browse/MESOS-9785 Project: Mesos Issue Type: Bug Components: HTTP API Reporter: Chun-Hung Hsiao Currently, when an operator subscribes to the {{/api/v1}} master endpoint, it receives a {{SUBSCRIBED}} event carrying information about all known frameworks, both registered and unregistered. If an unregistered framework reregisters later, a {{FRAMEWORK_UPDATED}} event is sent to the operator. However, if an operator subscribes to the {{/api/v1}} master endpoint after a master failover but before any frameworks or agents reregister, {{SUBSCRIBED}} contains no recovered framework information. When an agent with running tasks reregisters later, the unregistered frameworks of those tasks are recovered, but no {{FRAMEWORK_ADDED}} event is sent to the operator, so the operator receives {{TASK_ADDED}} events for those tasks with unknown framework IDs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
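The symptom described in MESOS-9785 can be illustrated from the subscriber's side. The following is only a sketch: the event dictionaries are deliberately simplified and do not match the real v1 operator API wire format, and the field names ({{framework_ids}}, {{framework_id}}, {{task_id}}) and the function {{find_orphan_tasks}} are hypothetical.

```python
def find_orphan_tasks(events):
    """Return IDs of tasks whose framework was never announced to the subscriber.

    `events` is a list of simplified, hypothetical event dicts; the real
    operator API events are nested protobuf/JSON messages.
    """
    known_frameworks = set()
    orphans = []
    for event in events:
        kind = event["type"]
        if kind == "SUBSCRIBED":
            # Frameworks reported in the initial snapshot.
            known_frameworks.update(event.get("framework_ids", []))
        elif kind in ("FRAMEWORK_ADDED", "FRAMEWORK_UPDATED"):
            known_frameworks.add(event["framework_id"])
        elif kind == "TASK_ADDED":
            if event["framework_id"] not in known_frameworks:
                orphans.append(event["task_id"])
    return orphans


# Subscribing after a master failover, before anything reregisters:
events_after_failover = [
    {"type": "SUBSCRIBED", "framework_ids": []},
    # An agent reregisters; its task is reported, but its framework is not.
    {"type": "TASK_ADDED", "task_id": "task-1", "framework_id": "fw-1"},
]
print(find_orphan_tasks(events_after_failover))
```

A subscriber that keeps this kind of bookkeeping can at least detect the gap, although it cannot recover the missing framework information from the event stream alone.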
[jira] [Assigned] (MESOS-9677) RPM packages should be built with launcher sealing
[ https://issues.apache.org/jira/browse/MESOS-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benno Evers reassigned MESOS-9677: -- Resolution: Fixed Assignee: Benno Evers Target Version/s: (was: 1.8.0) > RPM packages should be built with launcher sealing > -- > > Key: MESOS-9677 > URL: https://issues.apache.org/jira/browse/MESOS-9677 > Project: Mesos > Issue Type: Task > Components: build >Affects Versions: 1.8.0 >Reporter: Benjamin Bannier >Assignee: Benno Evers >Priority: Major > Labels: integration, mesosphere, packaging, rpm > Fix For: 1.8.1 > > > We should consider enabling launcher sealing in the Mesos RPM packages. Since > this feature is built conditionally, it is hard to write e.g., module code > against Mesos packages since required functions might be missing (e.g., > [https://github.com/dcos/dcos-mesos-modules/commit/8ce70e6cc789054831daa3058647e326b2b11bc9] > cannot be linked against the default RPM package anymore). The RPM's target > platform centos7 should include a recent enough kernel for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9677) RPM packages should be built with launcher sealing
[ https://issues.apache.org/jira/browse/MESOS-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839611#comment-16839611 ] Benno Evers commented on MESOS-9677: master: {noformat} commit 7ff4263c371f8a51551199c694837cf371923137 Author: Benjamin Bannier Date: Mon Apr 8 14:37:32 2019 +0200 Enabled launcher sealing for RPM packages. We enable this flag since with it disabled certain public functions are not available making it hard to e.g., write modules against this version of Mesos. While launcher sealing depends on a recent kernel, the platform we build RPMs for already satisfies the requirements. Review: https://reviews.apache.org/r/70295 {noformat} 1.8.x: {noformat} commit 3bc6082afe75390dc3b0abd58d6ce85827709b89 (origin/1.8.x, 1.8.x) Author: Benjamin Bannier Date: Mon Apr 8 14:37:32 2019 +0200 Enabled launcher sealing for RPM packages. We enable this flag since with it disabled certain public functions are not available making it hard to e.g., write modules against this version of Mesos. While launcher sealing depends on a recent kernel, the platform we build RPMs for already satisfies the requirements. Review: https://reviews.apache.org/r/70295 {noformat}
[jira] [Commented] (MESOS-9329) CMake build on Fedora 28 fails due to libevent error
[ https://issues.apache.org/jira/browse/MESOS-9329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839483#comment-16839483 ] Alexander Rukletsov commented on MESOS-9329: Indeed, the autotools build uses a newer version of libevent, [2.0.22|https://github.com/apache/mesos/blob/a9a2acabd03181865055b77cf81e7bb310b236d6/3rdparty/libevent-2.0.22-stable.tar.gz]. We can't easily use it in the cmake build because newer versions do not support cmake, see MESOS-3529. Bottom line is: a cmake build on Linux with ssl and libevent enabled is currently not supported. > CMake build on Fedora 28 fails due to libevent error > > > Key: MESOS-9329 > URL: https://issues.apache.org/jira/browse/MESOS-9329 > Project: Mesos > Issue Type: Bug >Reporter: Benno Evers >Priority: Major > > Trying to build Mesos using cmake with the options > {noformat} > cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_SSL=1 -DENABLE_LIBEVENT=1 > {noformat} > fails due to the following: > {noformat} > [ 1%] Building C object CMakeFiles/event_extra.dir/bufferevent_openssl.c.o > /home/bevers/mesos/worktrees/master/build-cmake/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c: > In function ‘bio_bufferevent_new’: > /home/bevers/mesos/worktrees/master/build-cmake/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c:112:3: > error: dereferencing pointer to incomplete type ‘BIO’ {aka ‘struct bio_st’} > b->init = 0; >^~ > /home/bevers/mesos/worktrees/master/build-cmake/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c: > At top level: > /home/bevers/mesos/worktrees/master/build-cmake/3rdparty/libevent-2.1.5-beta/src/libevent-2.1.5-beta/bufferevent_openssl.c:234:1: > error: variable ‘methods_bufferevent’ has initializer but incomplete type > static BIO_METHOD methods_bufferevent = { > [...] 
> {noformat} > Since the autotools build does not have issues when enabling libevent and > ssl, it seems most likely that the `libevent-2.1.5-beta` version used by > default in the cmake build is somehow connected to the error message.
[jira] [Commented] (MESOS-9749) mesos agent logging hangs upon systemd-journald restart
[ https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839419#comment-16839419 ] Joseph Wu commented on MESOS-9749: -- The agent ends up in a bad state because the stdout/err pipe gets filled, and therefore starts to block threads. This can lead to unpredictable results (since we aren't sure which threads are blocked by IO). If the logs are not written directly to journald, then you won't need a restart of the agent. It should remain functional during the time journald is down. Of course, restarting the agent is still an option. > mesos agent logging hangs upon systemd-journald restart > --- > > Key: MESOS-9749 > URL: https://issues.apache.org/jira/browse/MESOS-9749 > Project: Mesos > Issue Type: Bug >Affects Versions: 1.7.2 > Environment: Running on centos 7.4.1708, systemd 219 (probably > heavily patched by centos) > mesos-agent command: > {code} > /usr/sbin/mesos-slave \ > > --attributes='canary:canary-false;maintenance_group:group-6;network:10g;platform:centos;platform_major_version:7;rack_name:22.05;type:base;version:v2018-q-1' > \ > --cgroups_enable_cfs \ > --cgroups_hierarchy='/sys/fs/cgroup' \ > --cgroups_net_cls_primary_handle='0xC370' \ > --container_logger='org_apache_mesos_LogrotateContainerLogger' \ > --containerizers='mesos' \ > --credential='file:///etc/mesos-chef/slave-credential' \ > > --default_container_info='\{"type":"MESOS","volumes":[{"host_path":"tmp","container_path":"/tmp","mode":"RW"},\{"host_path":"var_tmp","container_path":"/var/tmp","mode":"RW"},\{"host_path":".","container_path":"/mnt/mesos/sandbox","mode":"RW"},\{"host_path":"/usr/share/mesos/geoip","container_path":"/mnt/mesos/geoip","mode":"RO"}]}' > \ > --docker_registry='https://filer-docker-registry.prod.crto.in/' \ > --docker_store_dir='/var/opt/mesos/store/docker' \ > --enforce_container_disk_quota \ > > 
--executor_environment_variables='\{"PATH":"/bin:/usr/bin","CRITEO_DC":"par","CRITEO_ENV":"prod","CRITEO_GEOIP_PATH":"/mnt/mesos/geoip"}' > \ > --executor_registration_timeout='5mins' \ > --fetcher_cache_dir='/var/opt/mesos/cache' \ > --fetcher_cache_size='2GB' \ > --hooks='com_criteo_mesos_CommandHook' \ > --image_providers='docker' \ > --image_provisioner_backend='copy' \ > > --isolation='linux/capabilities,cgroups/cpu,cgroups/mem,cgroups/net_cls,namespaces/pid,filesystem/linux,docker/runtime,network/cni,disk/xfs,com_criteo_mesos_CommandIsolator' > \ > --logging_level='INFO' \ > > --master='zk://mesos:xx...@mesos-master01-par.central.criteo.prod:2181,mesos-master02-par.central.criteo.prod:2181,mesos-master03-par.central.criteo.prod:2181/mesos' > \ > --modules='file:///etc/mesos-chef/slave-modules.json' \ > --port=5051 \ > --recover='reconnect' \ > --resources='file:///etc/mesos-chef/custom_resources.json' \ > --strict \ > --work_dir='/var/opt/mesos' \ > --xfs_kill_containers \ > --xfs_project_range='[5000-50]' > {code} >Reporter: Gregoire Seux >Priority: Minor > Labels: foundations > > When mesos agent is launched through systemd, a restart of systemd-journald > service makes mesos agent logging hang (no more output).. The process itself > seems to work fine (we can query state via http for instance). > A restart of mesos-agent corrects the issue. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
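The blocking behavior described in Joseph Wu's comment on MESOS-9749 (the stdout/stderr pipe fills up once nothing reads from it) can be reproduced in isolation. This is a minimal sketch, assuming a Unix-like system; it uses a plain pipe with no reader and is not a model of journald or of the agent itself. The function name {{fill_pipe}} is hypothetical.

```python
import os


def fill_pipe():
    """Write to a pipe that nobody reads until the kernel buffer is full.

    Returns the number of bytes accepted before the pipe filled up.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode so this demo raises instead of hanging forever.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * 4096)
    except BlockingIOError:
        # With a blocking descriptor, the writing thread would be stuck
        # at this point until a reader drained the pipe.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


print(fill_pipe())
```

The number printed depends on the platform's pipe capacity. The point is only that writes succeed for a while and then stop making progress, which matches the "logging hangs but the process still answers HTTP" symptom until enough threads are blocked.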
[jira] [Commented] (MESOS-9749) mesos agent logging hangs upon systemd-journald restart
[ https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839412#comment-16839412 ] Gregoire Seux commented on MESOS-9749: -- Thanks [~kaysoky] for your reply. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771122 is very instructive indeed. However, it was closed with a note that it was fixed upstream (in systemd-217?). In our deployment, we are trying to introduce a relationship between mesos and journald to force a restart of mesos-slave if something restarts journald (https://github.com/criteo-forks/mesos_cookbook/pull/14). The real issue, though, is not logging but the more general problem of the agent state after a journald restart (MESOS-9772).
[jira] [Commented] (MESOS-9749) mesos agent logging hangs upon systemd-journald restart
[ https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839408#comment-16839408 ] Joseph Wu commented on MESOS-9749: --
The default behavior of Mesos's logging is to write to stdout/stderr. When launching via systemd, this means you are writing to journald. And if journald is restarted, the pipe between the agent and journald would be broken. These sorts of broken pipes usually terminate the agent, but it seems to be different in systemd's case.
See also: [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771122]
There are a variety of ways to get around this, basically involving writing logs to some other location:
---
h2. Built-in solutions
Mesos lets you write stdout/stderr to disk instead. If you specify the {{--log_dir}} flag, Mesos will leverage glog's log writing behavior, which has some form of log rotation built in. But unfortunately, this does not seem to bound the size of logs on disk, so you'd end up writing a script or such to clean up logs.
Besides that, you may modify your service file to write to something besides journald, such as syslog, or a file.
https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Logging%20and%20Standard%20Input/Output
h2. Other solutions
By the looks of your agent configuration, you are not averse to deploying modules ({{--modules='file:///etc/mesos-chef/slave-modules.json'}}). In this case, you have some other options.
DC/OS uses a {{LogSink}} module (which is a Mesos Anonymous module implementing a glog module) to pipe logs to file, which are then rotated by another timer.
https://github.com/dcos/dcos-mesos-modules/tree/master/logsink
If the goal is to get logs into journald, across journald restarts, this is also possible with a {{LogSink}}. This would entail using the journald C API, like {{sd_journal_send}}. I believe this is capable of reconnecting after journald restarts.
https://www.freedesktop.org/software/systemd/man/sd_journal_print.html
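Joseph Wu's comment on MESOS-9749 notes that {{--log_dir}} does not seem to bound the size of logs on disk, so a cleanup script may be needed. A minimal sketch of such a script follows. The function name {{prune_logs}} and the size budget are hypothetical, and skipping symlinks is an assumption made so that convenience links to the current log file are left alone.

```python
import os


def prune_logs(log_dir, max_total_bytes):
    """Delete the oldest regular files in log_dir until the total fits the budget.

    Returns the list of removed paths, oldest first.
    """
    entries = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        # Assumption: symlinks point at live log files and should be kept.
        if os.path.isfile(path) and not os.path.islink(path):
            info = os.stat(path)
            entries.append((info.st_mtime, info.st_size, path))
    entries.sort()  # oldest modification time first
    total = sum(size for _, size, _ in entries)
    removed = []
    for _, size, path in entries:
        if total <= max_total_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Run periodically (for example from cron or a systemd timer) against the directory passed to {{--log_dir}}. Because the file currently being written has the newest modification time, it is the last candidate for removal.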
[jira] [Commented] (MESOS-9749) mesos agent logging hangs upon systemd-journald restart
[ https://issues.apache.org/jira/browse/MESOS-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839372#comment-16839372 ] Benjamin Mahler commented on MESOS-9749: cc [~kaysoky]
[jira] [Commented] (MESOS-9750) Agent V1 GET_STATE response may report a complete executor's tasks as non-terminal after a graceful agent shutdown
[ https://issues.apache.org/jira/browse/MESOS-9750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839373#comment-16839373 ] Joseph Wu commented on MESOS-9750: -- Found one more code path where the agent's {{GET_STATE}} will return extraneous "launched_tasks". This happens when a Framework or Master {{TEARDOWN}} call is used and the executor does not send a terminal status update in time. This one does not require an agent restart/shutdown. Also, this code path will result in an executor's checkpointed state looking identical to the agent shutdown case. If the agent is restarted, the code in the above patch will be run to put the agent back into a consistent state. Fix and test here: https://reviews.apache.org/r/70641/ > Agent V1 GET_STATE response may report a complete executor's tasks as > non-terminal after a graceful agent shutdown > -- > > Key: MESOS-9750 > URL: https://issues.apache.org/jira/browse/MESOS-9750 > Project: Mesos > Issue Type: Bug > Components: agent, executor >Affects Versions: 1.6.0, 1.7.0, 1.8.0 >Reporter: Joseph Wu >Assignee: Joseph Wu >Priority: Major > Labels: foundations > > When the following steps occur: > 1) A graceful shutdown is initiated on the agent (i.e. SIGUSR1 or > /master/machine/down). > 2) The executor is sent a kill, and the agent counts down on > {{executor_shutdown_grace_period}}. > 3) The executor exits, before all terminal status updates reach the agent. > This is more likely if {{executor_shutdown_grace_period}} passes. > This results in a completed executor, with non-terminal tasks (according to > status updates). > When the agent starts back up, the completed executor will be recovered and > shows up correctly as a completed executor in {{/state}}. However, if you > fetch the V1 {{GET_STATE}} result, there will be an entry in > {{launched_tasks}} even though nothing is running. 
> {code} > get_tasks { > launched_tasks { > name: "test-task" > task_id { > value: "dff5a155-47f1-4a71-9b92-30ca059ab456" > } > framework_id { > value: "4b34a3aa-f651-44a9-9b72-58edeede94ef-" > } > executor_id { > value: "default" > } > agent_id { > value: "4b34a3aa-f651-44a9-9b72-58edeede94ef-S0" > } > state: TASK_RUNNING > resources { ... } > resources { ... } > resources { ... } > resources { ... } > statuses { > task_id { > value: "dff5a155-47f1-4a71-9b92-30ca059ab456" > } > state: TASK_RUNNING > agent_id { > value: "4b34a3aa-f651-44a9-9b72-58edeede94ef-S0" > } > timestamp: 1556674758.2175469 > executor_id { > value: "default" > } > source: SOURCE_EXECUTOR > uuid: "xPmn\234\236F&\235\\d\364\326\323\222\224" > container_status { ... } > } > } > } > get_executors { > completed_executors { > executor_info { > executor_id { > value: "default" > } > command { > value: "" > } > framework_id { > value: "4b34a3aa-f651-44a9-9b72-58edeede94ef-" > } > } > } > } > get_frameworks { > completed_frameworks { > framework_info { > user: "user" > name: "default" > id { > value: "4b34a3aa-f651-44a9-9b72-58edeede94ef-" > } > checkpoint: true > hostname: "localhost" > principal: "test-principal" > capabilities { > type: MULTI_ROLE > } > capabilities { > type: RESERVATION_REFINEMENT > } > roles: "*" > } > } > } > {code} > This happens because we combine executors and completed executors when > constructing the response. The terminal task(s) with non-terminal updates > appear under completed executors. > https://github.com/apache/mesos/blob/89c3dd95a421e14044bc91ceb1998ff4ae3883b4/src/slave/http.cpp#L1734-L1756 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
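Until a fixed agent is deployed, a client of the V1 {{GET_STATE}} call could cross-check {{launched_tasks}} against {{completed_executors}} itself, as in the MESOS-9750 response shown above. The following is a client-side sketch only, not the fix from the review request. It assumes the response has been converted to nested dicts that mirror the protobuf text in the issue, and the function name {{split_tasks}} is hypothetical.

```python
def split_tasks(launched_tasks, completed_executors):
    """Separate tasks that belong to completed executors from plausibly live ones.

    Both arguments are lists of dicts mirroring the GET_STATE response fields.
    Returns (running, stale).
    """
    completed = {
        (
            executor["executor_info"]["framework_id"]["value"],
            executor["executor_info"]["executor_id"]["value"],
        )
        for executor in completed_executors
    }
    running, stale = [], []
    for task in launched_tasks:
        key = (task["framework_id"]["value"], task["executor_id"]["value"])
        if key in completed:
            stale.append(task)  # executor is gone, so the task cannot be running
        else:
            running.append(task)
    return running, stale


framework = {"value": "4b34a3aa-f651-44a9-9b72-58edeede94ef-"}
launched = [{
    "name": "test-task",
    "framework_id": framework,
    "executor_id": {"value": "default"},
    "state": "TASK_RUNNING",
}]
completed = [{
    "executor_info": {
        "executor_id": {"value": "default"},
        "framework_id": framework,
    },
}]
running, stale = split_tasks(launched, completed)
print(len(running), len(stale))
```

This treats the (framework ID, executor ID) pair as identifying a single executor run. If an executor ID is reused after completion, the check would misclassify the new executor's tasks, so it is a heuristic rather than a reliable filter.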
[jira] [Commented] (MESOS-9783) Centos 6 RPM build is broken on Apache CI
[ https://issues.apache.org/jira/browse/MESOS-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839310#comment-16839310 ] Benno Evers commented on MESOS-9783: master: {noformat} commit a9a2acabd03181865055b77cf81e7bb310b236d6 Author: Benno Evers Date: Tue May 14 11:52:59 2019 +0200 Updated URL in CentOS 6 Dockerfile. The link was pointing to a rpm package that was apparently replaced on the upstream file server. Review: https://reviews.apache.org/r/70639 {noformat} 1.8.x: {noformat} commit 5ca16bfeae19c193f4e67390543d08897a0f4ab8 Author: Benno Evers Date: Tue May 14 11:52:59 2019 +0200 Updated URL in CentOS 6 Dockerfile. The link was pointing to a rpm package that was apparently replaced on the upstream file server. Review: https://reviews.apache.org/r/70639 {noformat} 1.7.x: {noformat} commit cfc7e6e9905329460d182150a91317b3e0a75157 Author: Benno Evers Date: Tue May 14 11:52:59 2019 +0200 Updated URL in CentOS 6 Dockerfile. The link was pointing to a rpm package that was apparently replaced on the upstream file server. Review: https://reviews.apache.org/r/70639 {noformat} > Centos 6 RPM build is broken on Apache CI > - > > Key: MESOS-9783 > URL: https://issues.apache.org/jira/browse/MESOS-9783 > Project: Mesos > Issue Type: Improvement >Reporter: Benno Evers >Assignee: Benno Evers >Priority: Major > Labels: foundations > > The centos 6 rpm build on the Apache CI on `build.apache.org` has been broken > since April 16, as it fails on the following step: > {noformat} > RUN rpm -Uvh --replacepkgs \ > > http://yum.postgresql.org/9.5/redhat/rhel-6-x86_64/pgdg-centos95-9.5-2.noarch.rpm > {noformat} > The URL returns a 404 response because the package was removed from the > upstream fileserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (MESOS-9784) Server side SSL Certificate Validation
Vinod Kone created MESOS-9784: - Summary: Server side SSL Certificate Validation Key: MESOS-9784 URL: https://issues.apache.org/jira/browse/MESOS-9784 Project: Mesos Issue Type: Epic Reporter: Vinod Kone Assignee: Benno Evers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-9783) Centos 6 RPM build is broken on Apache CI
[ https://issues.apache.org/jira/browse/MESOS-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839282#comment-16839282 ] Benno Evers commented on MESOS-9783: https://reviews.apache.org/r/70639/diff/1#index_header
[jira] [Assigned] (MESOS-9783) Centos 6 RPM build is broken on Apache CI
[ https://issues.apache.org/jira/browse/MESOS-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benno Evers reassigned MESOS-9783: -- Assignee: Benno Evers
[jira] [Created] (MESOS-9783) Centos 6 RPM build is broken on Apache CI
Benno Evers created MESOS-9783: -- Summary: Centos 6 RPM build is broken on Apache CI Key: MESOS-9783 URL: https://issues.apache.org/jira/browse/MESOS-9783 Project: Mesos Issue Type: Improvement Reporter: Benno Evers The centos 6 rpm build on the Apache CI on `build.apache.org` has been broken since April 16, as it fails on the following step: {noformat} RUN rpm -Uvh --replacepkgs \ http://yum.postgresql.org/9.5/redhat/rhel-6-x86_64/pgdg-centos95-9.5-2.noarch.rpm {noformat} The URL returns a 404 response because the package was removed from the upstream fileserver. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
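A failure like the one in MESOS-9783 (a hard-coded package URL starting to return 404) could be caught before the Docker build with a small pre-flight check. This is a sketch under assumptions: the function name {{url_status}} and the injectable {{opener}} parameter are hypothetical, and some servers may not answer {{HEAD}} requests the same way they answer {{GET}}.

```python
import urllib.error
import urllib.request


def url_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error codes such as 404.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is what we want here.
        return error.code
```

A CI step could call this for each URL referenced in the Dockerfiles and fail early with a clear message when the result is not 200, instead of failing in the middle of the {{RUN rpm -Uvh}} step.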