[jira] [Updated] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Zhang updated MESOS-7975:
------------------------------
    Sprint: Mesosphere Sprint 65  (was: Mesosphere Sprint 66)

> The command/default/docker executor can incorrectly send a TASK_FINISHED
> update even when the task is killed
> -------------------------------------------------------------------------
>
>                 Key: MESOS-7975
>                 URL: https://issues.apache.org/jira/browse/MESOS-7975
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Anand Mazumdar
>            Assignee: Qian Zhang
>            Priority: Critical
>              Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor
> incorrectly sends a {{TASK_FINISHED}} status update instead of
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when
> the task exits with a zero status code.
> {code}
> if (WSUCCEEDED(status)) {
>   taskState = TASK_FINISHED;
> } else if (killed) {
>   // Send TASK_KILLED if the task was killed as a result of
>   // kill() or shutdown().
>   taskState = TASK_KILLED;
> } else {
>   taskState = TASK_FAILED;
> }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates
> when a task is killed.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8077) Explore using 'option optimize_for = CODE_SIZE' to speed up compile time for unoptimized builds.
[ https://issues.apache.org/jira/browse/MESOS-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-8077: --- Component/s: build > Explore using 'option optimize_for = CODE_SIZE' to speed up compile time for > unoptimized builds. > > > Key: MESOS-8077 > URL: https://issues.apache.org/jira/browse/MESOS-8077 > Project: Mesos > Issue Type: Improvement > Components: build >Reporter: Benjamin Mahler > > Protobuf exposes an option for optimizing for {{SPEED}} or {{CODE_SIZE}}. > {{SPEED}} is the default, and we should explore using {{CODE_SIZE}} for > unoptimized builds to see if this speeds up compile times. > https://developers.google.com/protocol-buffers/docs/proto#options -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8077) Explore using 'option optimize_for = CODE_SIZE' to speed up compile time for unoptimized builds.
Benjamin Mahler created MESOS-8077: -- Summary: Explore using 'option optimize_for = CODE_SIZE' to speed up compile time for unoptimized builds. Key: MESOS-8077 URL: https://issues.apache.org/jira/browse/MESOS-8077 Project: Mesos Issue Type: Improvement Reporter: Benjamin Mahler Protobuf exposes an option for optimizing for {{SPEED}} or {{CODE_SIZE}}. {{SPEED}} is the default, and we should explore using {{CODE_SIZE}} for unoptimized builds to see if this speeds up compile times. https://developers.google.com/protocol-buffers/docs/proto#options -- This message was sent by Atlassian JIRA (v6.4.14#64029)
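[Editor's note] For background on the exploration above: {{optimize_for}} is a file-level option in the .proto source, so it would be set per generated file. A sketch (message contents illustrative):

```proto
syntax = "proto2";

// Trades generated-code speed for smaller generated code and faster
// compilation; a candidate for unoptimized (debug) builds as proposed.
option optimize_for = CODE_SIZE;

message Example {
  optional string name = 1;
}
```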
[jira] [Updated] (MESOS-7975) The command/default/docker executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Zhang updated MESOS-7975:
------------------------------
    Sprint: Mesosphere Sprint 66  (was: Mesosphere Sprint 65)

> The command/default/docker executor can incorrectly send a TASK_FINISHED
> update even when the task is killed
> -------------------------------------------------------------------------
>
>                 Key: MESOS-7975
>                 URL: https://issues.apache.org/jira/browse/MESOS-7975
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Anand Mazumdar
>            Assignee: Qian Zhang
>            Priority: Critical
>              Labels: mesosphere
>
> Currently, when a task is killed, the default/command/docker executor
> incorrectly sends a {{TASK_FINISHED}} status update instead of
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when
> the task exits with a zero status code.
> {code}
> if (WSUCCEEDED(status)) {
>   taskState = TASK_FINISHED;
> } else if (killed) {
>   // Send TASK_KILLED if the task was killed as a result of
>   // kill() or shutdown().
>   taskState = TASK_KILLED;
> } else {
>   taskState = TASK_FAILED;
> }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates
> when a task is killed.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8076) PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky.
Alexander Rukletsov created MESOS-8076: -- Summary: PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky. Key: MESOS-8076 URL: https://issues.apache.org/jira/browse/MESOS-8076 Project: Mesos Issue Type: Bug Components: test Affects Versions: 1.5.0 Reporter: Alexander Rukletsov Attachments: SharedPersistentVolumeRescindOnDestroy-badrun.txt, SharedPersistentVolumeRescindOnDestroy-goodrun.txt I'm observing {{ROOT_MountDiskResource/PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy/0}} being flaky on our internal CI. From what I see in the logs, when {{framework1}} accepts an offer, creates volumes, launches a task, and kills it right after, the executor might manage to register in-between and hence an unexpected {{TASK_RUNNING}} status update is sent. To fix this, one approach is to explicitly wait for {{TASK_RUNNING}} before attempting to kill the task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8076) PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky.
[ https://issues.apache.org/jira/browse/MESOS-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-8076: --- Attachment: SharedPersistentVolumeRescindOnDestroy-goodrun.txt SharedPersistentVolumeRescindOnDestroy-badrun.txt > PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky. > - > > Key: MESOS-8076 > URL: https://issues.apache.org/jira/browse/MESOS-8076 > Project: Mesos > Issue Type: Bug > Components: test >Affects Versions: 1.5.0 >Reporter: Alexander Rukletsov > Labels: flaky, flaky-test > Attachments: SharedPersistentVolumeRescindOnDestroy-badrun.txt, > SharedPersistentVolumeRescindOnDestroy-goodrun.txt > > > I'm observing > {{ROOT_MountDiskResource/PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy/0}} > being flaky on our internal CI. From what I see in the logs, when > {{framework1}} accepts an offer, creates volumes, launches a task, and kills > it right after, the executor might manage to register in-between and hence an > unexpected {{TASK_RUNNING}} status update is sent. To fix this, one approach > is to explicitly wait for {{TASK_RUNNING}} before attempting to kill the task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4527: Description: Similar to MESOS-4526, reserved resources should be accounted for in the quota role sorter regardless of their allocation state. (was: Similar to MESOS-4526, reserved resources should be accounted for in the quota role sorter regardless of their allocation state. In the short-term, we should at least account them if they are allocated.) > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > Similar to MESOS-4526, reserved resources should be accounted for in the > quota role sorter regardless of their allocation state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4527: Shepherd: (was: Joris Van Remoortere) > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere, multitenancy > > Since unallocated reservations are not accounted towards the guarantee, we > might unfairly allocate guarantees or exceed limit. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7398) HierarchicalAllocatorProcess::allocatable make strong assumptions about both resource providers and users
[ https://issues.apache.org/jira/browse/MESOS-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7398: --- Labels: multitenancy tech-debt (was: tech-debt) > HierarchicalAllocatorProcess::allocatable make strong assumptions about both > resource providers and users > - > > Key: MESOS-7398 > URL: https://issues.apache.org/jira/browse/MESOS-7398 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier > Labels: multitenancy, tech-debt > > The function {{HierarchicalAllocatorProcess::allocatable}} is used in the > allocator to decide whether a set of resources will be considered when > calculating offers. It currently hardcodes minimal requirements for a number > of common resource kinds. > While it seems to have in the past been used to enforce the offer side of > minimal task resources by not offering resources which we didn't want to be > used for tasks, it now seems to mainly help to minimize performance overhead > from too many small offers (instead too small resource amounts are kept out > of the offer pool until they have accumulated into larger resources). > While {{allocatable}} has already in the past prevented allocating sets of > only certain resource kinds (e.g., a {{Resources}} holding only GPU is not > {{allocatable}}; the same holds for custom resource kinds), the current > approach breaks down with the introduction of resource providers with > MESOS-7235 which might provide a single kind of resource each and which in > the case of external resource providers might never "reside" on the same > agent as e.g., CPU. > It seems that we need to separate the different concerns of {{allocatable}} > into dedicated functions, and adjust it to remain useful in a world of > (external) resource providers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7398) HierarchicalAllocatorProcess::allocatable make strong assumptions about both resource providers and users
[ https://issues.apache.org/jira/browse/MESOS-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7398: --- Component/s: multitenancy > HierarchicalAllocatorProcess::allocatable make strong assumptions about both > resource providers and users > - > > Key: MESOS-7398 > URL: https://issues.apache.org/jira/browse/MESOS-7398 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier > Labels: multitenancy, tech-debt > > The function {{HierarchicalAllocatorProcess::allocatable}} is used in the > allocator to decide whether a set of resources will be considered when > calculating offers. It currently hardcodes minimal requirements for a number > of common resource kinds. > While it seems to have in the past been used to enforce the offer side of > minimal task resources by not offering resources which we didn't want to be > used for tasks, it now seems to mainly help to minimize performance overhead > from too many small offers (instead too small resource amounts are kept out > of the offer pool until they have accumulated into larger resources). > While {{allocatable}} has already in the past prevented allocating sets of > only certain resource kinds (e.g., a {{Resources}} holding only GPU is not > {{allocatable}}; the same holds for custom resource kinds), the current > approach breaks down with the introduction of resource providers with > MESOS-7235 which might provide a single kind of resource each and which in > the case of external resource providers might never "reside" on the same > agent as e.g., CPU. > It seems that we need to separate the different concerns of {{allocatable}} > into dedicated functions, and adjust it to remain useful in a world of > (external) resource providers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4527: Description: Since unallocated reservations are not accounted towards the guarantee, we might unfairly allocate guarantees or exceed limit. (was: Similar to MESOS-4526, reserved resources should be accounted for in the quota role sorter regardless of their allocation state.) > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere, multitenancy > > Since unallocated reservations are not accounted towards the guarantee, we > might unfairly allocate guarantees or exceed limit. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7398) HierarchicalAllocatorProcess::allocatable make strong assumptions about both resource providers and users
[ https://issues.apache.org/jira/browse/MESOS-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Mahler updated MESOS-7398: --- Component/s: (was: multitenancy) > HierarchicalAllocatorProcess::allocatable make strong assumptions about both > resource providers and users > - > > Key: MESOS-7398 > URL: https://issues.apache.org/jira/browse/MESOS-7398 > Project: Mesos > Issue Type: Improvement > Components: allocation >Reporter: Benjamin Bannier > Labels: multitenancy, tech-debt > > The function {{HierarchicalAllocatorProcess::allocatable}} is used in the > allocator to decide whether a set of resources will be considered when > calculating offers. It currently hardcodes minimal requirements for a number > of common resource kinds. > While it seems to have in the past been used to enforce the offer side of > minimal task resources by not offering resources which we didn't want to be > used for tasks, it now seems to mainly help to minimize performance overhead > from too many small offers (instead too small resource amounts are kept out > of the offer pool until they have accumulated into larger resources). > While {{allocatable}} has already in the past prevented allocating sets of > only certain resource kinds (e.g., a {{Resources}} holding only GPU is not > {{allocatable}}; the same holds for custom resource kinds), the current > approach breaks down with the introduction of resource providers with > MESOS-7235 which might provide a single kind of resource each and which in > the case of external resource providers might never "reside" on the same > agent as e.g., CPU. > It seems that we need to separate the different concerns of {{allocatable}} > into dedicated functions, and adjust it to remain useful in a world of > (external) resource providers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4527: Target Version/s: (was: 0.27.0) > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere, multitenancy > > Similar to MESOS-4526, reserved resources should be accounted for in the > quota role sorter regardless of their allocation state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201233#comment-16201233 ] Michael Park commented on MESOS-4527: - This ticket was closed since it satisfied > In the short-term, we should at least account them if they are allocated. I'm reopening this since it didn't actually accomplish the title of the ticket. > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > Similar to MESOS-4526, reserved resources should be accounted for in the > quota role sorter regardless of their allocation state. In the short-term, we > should at least account them if they are allocated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201233#comment-16201233 ] Michael Park edited comment on MESOS-4527 at 10/11/17 11:53 PM: This ticket was closed since it satisfied {quote}In the short-term, we should at least account them if they are allocated.{quote} I'm reopening this since it didn't actually accomplish the title of the ticket. was (Author: mcypark): This ticket was closed since it satisfied > In the short-term, we should at least account them if they are allocated. I'm reopening this since it didn't actually accomplish the title of the ticket. > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > Similar to MESOS-4526, reserved resources should be accounted for in the > quota role sorter regardless of their allocation state. In the short-term, we > should at least account them if they are allocated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4527: Labels: mesosphere multitenancy (was: mesosphere) > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere, multitenancy > > Similar to MESOS-4526, reserved resources should be accounted for in the > quota role sorter regardless of their allocation state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-4527) Include allocated portion of the reserved resources in the quota role sorter for DRF.
[ https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Park updated MESOS-4527: Fix Version/s: (was: 0.27.0) > Include allocated portion of the reserved resources in the quota role sorter > for DRF. > - > > Key: MESOS-4527 > URL: https://issues.apache.org/jira/browse/MESOS-4527 > Project: Mesos > Issue Type: Task > Components: allocation >Reporter: Michael Park >Assignee: Michael Park > Labels: mesosphere > > Similar to MESOS-4526, reserved resources should be accounted for in the > quota role sorter regardless of their allocation state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-8075) Add RWMutex to libprocess
[ https://issues.apache.org/jira/browse/MESOS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhitao Li reassigned MESOS-8075: Assignee: Zhitao Li > Add RWMutex to libprocess > - > > Key: MESOS-8075 > URL: https://issues.apache.org/jira/browse/MESOS-8075 > Project: Mesos > Issue Type: Task > Components: libprocess >Reporter: Zhitao Li >Assignee: Zhitao Li > > We want to add a new {{RWMutex}} similar to {{Mutex}}, which can provide > better concurrency protection for mutually exclusive actions, but allow high > concurrency for actions which can be performed at the same time. > One use case is image garbage collection: the new API > {{provisioner::pruneImages}} needs to be mutually exclusive from > {{provisioner::provision}}, but multiple {{provisioner::provision}} can > concurrently run safely. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-7511) CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-7511: --- Shepherd: Vinod Kone Sprint: Mesosphere Sprint 65 Story Points: 1 > CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky. > --- > > Key: MESOS-7511 > URL: https://issues.apache.org/jira/browse/MESOS-7511 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: CentOS 6 with SSL >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: containerizer, flaky-test, isolation, mesosphere > Attachments: ROOT_DynamicAddDelofCniConfig_failure_log_centos6.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-7511) CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky.
[ https://issues.apache.org/jira/browse/MESOS-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov reassigned MESOS-7511: -- Assignee: Alexander Rukletsov > CniIsolatorTest.ROOT_DynamicAddDelofCniConfig is flaky. > --- > > Key: MESOS-7511 > URL: https://issues.apache.org/jira/browse/MESOS-7511 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: CentOS 6 with SSL >Reporter: Alexander Rukletsov >Assignee: Alexander Rukletsov > Labels: containerizer, flaky-test, isolation, mesosphere > Attachments: ROOT_DynamicAddDelofCniConfig_failure_log_centos6.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (MESOS-6790) Wrong task started time in webui
[ https://issues.apache.org/jira/browse/MESOS-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benno Evers reassigned MESOS-6790: -- Assignee: Benno Evers (was: Tomasz Janiszewski) > Wrong task started time in webui > > > Key: MESOS-6790 > URL: https://issues.apache.org/jira/browse/MESOS-6790 > Project: Mesos > Issue Type: Bug > Components: webui >Reporter: haosdent >Assignee: Benno Evers > Labels: health-check, mesosphere, observability, webui > > Reported by [~janisz] > {quote} > Hi > When a task has Mesos health checks enabled, the start time shown in the UI > can be wrong. This happens because the UI assumes that the first status is > the task start [0]. This is not always true, because Mesos keeps only recent > task statuses [1], so when a health check updates the task status it can > override the task start time displayed in the webui. > Best > Tomek > [0] > https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L140 > [1] > https://github.com/apache/mesos/blob/f2adc8a95afda943f6a10e771aad64300da19047/src/common/protobuf_utils.cpp#L263-L265 > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-6790) Wrong task started time in webui
[ https://issues.apache.org/jira/browse/MESOS-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benno Evers updated MESOS-6790: --- Sprint: Mesosphere Sprint 65 > Wrong task started time in webui > > > Key: MESOS-6790 > URL: https://issues.apache.org/jira/browse/MESOS-6790 > Project: Mesos > Issue Type: Bug > Components: webui >Reporter: haosdent >Assignee: Benno Evers > Labels: health-check, mesosphere, observability, webui > > Reported by [~janisz] > {quote} > Hi > When a task has Mesos health checks enabled, the start time shown in the UI > can be wrong. This happens because the UI assumes that the first status is > the task start [0]. This is not always true, because Mesos keeps only recent > task statuses [1], so when a health check updates the task status it can > override the task start time displayed in the webui. > Best > Tomek > [0] > https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L140 > [1] > https://github.com/apache/mesos/blob/f2adc8a95afda943f6a10e771aad64300da19047/src/common/protobuf_utils.cpp#L263-L265 > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8075) Add RWMutex to libprocess
Zhitao Li created MESOS-8075: Summary: Add RWMutex to libprocess Key: MESOS-8075 URL: https://issues.apache.org/jira/browse/MESOS-8075 Project: Mesos Issue Type: Task Components: libprocess Reporter: Zhitao Li We want to add a new {{RWMutex}} similar to {{Mutex}}, which can provide better concurrency protection for mutually exclusive actions, but allow high concurrency for actions which can be performed at the same time. One use case is image garbage collection: the new API {{provisioner::pruneImages}} needs to be mutually exclusive from {{provisioner::provision}}, but multiple {{provisioner::provision}} can concurrently run safely. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7935) CMake build should fail immediately for in-source builds
[ https://issues.apache.org/jira/browse/MESOS-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200575#comment-16200575 ] Joseph Wu commented on MESOS-7935: -- The two options in Damien's suggestion are undocumented (by CMake), so they are considered unsafe for use. Feel free to post a review with your alternative. > CMake build should fail immediately for in-source builds > > > Key: MESOS-7935 > URL: https://issues.apache.org/jira/browse/MESOS-7935 > Project: Mesos > Issue Type: Improvement > Components: cmake > Environment: macOS 10.12 > GNU/Linux Debian Stretch >Reporter: Damien Gerard >Assignee: Nathan Jackson > Labels: build > > In-source builds are neither recommended nor supported. It is simple enough > to add a check to fail the build immediately. > --- > In-source build of master branch was broken with: > {noformat} > cd /Users/damien.gerard/projects/acp/mesos/src && > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ > -DBUILD_FLAGS=\"\" -DBUILD_JAVA_JVM_LIBRARY=\"\" -DHAS_AUTHENTICATION=1 > -DLIBDIR=\"/usr/local/libmesos\" -DPICOJSON_USE_INT64 > -DPKGDATADIR=\"/usr/local/share/mesos\" > -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\" -DUSE_CMAKE_BUILD_CONFIG > -DUSE_STATIC_LIB -DVERSION=\"1.4.0\" -D__STDC_FORMAT_MACROS > -Dmesos_1_4_0_EXPORTS -I/Users/damien.gerard/projects/acp/mesos/include > -I/Users/damien.gerard/projects/acp/mesos/include/mesos > -I/Users/damien.gerard/projects/acp/mesos/src -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/protobuf-3.3.0/src/protobuf-3.3.0-lib/lib/include > -isystem /Users/damien.gerard/projects/acp/mesos/3rdparty/libprocess/include > -isystem /usr/local/opt/apr/libexec/include/apr-1 -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/boost-1.53.0/src/boost-1.53.0 > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/elfio-3.2/src/elfio-3.2 > -isystem >
/Users/damien.gerard/projects/acp/mesos/3rdparty/glog-0.3.3/src/glog-0.3.3-lib/lib/include > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/nvml-352.79/src/nvml-352.79 > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/picojson-1.3.0/src/picojson-1.3.0 > -isystem /usr/local/include/subversion-1 -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/stout/include -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/http_parser-2.6.2/src/http_parser-2.6.2 > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/concurrentqueue-1.0.0-beta/src/concurrentqueue-1.0.0-beta > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/libev-4.22/src/libev-4.22 > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/include > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/zookeeper-3.4.8/src/zookeeper-3.4.8/src/c/generated > -isystem > /Users/damien.gerard/projects/acp/mesos/3rdparty/leveldb-1.19/src/leveldb-1.19/include > -std=c++11 -fPIC -o > CMakeFiles/mesos-1.4.0.dir/slave/containerizer/mesos/provisioner/backends/copy.cpp.o > -c > /Users/damien.gerard/projects/acp/mesos/src/slave/containerizer/mesos/provisioner/backends/copy.cpp > /Users/damien.gerard/projects/acp/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:132:46: > error: no member named 'fetcher' in namespace 'mesos::uri'; did you mean > 'Fetcher'? > TryuriFetcher = uri::fetcher::create(); > ~^~~ > Fetcher > /Users/damien.gerard/projects/acp/mesos/include/mesos/uri/fetcher.hpp:46:7: > note: 'Fetcher' declared here > class Fetcher > ^ > /Users/damien.gerard/projects/acp/mesos/src/slave/containerizer/mesos/provisioner/appc/store.cpp:132:55: > error: no member named 'create' in 'mesos::uri::Fetcher' > Try uriFetcher = uri::fetcher::create(); > {noformat} > Both Linux & macOS, not tested elsewhere, on {{master}} and tag 1.4.0-rc3 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
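[Editor's note] A common way to implement the guard requested in MESOS-7935, placed near the top of the root CMakeLists.txt before any targets are defined (the error message wording is illustrative):

```cmake
# Abort immediately when the build directory is the source directory.
if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_BINARY_DIR}")
  message(FATAL_ERROR
    "In-source builds are not supported. Please create a separate build "
    "directory, e.g.: mkdir build && cd build && cmake ..")
endif()
```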
[jira] [Commented] (MESOS-8005) Mesos.SlaveTest.ShutdownUnregisteredExecutor is flaky
[ https://issues.apache.org/jira/browse/MESOS-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200383#comment-16200383 ] Andrei Budnik commented on MESOS-8005: -- {code} [ RUN ] SlaveTest.ShutdownUnregisteredExecutor I0922 00:38:40.364121 31018 cluster.cpp:162] Creating default 'local' authorizer I0922 00:38:40.365996 31034 master.cpp:445] Master 83bd1613-70d9-4c3e-b490-4aa60dd26e22 (ip-172-16-10-25) started on 172.16.10.25:44747 I0922 00:38:40.366019 31034 master.cpp:447] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/u6YBLG/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/u6YBLG/master" --zk_session_timeout="10secs" I0922 00:38:40.366137 31034 master.cpp:497] Master only allowing authenticated frameworks to register I0922 00:38:40.366145 31034 master.cpp:511] Master only allowing authenticated agents to register 
I0922 00:38:40.366150 31034 master.cpp:524] Master only allowing authenticated HTTP frameworks to register I0922 00:38:40.366155 31034 credentials.hpp:37] Loading credentials for authentication from '/tmp/u6YBLG/credentials' I0922 00:38:40.366237 31034 master.cpp:569] Using default 'crammd5' authenticator I0922 00:38:40.366286 31034 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly' I0922 00:38:40.366349 31034 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite' I0922 00:38:40.366389 31034 http.cpp:1045] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler' I0922 00:38:40.366443 31034 master.cpp:649] Authorization enabled I0922 00:38:40.366475 31039 hierarchical.cpp:171] Initialized hierarchical allocator process I0922 00:38:40.366564 31038 whitelist_watcher.cpp:77] No whitelist given I0922 00:38:40.367216 31036 master.cpp:2166] Elected as the leading master! I0922 00:38:40.367238 31036 master.cpp:1705] Recovering from registrar I0922 00:38:40.367282 31036 registrar.cpp:347] Recovering registrar I0922 00:38:40.367449 31036 registrar.cpp:391] Successfully fetched the registry (0B) in 150016ns I0922 00:38:40.367483 31036 registrar.cpp:495] Applied 1 operations in 5392ns; attempting to update the registry I0922 00:38:40.367624 31034 registrar.cpp:552] Successfully updated the registry in 119808ns I0922 00:38:40.367697 31034 registrar.cpp:424] Successfully recovered registrar I0922 00:38:40.367858 31036 hierarchical.cpp:209] Skipping recovery of hierarchical allocator: nothing to recover I0922 00:38:40.367869 31037 master.cpp:1804] Recovered 0 agents from the registry (142B); allowing 10mins for agents to re-register I0922 00:38:40.368898 31018 containerizer.cpp:292] Using isolation { environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni } I0922 00:38:40.372519 31018 linux_launcher.cpp:146] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher I0922 00:38:40.372859 31018 provisioner.cpp:255] Using default backend 'overlay' W0922 00:38:40.375388 31018 process.cpp:3194] Attempted to spawn already running process files@172.16.10.25:44747 I0922 00:38:40.375486 31018 cluster.cpp:448] Creating default 'local' authorizer I0922 00:38:40.375942 31036 slave.cpp:254] Mesos agent started on (531)@172.16.10.25:44747 W0922 00:38:40.376080 31018 process.cpp:3194] Attempted to spawn already running process version@172.16.10.25:44747 I0922 00:38:40.375958 31036 slave.cpp:255] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://; --appc_store_dir="/tmp/SlaveTest_ShutdownUnregisteredExecutor_mhaf10/store/appc" --authenticate_http_executors="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local"
[jira] [Updated] (MESOS-8072) Change Mesos common events verbose logs to use VLOG(2) instead of 1
[ https://issues.apache.org/jira/browse/MESOS-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armand Grillet updated MESOS-8072: -- Sprint: Mesosphere Sprint 65 > Change Mesos common events verbose logs to use VLOG(2) instead of 1 > --- > > Key: MESOS-8072 > URL: https://issues.apache.org/jira/browse/MESOS-8072 > Project: Mesos > Issue Type: Improvement >Reporter: Armand Grillet >Assignee: Armand Grillet >Priority: Minor > Labels: logging, mesosphere > > The original commit > https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7 > that started using VLOG(1) for the allocator does not state why this level > was chosen and the periodic messages such as "No allocations performed" > should be displayed at a higher level to simplify debugging. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
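The effect of the proposed change can be illustrated outside of Mesos. This is a minimal Python sketch (not glog or Mesos code) of glog-style numeric verbosity gating: a message logged at VLOG(n) is emitted only when the configured verbosity is at least n, so moving "No allocations performed" from VLOG(1) to VLOG(2) hides it from runs started with GLOG_v=1.

```python
# Hypothetical sketch of glog-style VLOG gating; class and method
# names are illustrative, not part of glog or Mesos.
class VerboseLogger:
    def __init__(self, verbosity=0):
        # 'verbosity' plays the role of the GLOG_v environment variable.
        self.verbosity = verbosity
        self.emitted = []

    def vlog(self, level, message):
        # A VLOG(level) message is only emitted when level <= verbosity.
        if level <= self.verbosity:
            self.emitted.append(message)

logger = VerboseLogger(verbosity=1)  # operator runs with GLOG_v=1
logger.vlog(1, "No allocations performed")  # emitted today (VLOG(1))
logger.vlog(2, "No allocations performed")  # suppressed after moving to VLOG(2)
```

Raising the level of periodic messages therefore quiets GLOG_v=1 output without losing the messages for operators who opt into GLOG_v=2.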
[jira] [Updated] (MESOS-8074) Change Libprocess actor state transitions verbose logs to use VLOG(3) instead of 2
[ https://issues.apache.org/jira/browse/MESOS-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armand Grillet updated MESOS-8074: -- Sprint: Mesosphere Sprint 65 Labels: logging mesosphere (was: ) > Change Libprocess actor state transitions verbose logs to use VLOG(3) instead > of 2 > -- > > Key: MESOS-8074 > URL: https://issues.apache.org/jira/browse/MESOS-8074 > Project: Mesos > Issue Type: Improvement >Reporter: Armand Grillet >Assignee: Armand Grillet >Priority: Minor > Labels: logging, mesosphere > > Without claiming a general change or a holistic approach, the amount of logs > concerning states being resumed when running a Mesos cluster with > {{GLOG_v=2}} is quite noisy. We should thus use {{VLOG(3)}} for such messages. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8072) Change Mesos common events verbose logs to use VLOG(2) instead of 1
[ https://issues.apache.org/jira/browse/MESOS-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armand Grillet updated MESOS-8072: -- Labels: logging mesosphere (was: ) > Change Mesos common events verbose logs to use VLOG(2) instead of 1 > --- > > Key: MESOS-8072 > URL: https://issues.apache.org/jira/browse/MESOS-8072 > Project: Mesos > Issue Type: Improvement >Reporter: Armand Grillet >Assignee: Armand Grillet >Priority: Minor > Labels: logging, mesosphere > > The original commit > https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7 > that started using VLOG(1) for the allocator does not state why this level > was chosen and the periodic messages such as "No allocations performed" > should be displayed at a higher level to simplify debugging. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200341#comment-16200341 ] Andrei Budnik commented on MESOS-7506: -- I put a {{::sleep(2);}} after {{slave = this->StartSlave(detector.get(), containerizer.get(), flags);}} in [SlaveRecoveryTest.RecoverTerminatedExecutor|https://github.com/apache/mesos/blob/0908303142f641c1697547eb7f8e82a205d6c362/src/tests/slave_recovery_tests.cpp#L1634] and got:
{code}
../../src/tests/slave_recovery_tests.cpp:1656: Failure
Expected: TASK_LOST
To be equal to: status->state()
Which is: TASK_FAILED
{code}
> Multiple tests leave orphan containers. > --- > > Key: MESOS-7506 > URL: https://issues.apache.org/jira/browse/MESOS-7506 > Project: Mesos > Issue Type: Bug > Components: containerization > Environment: Ubuntu 16.04 > Fedora 23 > other Linux distros >Reporter: Alexander Rukletsov >Assignee: Andrei Budnik > Labels: containerizer, flaky-test, mesosphere > > I've observed a number of flaky tests that leave orphan containers upon > cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
> Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8072) Change Mesos common events verbose logs to use VLOG(2) instead of 1
[ https://issues.apache.org/jira/browse/MESOS-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armand Grillet updated MESOS-8072: -- Shepherd: Alexander Rukletsov > Change Mesos common events verbose logs to use VLOG(2) instead of 1 > --- > > Key: MESOS-8072 > URL: https://issues.apache.org/jira/browse/MESOS-8072 > Project: Mesos > Issue Type: Improvement >Reporter: Armand Grillet >Assignee: Armand Grillet >Priority: Minor > > The original commit > https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7 > that started using VLOG(1) for the allocator does not state why this level > was chosen and the periodic messages such as "No allocations performed" > should be displayed at a higher level to simplify debugging. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (MESOS-8074) Change Libprocess actor state transitions verbose logs to use VLOG(3) instead of 2
[ https://issues.apache.org/jira/browse/MESOS-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Armand Grillet updated MESOS-8074: -- Story Points: 1 Summary: Change Libprocess actor state transitions verbose logs to use VLOG(3) instead of 2 (was: Change Libprocess actor state transitions verbose logs to use VLOG(2) instead of 1) > Change Libprocess actor state transitions verbose logs to use VLOG(3) instead > of 2 > -- > > Key: MESOS-8074 > URL: https://issues.apache.org/jira/browse/MESOS-8074 > Project: Mesos > Issue Type: Improvement >Reporter: Armand Grillet >Assignee: Armand Grillet >Priority: Minor > > Without claiming a general change or a holistic approach, the amount of logs > concerning states being resumed when running a Mesos cluster with > {{GLOG_v=2}} is quite noisy. We should thus use {{VLOG(3)}} for such messages. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8074) Change Libprocess actor state transitions verbose logs to use VLOG(2) instead of 1
Armand Grillet created MESOS-8074: - Summary: Change Libprocess actor state transitions verbose logs to use VLOG(2) instead of 1 Key: MESOS-8074 URL: https://issues.apache.org/jira/browse/MESOS-8074 Project: Mesos Issue Type: Improvement Reporter: Armand Grillet Assignee: Armand Grillet Priority: Minor Without claiming a general change or a holistic approach, the amount of logs concerning states being resumed when running a Mesos cluster with {{GLOG_v=2}} is quite noisy. We should thus use {{VLOG(3)}} for such messages. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (MESOS-2013) Slave read endpoint doesn't encode non-ascii characters correctly
[ https://issues.apache.org/jira/browse/MESOS-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200144#comment-16200144 ] Jan-Philip Gehrcke commented on MESOS-2013: --- What is the /file/read endpoint expected to emit, by design? Since it reads from a file I would expect it to emit a raw byte sequence, straight from that file. Why do we even talk about the concept of characters and text in this context? If the part of the Mesos code that reads the file contents makes any assumption about the file contents (such as that it contains text encoded via UTF-8) then of course these assumptions can easily be violated. > Slave read endpoint doesn't encode non-ascii characters correctly > - > > Key: MESOS-2013 > URL: https://issues.apache.org/jira/browse/MESOS-2013 > Project: Mesos > Issue Type: Bug > Components: json api >Reporter: Whitney Sorenson >Assignee: Anand Mazumdar > > Create a file in a sandbox with a non-ascii character, like this one: > http://www.fileformat.info/info/unicode/char/2018/index.htm > Hit the read endpoint for that file. > The response will have something like: > data: "\u00E2\u0080\u0098" > It should actually be: > data: "\u2018" > If you put either into JSON.parse() in the browser you will see the first > does not render correctly but the second does. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
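The symptom in the issue description is consistent with escaping each raw UTF-8 byte as its own \uNNNN escape instead of decoding the byte sequence into a code point first. A small Python illustration of that distinction (this is not the Mesos code path, just a reconstruction of the byte arithmetic):

```python
# U+2018 is LEFT SINGLE QUOTATION MARK; its UTF-8 encoding is three bytes.
s = "\u2018"
utf8 = s.encode("utf-8")
assert utf8 == b"\xe2\x80\x98"

# Buggy behaviour: escape each raw byte as if it were a code point.
per_byte_escapes = "".join("\\u%04X" % b for b in utf8)
# Produces the literal text \u00E2\u0080\u0098, which JSON.parse
# renders as three mojibake characters.

# Correct behaviour: decode the byte sequence, then escape the code point.
decoded = utf8.decode("utf-8")
assert decoded == "\u2018"
correct_escape = "\\u%04X" % ord(decoded)
# Produces the literal text \u2018, which JSON.parse renders correctly.
```

This also frames Jan-Philip's question: if the endpoint is defined to emit raw bytes, neither escape is meaningful; the UTF-8 decoding step only makes sense if the endpoint's contract is "UTF-8 text".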
[jira] [Commented] (MESOS-8073) Add per-framework metrics
[ https://issues.apache.org/jira/browse/MESOS-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200113#comment-16200113 ] ASF GitHub Bot commented on MESOS-8073: --- Github user janisz commented on the pull request: https://github.com/apache/mesos/commit/548aaee3a8f5935457767db1e3b761d873b09cbf#commitcomment-24904496 In src/webui/master/static/js/controllers.js on line 250: Created an issue for this [MESOS-8073](https://issues.apache.org/jira/browse/MESOS-8073) > Add per-framework metrics > - > > Key: MESOS-8073 > URL: https://issues.apache.org/jira/browse/MESOS-8073 > Project: Mesos > Issue Type: Improvement >Reporter: Tomasz Janiszewski >Priority: Minor > > Add per-framework metrics to the master so that the webui does not need to > loop over all tasks! > https://github.com/apache/mesos/commit/548aaee3a8f5935457767db1e3b761d873b09cbf#diff-9f2e9a08332888bca98d111787b3a8c3R249 > Refs: MESOS-7962 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (MESOS-8073) Add per-framework metrics
Tomasz Janiszewski created MESOS-8073: - Summary: Add per-framework metrics Key: MESOS-8073 URL: https://issues.apache.org/jira/browse/MESOS-8073 Project: Mesos Issue Type: Improvement Reporter: Tomasz Janiszewski Priority: Minor Add per-framework metrics to the master so that the webui does not need to loop over all tasks! https://github.com/apache/mesos/commit/548aaee3a8f5935457767db1e3b761d873b09cbf#diff-9f2e9a08332888bca98d111787b3a8c3R249 Refs: MESOS-7962 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
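The shape of the requested feature can be sketched briefly: instead of the webui recomputing counts by looping over every task, the master would maintain per-framework counters incrementally as status updates arrive, and expose them as flat metric keys. Everything below is hypothetical (class name, method names, and the metric key format are assumptions, not the Mesos metrics API):

```python
# Hypothetical sketch of incrementally maintained per-framework
# task-state counters; not actual Mesos master code.
from collections import Counter, defaultdict


class FrameworkMetrics:
    def __init__(self):
        # framework id -> Counter of task states
        self._by_framework = defaultdict(Counter)

    def on_status_update(self, framework_id, old_state, new_state):
        # Move one task from old_state to new_state; O(1) per update,
        # so no full scan over tasks is ever needed.
        counts = self._by_framework[framework_id]
        if old_state is not None:
            counts[old_state] -= 1
        counts[new_state] += 1

    def snapshot(self):
        # Flatten into metric keys; the key format here is an assumption.
        out = {}
        for fid, counts in self._by_framework.items():
            for state, n in counts.items():
                out["master/frameworks/%s/tasks/%s" % (fid, state.lower())] = n
        return out


metrics = FrameworkMetrics()
metrics.on_status_update("fw-1", None, "TASK_RUNNING")
metrics.on_status_update("fw-1", "TASK_RUNNING", "TASK_FINISHED")
```

The webui could then read the snapshot directly rather than deriving the same counts client-side from the full task list.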
[jira] [Created] (MESOS-8072) Change Mesos common events verbose logs to use VLOG(2) instead of 1
Armand Grillet created MESOS-8072: - Summary: Change Mesos common events verbose logs to use VLOG(2) instead of 1 Key: MESOS-8072 URL: https://issues.apache.org/jira/browse/MESOS-8072 Project: Mesos Issue Type: Improvement Reporter: Armand Grillet Assignee: Armand Grillet Priority: Minor The original commit https://github.com/apache/mesos/commit/fa6ffdfcd22136c171b43aed2e7949a07fd263d7 that started using VLOG(1) for the allocator does not state why this level was chosen and the periodic messages such as "No allocations performed" should be displayed at a higher level to simplify debugging. -- This message was sent by Atlassian JIRA (v6.4.14#64029)