[jira] [Commented] (MESOS-10011) Operation feedback with stale agent ID crashes the master

2019-10-11 Thread Yan Xu (Jira)
[ https://issues.apache.org/jira/browse/MESOS-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949656#comment-16949656 ] Yan Xu commented on MESOS-10011: {{removeOperation}} is probably called from

[jira] [Commented] (MESOS-10011) Operation feedback with stale agent ID crashes the master

2019-10-10 Thread Yan Xu (Jira)
[ https://issues.apache.org/jira/browse/MESOS-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949147#comment-16949147 ] Yan Xu commented on MESOS-10011: To be most compatible with the agent checkpointed resources behavior

[jira] [Commented] (MESOS-10011) Operation feedback with stale agent ID crashes the master

2019-10-10 Thread Yan Xu (Jira)
[ https://issues.apache.org/jira/browse/MESOS-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949139#comment-16949139 ] Yan Xu commented on MESOS-10011: [~greggomann] any thoughts on how this should be addressed? >

[jira] [Commented] (MESOS-10011) Operation feedback with stale agent ID crashes the master

2019-10-10 Thread Yan Xu (Jira)
[ https://issues.apache.org/jira/browse/MESOS-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949069#comment-16949069 ] Yan Xu commented on MESOS-10011: In our environment we only use old style RESERVE/CREATE persistent

[jira] [Created] (MESOS-10011) Operation feedback with stale agent ID crashes the master

2019-10-10 Thread Yan Xu (Jira)
Yan Xu created MESOS-10011: -- Summary: Operation feedback with stale agent ID crashes the master Key: MESOS-10011 URL: https://issues.apache.org/jira/browse/MESOS-10011 Project: Mesos Issue Type:

[jira] [Commented] (MESOS-9768) Allow operators to mount the container rootfs with the `nosuid` flag

2019-05-17 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842819#comment-16842819 ] Yan Xu commented on MESOS-9768: --- What we are primarily interested in is to set it for the {{overlay}}

[jira] [Commented] (MESOS-9368) The agent can be resending status updates too aggressively and the backoff is not configurable

2018-11-02 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673503#comment-16673503 ] Yan Xu commented on MESOS-9368: --- cc [~fiu] > The agent can be resending status updates too aggressively

[jira] [Commented] (MESOS-9368) The agent can be resending status updates too aggressively and the backoff is not configurable

2018-11-02 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673497#comment-16673497 ] Yan Xu commented on MESOS-9368: --- [~ipronin] [~jasonlai] do you guys feel similarly for your environments?

[jira] [Created] (MESOS-9368) The agent can be resending status updates too aggressively and the backoff is not configurable

2018-11-02 Thread Yan Xu (JIRA)
Yan Xu created MESOS-9368: - Summary: The agent can be resending status updates too aggressively and the backoff is not configurable Key: MESOS-9368 URL: https://issues.apache.org/jira/browse/MESOS-9368

[jira] [Commented] (MESOS-9178) Add a metric for master failover time.

2018-09-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612870#comment-16612870 ] Yan Xu commented on MESOS-9178: --- So my proposal is that, we have the following metrics:..

[jira] [Commented] (MESOS-9178) Add a metric for master failover time.

2018-08-22 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589463#comment-16589463 ] Yan Xu commented on MESOS-9178: --- +1. Yup that's the approach we talked about. Sorry the JIRA didn't mention

[jira] [Created] (MESOS-9171) Mesos agent crashes

2018-08-21 Thread Yan Xu (JIRA)
Yan Xu created MESOS-9171: - Summary: Mesos agent crashes Key: MESOS-9171 URL: https://issues.apache.org/jira/browse/MESOS-9171 Project: Mesos Issue Type: Bug Affects Versions: 1.7.0

[jira] [Commented] (MESOS-8897) ROOT_XFS_QuotaTest.DiskUsageExceedsQuotaWithKill is flaky

2018-05-09 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469112#comment-16469112 ] Yan Xu commented on MESOS-8897: --- cc [~hdost] > ROOT_XFS_QuotaTest.DiskUsageExceedsQuotaWithKill is flaky >

[jira] [Created] (MESOS-8897) ROOT_XFS_QuotaTest.DiskUsageExceedsQuotaWithKill is flaky

2018-05-09 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8897: - Summary: ROOT_XFS_QuotaTest.DiskUsageExceedsQuotaWithKill is flaky Key: MESOS-8897 URL: https://issues.apache.org/jira/browse/MESOS-8897 Project: Mesos Issue Type: Bug

[jira] [Commented] (MESOS-8750) Check failed: !slaves.registered.contains(task->slave_id)

2018-05-03 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463379#comment-16463379 ] Yan Xu commented on MESOS-8750: --- {code:title=} commit 520b729857223aeade345cbdf61209ec4f395ad9 Author: Megha

[jira] [Assigned] (MESOS-8630) All subsequent registry operations fail after the registrar is aborted after a failed update

2018-05-01 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8630: - Assignee: Xudong Ni > All subsequent registry operations fail after the registrar is aborted after > a

[jira] [Commented] (MESOS-8618) ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.

2018-04-30 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459442#comment-16459442 ] Yan Xu commented on MESOS-8618: --- {noformat:title=} commit 1c6d9e5e6d7439444c77d6c91b18642f69557dfe Author:

[jira] [Created] (MESOS-8855) Change TaskStatus.Reason's default value to something

2018-04-30 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8855: - Summary: Change TaskStatus.Reason's default value to something Key: MESOS-8855 URL: https://issues.apache.org/jira/browse/MESOS-8855 Project: Mesos Issue Type: Bug

[jira] [Assigned] (MESOS-8824) Send the task's latest "status update state" to frameworks when an unreachable agent reregisters.

2018-04-23 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8824: - Assignee: Yan Xu > Send the task's latest "status update state" to frameworks when an > unreachable

[jira] [Commented] (MESOS-8618) ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.

2018-04-23 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448971#comment-16448971 ] Yan Xu commented on MESOS-8618: --- This test failed because we didn't enable replicated log registry so the

[jira] [Created] (MESOS-8824) Send the task's latest "status update state" to frameworks when an unreachable agent reregisters.

2018-04-23 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8824: - Summary: Send the task's latest "status update state" to frameworks when an unreachable agent reregisters. Key: MESOS-8824 URL: https://issues.apache.org/jira/browse/MESOS-8824

[jira] [Assigned] (MESOS-8618) ReconciliationTest.ReconcileStatusUpdateTaskState is flaky.

2018-04-23 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8618: - Assignee: Yan Xu > ReconciliationTest.ReconcileStatusUpdateTaskState is flaky. >

[jira] [Commented] (MESOS-8630) All subsequent registry operations fail after the registrar is aborted after a failed update

2018-04-06 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429105#comment-16429105 ] Yan Xu commented on MESOS-8630: --- A first step could be to identify all the places that update the registry

[jira] [Commented] (MESOS-8636) Master should store `completed` frameworks for lifecycle enforcement separately from that for webUI and endpoints

2018-03-05 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386503#comment-16386503 ] Yan Xu commented on MESOS-8636: --- This is in line with what Mesos already does for [gone

[jira] [Created] (MESOS-8636) Master should store `completed` frameworks for lifecycle enforcement separately from that for webUI and endpoints

2018-03-05 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8636: - Summary: Master should store `completed` frameworks for lifecycle enforcement separately from that for webUI and endpoints Key: MESOS-8636 URL: https://issues.apache.org/jira/browse/MESOS-8636

[jira] [Commented] (MESOS-6422) cgroups_tests not correctly tearing down testing hierarchies

2018-03-02 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384307#comment-16384307 ] Yan Xu commented on MESOS-6422: --- Sorry this is low priority for me right now so I am unassigning. >

[jira] [Assigned] (MESOS-6422) cgroups_tests not correctly tearing down testing hierarchies

2018-03-02 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-6422: - Assignee: (was: Yan Xu) > cgroups_tests not correctly tearing down testing hierarchies >

[jira] [Created] (MESOS-8630) All subsequent registry operations fail after the registrar is aborted after a failed update

2018-03-02 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8630: - Summary: All subsequent registry operations fail after the registrar is aborted after a failed update Key: MESOS-8630 URL: https://issues.apache.org/jira/browse/MESOS-8630

[jira] [Created] (MESOS-8622) Agent should send a task status update when upon receiving the task

2018-02-27 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8622: - Summary: Agent should send a task status update when upon receiving the task Key: MESOS-8622 URL: https://issues.apache.org/jira/browse/MESOS-8622 Project: Mesos Issue

[jira] [Created] (MESOS-8602) Subscribers::send incorrectly assumes frameworks are registered

2018-02-22 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8602: - Summary: Subscribers::send incorrectly assumes frameworks are registered Key: MESOS-8602 URL: https://issues.apache.org/jira/browse/MESOS-8602 Project: Mesos Issue Type:

[jira] [Commented] (MESOS-8602) Subscribers::send incorrectly assumes frameworks are registered

2018-02-22 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373661#comment-16373661 ] Yan Xu commented on MESOS-8602: --- /cc [~greggomann] > Subscribers::send incorrectly assumes frameworks are

[jira] [Commented] (MESOS-8595) Mesos agent's use of /tmp for overlayfs could be confusing

2018-02-19 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369442#comment-16369442 ] Yan Xu commented on MESOS-8595: --- /cc [~gilbert] [~zhitao] > Mesos agent's use of /tmp for overlayfs could

[jira] [Created] (MESOS-8595) Mesos agent

2018-02-19 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8595: - Summary: Mesos agent Key: MESOS-8595 URL: https://issues.apache.org/jira/browse/MESOS-8595 Project: Mesos Issue Type: Bug Reporter: Yan Xu With MESOS-6000

[jira] [Created] (MESOS-8544) Required mesos.Task.state doesn't support upgrades.

2018-02-05 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8544: - Summary: Required mesos.Task.state doesn't support upgrades. Key: MESOS-8544 URL: https://issues.apache.org/jira/browse/MESOS-8544 Project: Mesos Issue Type: Bug

[jira] [Commented] (MESOS-8232) SlaveTest.RegisteredAgentReregisterAfterFailover is flaky.

2018-01-31 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347285#comment-16347285 ] Yan Xu commented on MESOS-8232: --- [~alexr] thanks a lot for diligently cleaning up flaky tests and filing

[jira] [Assigned] (MESOS-8232) SlaveTest.RegisteredAgentReregisterAfterFailover is flaky.

2018-01-31 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8232: - Assignee: Yan Xu > SlaveTest.RegisteredAgentReregisterAfterFailover is flaky. >

[jira] [Commented] (MESOS-8507) SLRP discards reservations when the agent is discarded, which could lead to leaked volumes.

2018-01-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344362#comment-16344362 ] Yan Xu commented on MESOS-8507: --- /cc [~chhsia0] [~jieyu] > SLRP discards reservations when the agent is

[jira] [Created] (MESOS-8507) SLRP discards reservations when the agent is discarded, which could lead to leaked volumes.

2018-01-29 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8507: - Summary: SLRP discards reservations when the agent is discarded, which could lead to leaked volumes. Key: MESOS-8507 URL: https://issues.apache.org/jira/browse/MESOS-8507 Project:

[jira] [Updated] (MESOS-8507) SLRP discards reservations when the agent is discarded, which could lead to leaked volumes.

2018-01-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8507: -- Description: In the current SLRP implementation the reservations for new SLRP/CSI backed volumes are

[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2018-01-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344190#comment-16344190 ] Yan Xu commented on MESOS-5368: --- [~vinodkone] It still seems to me that the proposal to tie the current

[jira] [Comment Edited] (MESOS-5368) Consider introducing persistent agent ID

2018-01-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344190#comment-16344190 ] Yan Xu edited comment on MESOS-5368 at 1/29/18 11:27 PM: - [~vinodkone] It still

[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.

2018-01-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324742#comment-16324742 ] Yan Xu commented on MESOS-8337: --- {noformat:title=} commit 35ac2f047abf2c0ea452b98a249c3dbb90d64282 (HEAD ->

[jira] [Commented] (MESOS-8125) Agent should properly handle recovering an executor when its pid is reused

2018-01-09 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319818#comment-16319818 ] Yan Xu commented on MESOS-8125: --- We used to not need to handle recovering executors after a reboot because

[jira] [Assigned] (MESOS-8334) PartitionedSlaveReregistrationMasterFailover is flaky.

2018-01-03 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8334: - Assignee: Yan Xu (was: Megha Sharma) > PartitionedSlaveReregistrationMasterFailover is flaky. >

[jira] [Commented] (MESOS-8334) PartitionedSlaveReregistrationMasterFailover is flaky.

2018-01-03 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310404#comment-16310404 ] Yan Xu commented on MESOS-8334: --- The agent reregistered before one scheduler hence the status update is

[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-12-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288018#comment-16288018 ] Yan Xu commented on MESOS-6406: --- {noformat:title=} commit 5e5a8102c3281db25a37157dac123b0ca546e030 (HEAD ->

[jira] [Commented] (MESOS-8306) Restrict which agents can statically reserve resources for which roles

2017-12-11 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286429#comment-16286429 ] Yan Xu commented on MESOS-8306: --- After investigating it I found that it makes more sense to reuse the

[jira] [Comment Edited] (MESOS-8306) Restrict which agents can statically reserve resources for which roles

2017-12-11 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286857#comment-16286857 ] Yan Xu edited comment on MESOS-8306 at 12/12/17 12:35 AM: --

[jira] [Commented] (MESOS-8306) Restrict which agents can statically reserve resources for which roles

2017-12-11 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286750#comment-16286750 ] Yan Xu commented on MESOS-8306: --- So in order to authorize the static reservations, the master would be

[jira] [Assigned] (MESOS-621) `HierarchicalAllocatorProcess::removeSlave` doesn't properly handle framework allocations/resources

2017-12-06 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-621: Assignee: (was: Yan Xu) > `HierarchicalAllocatorProcess::removeSlave` doesn't properly handle framework

[jira] [Assigned] (MESOS-8306) Restrict which agents can statically reserve resources for which roles

2017-12-06 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8306: - Assignee: Yan Xu > Restrict which agents can statically reserve resources for which roles >

[jira] [Created] (MESOS-8306) Restrict which agents can statically reserve resources for which roles

2017-12-06 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8306: - Summary: Restrict which agents can statically reserve resources for which roles Key: MESOS-8306 URL: https://issues.apache.org/jira/browse/MESOS-8306 Project: Mesos

[jira] [Commented] (MESOS-8223) Master crashes when suppressed on subscribe is enabled.

2017-12-01 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275169#comment-16275169 ] Yan Xu commented on MESOS-8223: --- {noformat:title=} commit 8c2f972b5c0c42e1519d09275cc26e1765a0c5de Author:

[jira] [Commented] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-12-01 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275165#comment-16275165 ] Yan Xu commented on MESOS-8200: --- {noformat:title=} commit 3711233fcec761be8625af6a028a228fe9d8dc5a Author:

[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271862#comment-16271862 ] Yan Xu commented on MESOS-6406: --- [~ipronin] no if the agent's entry was GCed. The master does know all the

[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271772#comment-16271772 ] Yan Xu commented on MESOS-6406: --- So I think we can probably improve on the approach stated in the JIRA: when

[jira] [Created] (MESOS-8276) Benchmark agent reregistration after master failover with connected frameworks.

2017-11-29 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8276: - Summary: Benchmark agent reregistration after master failover with connected frameworks. Key: MESOS-8276 URL: https://issues.apache.org/jira/browse/MESOS-8276 Project: Mesos

[jira] [Commented] (MESOS-8185) Tasks can be known to the agent but unknown to the master.

2017-11-27 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267891#comment-16267891 ] Yan Xu commented on MESOS-8185: --- [~ipronin] sure and Megha just submitted a RR for MESOS-6406. > Tasks can

[jira] [Updated] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-27 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-6406: -- Shepherd: Yan Xu (was: Vinod Kone) > Send latest status for partition-aware tasks when agent reregisters >

[jira] [Assigned] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-27 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-6406: - Assignee: Megha Sharma (was: Neil Conway) > Send latest status for partition-aware tasks when agent

[jira] [Commented] (MESOS-7711) Master updates registry for reregistering agents even when they haven't been unreachable

2017-11-22 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263224#comment-16263224 ] Yan Xu commented on MESOS-7711: --- Clarification on the fix: by not calling registrar in the mentioned

[jira] [Commented] (MESOS-8185) Tasks can be known to the agent but unknown to the master.

2017-11-17 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257398#comment-16257398 ] Yan Xu commented on MESOS-8185: --- I think so. [~ipronin] with MESOS-7215 no tasks will be killed by the

[jira] [Updated] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-15 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8200: -- Affects Version/s: 1.4.0 > Suppressed roles are not honoured for v1 scheduler subscribe requests. >

[jira] [Commented] (MESOS-8223) Master crashes when suppressed on subscribe is enabled.

2017-11-14 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252014#comment-16252014 ] Yan Xu commented on MESOS-8223: --- The problem is that this

[jira] [Assigned] (MESOS-8223) Master crashes when suppressed on subscribe is enabled.

2017-11-14 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8223: - Assignee: Yan Xu > Master crashes when suppressed on subscribe is enabled. >

[jira] [Created] (MESOS-8223) Master crashes when suppressed on subscribe is enabled.

2017-11-14 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8223: - Summary: Master crashes when suppressed on subscribe is enabled. Key: MESOS-8223 URL: https://issues.apache.org/jira/browse/MESOS-8223 Project: Mesos Issue Type: Bug

[jira] [Commented] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-13 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250233#comment-16250233 ] Yan Xu commented on MESOS-8200: --- [~vinodkone] yes. I have the patch for the devolve code ready but this

[jira] [Commented] (MESOS-8200) Suppressed roles are not honoured for v1 scheduler subscribe requests.

2017-11-10 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247947#comment-16247947 ] Yan Xu commented on MESOS-8200: --- The easiest fix is probably to just change the tag to 3: {{repeated string

[jira] [Assigned] (MESOS-8178) UnreachableAgentReregisterAfterFailover is flaky.

2017-11-07 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8178: - Assignee: Yan Xu > UnreachableAgentReregisterAfterFailover is flaky. >

[jira] [Commented] (MESOS-8160) Support idempotent framework registration

2017-11-06 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241612#comment-16241612 ] Yan Xu commented on MESOS-8160: --- /cc [~adam-mesos] this is relevant to the point you made on MESOS-1719 "in

[jira] [Commented] (MESOS-8098) Benchmark Master failover performance

2017-11-03 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238131#comment-16238131 ] Yan Xu commented on MESOS-8098: --- {noformat:title=} commit ac0fa281472c2ba891f7bd0837fbd728ace73039 Author:

[jira] [Updated] (MESOS-8098) Benchmark Master failover performance

2017-11-03 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8098: -- Attachment: withoutperfpatches.perf.svg withperfpatches.perf.svg Attaching two flame graphs

[jira] [Created] (MESOS-8160) Support idempotent framework registration

2017-11-01 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8160: - Summary: Support idempotent framework registration Key: MESOS-8160 URL: https://issues.apache.org/jira/browse/MESOS-8160 Project: Mesos Issue Type: Bug

[jira] [Commented] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221487#comment-16221487 ] Yan Xu commented on MESOS-8138: --- {quote} the master realizes the disconnection when it tries to the pipe

[jira] [Updated] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8138: -- Description: What we've observed is that if the framework disconnects before the master actor processes the

[jira] [Commented] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221369#comment-16221369 ] Yan Xu commented on MESOS-8138: --- /cc [~anandmazumdar] who implemented MESOS-2294. > Master can fail to

[jira] [Created] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8138: - Summary: Master can fail to detect HTTP framework disconnection if it disconnects very fast Key: MESOS-8138 URL: https://issues.apache.org/jira/browse/MESOS-8138 Project: Mesos

[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-10-25 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219668#comment-16219668 ] Yan Xu commented on MESOS-5368: --- /cc [~anandmazumdar] does my comment above make sense? > Consider

[jira] [Assigned] (MESOS-8085) No point in deallocate() for a framework for maintenance if it is deactivated.

2017-10-16 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8085: - Assignee: Yan Xu > No point in deallocate() for a framework for maintenance if it is deactivated. >

[jira] [Created] (MESOS-8098) Benchmark Master failover performance

2017-10-16 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8098: - Summary: Benchmark Master failover performance Key: MESOS-8098 URL: https://issues.apache.org/jira/browse/MESOS-8098 Project: Mesos Issue Type: Task Components:

[jira] [Assigned] (MESOS-8098) Benchmark Master failover performance

2017-10-16 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8098: - Assignee: Yan Xu > Benchmark Master failover performance > - > >

[jira] [Updated] (MESOS-8085) No point in deallocate() for a framework for maintenance if it is deactivated.

2017-10-16 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8085: -- Description: The {{UnavailableResources}} sent from the allocator to the master are going to be dropped by the

[jira] [Updated] (MESOS-8085) No point in deallocate() for a framework for maintenance if it is deactivated.

2017-10-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8085: -- Summary: No point in deallocate() for a framework for maintenance if it is deactivated. (was: No point in

[jira] [Created] (MESOS-8085) No point in deallocate() for a framework for maintenance it is deactivated.

2017-10-12 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8085: - Summary: No point in deallocate() for a framework for maintenance it is deactivated. Key: MESOS-8085 URL: https://issues.apache.org/jira/browse/MESOS-8085 Project: Mesos

[jira] [Updated] (MESOS-8085) No point in deallocate() for a framework for maintenance it is deactivated.

2017-10-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8085: -- Labels: maintenance (was: ) > No point in deallocate() for a framework for maintenance it is deactivated. >

[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-10-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202636#comment-16202636 ] Yan Xu commented on MESOS-5368: --- Also, how does this relate to MESOS-8008? From there it sounds like the

[jira] [Created] (MESOS-8083) Mesos containerizer should run isolate() sequentially.

2017-10-12 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8083: - Summary: Mesos containerizer should run isolate() sequentially. Key: MESOS-8083 URL: https://issues.apache.org/jira/browse/MESOS-8083 Project: Mesos Issue Type:

[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2017-10-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202473#comment-16202473 ] Yan Xu commented on MESOS-5368: --- [~vinodkone] This sounds good to me, just a few details which I hope are

[jira] [Updated] (MESOS-8076) PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky.

2017-10-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8076: -- Shepherd: Alexander Rukletsov > PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky. >

[jira] [Assigned] (MESOS-8076) PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky.

2017-10-12 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu reassigned MESOS-8076: - Assignee: Yan Xu > PersistentVolumeTest.SharedPersistentVolumeRescindOnDestroy is flaky. >

[jira] [Updated] (MESOS-8062) Master sends messages to the agent before it reregisters

2017-10-09 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-8062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-8062: -- Component/s: master > Master sends messages to the agent before it reregisters >

[jira] [Created] (MESOS-8062) Master sends messages to the agent before it reregisters

2017-10-09 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8062: - Summary: Master sends messages to the agent before it reregisters Key: MESOS-8062 URL: https://issues.apache.org/jira/browse/MESOS-8062 Project: Mesos Issue Type: Bug

[jira] [Commented] (MESOS-6918) Prometheus exporter endpoints for metrics

2017-10-05 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194161#comment-16194161 ] Yan Xu commented on MESOS-6918: --- [~bmahler] let's chat about the reviews? [~jpe...@apache.org] and I have

[jira] [Commented] (MESOS-1280) Add replace task primitive

2017-10-02 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188488#comment-16188488 ] Yan Xu commented on MESOS-1280: --- Probably not all fields in the TaskInfo make equal sense to be updatable or

[jira] [Commented] (MESOS-7215) Race condition on re-registration of non-partition-aware frameworks

2017-09-29 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186286#comment-16186286 ] Yan Xu commented on MESOS-7215: --- [~megha.sharma] Per offline discussion, we should probably bundle what was

[jira] [Commented] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-26 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181817#comment-16181817 ] Yan Xu commented on MESOS-7964: --- {noformat:title=master} commit 06341309e61a5cee702ea3c7b6d3ef340ac95ad0

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-26 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-7964: -- Fix Version/s: 1.5.0 > Heavy-duty GC makes the agent unresponsive > --

[jira] [Updated] (MESOS-7964) Heavy-duty GC makes the agent unresponsive

2017-09-26 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Xu updated MESOS-7964: -- Affects Version/s: 1.4.0 > Heavy-duty GC makes the agent unresponsive >

[jira] [Commented] (MESOS-7921) process::EventQueue sometimes crashes

2017-09-06 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155706#comment-16155706 ] Yan Xu commented on MESOS-7921: --- Tried out the patch and it seemed to work. I had run mesos-tests with many

[jira] [Commented] (MESOS-7921) process::EventQueue sometimes crashes

2017-09-01 Thread Yan Xu (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151119#comment-16151119 ] Yan Xu commented on MESOS-7921: --- So libprocess GC would delete the managed process upon their exit:
