[jira] [Created] (HELIX-750) Implement Helix View Aggregator Service
Harry Zhang created HELIX-750:

Summary: Implement Helix View Aggregator Service
Key: HELIX-750
URL: https://issues.apache.org/jira/browse/HELIX-750
Project: Apache Helix
Issue Type: Task
Reporter: Harry Zhang
Assignee: Harry Zhang

I have sent out the design for the Helix view aggregation service, which serves cross-data-center information in a more convenient way (https://github.com/apache/helix/pull/266). This ticket is to implement the service based on the approved design.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HELIX-775) Task driver should support add/get task framework user content
Harry Zhang created HELIX-775:

Summary: Task driver should support add/get task framework user content
Key: HELIX-775
URL: https://issues.apache.org/jira/browse/HELIX-775
Project: Apache Helix
Issue Type: Task
Reporter: Harry Zhang
Assignee: Harry Zhang

Task driver should support add/get task framework user content at workflow/job/task levels

AC:
* finish implementation
* add tests
[jira] [Created] (HELIX-772) Support TaskDriver.addUserContent() api
Harry Zhang created HELIX-772:

Summary: Support TaskDriver.addUserContent() api
Key: HELIX-772
URL: https://issues.apache.org/jira/browse/HELIX-772
Project: Apache Helix
Issue Type: Bug
Reporter: Harry Zhang
Assignee: Harry Zhang

Need to support adding user content in the task driver

AC:
* implement API
* add test
[jira] [Created] (HELIX-771) More detailed top state handoff metrics
Harry Zhang created HELIX-771:

Summary: More detailed top state handoff metrics
Key: HELIX-771
URL: https://issues.apache.org/jira/browse/HELIX-771
Project: Apache Helix
Issue Type: Bug
Components: helix-core
Reporter: Harry Zhang
Assignee: Harry Zhang

To define a top state handoff SLA, we need some more detailed data:
* graceful top state handoff (e.g. disabling an instance or resource; both Helix and e2e latency)
* abrupt top state handoff (e.g. node crash)

AC:
- prepare metrics, test, code complete
[jira] [Created] (HELIX-781) Implement Helix cluster view aggregator
Harry Zhang created HELIX-781:

Summary: Implement Helix cluster view aggregator
Key: HELIX-781
URL: https://issues.apache.org/jira/browse/HELIX-781
Project: Apache Helix
Issue Type: Task
Reporter: Harry Zhang
Assignee: Harry Zhang
[jira] [Created] (HELIX-779) Maintenance rebalancer should not clear preference list in ideal state
Harry Zhang created HELIX-779:

Summary: Maintenance rebalancer should not clear preference list in ideal state
Key: HELIX-779
URL: https://issues.apache.org/jira/browse/HELIX-779
Project: Apache Helix
Issue Type: Bug
Components: helix-core
Reporter: Harry Zhang
Assignee: Harry Zhang

Setting the list fields to an empty map prevents resources that were newly added and initially rebalanced during maintenance mode from being rebalanced after the cluster exits maintenance mode. The right thing to do is to clear every preference list.
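One possible reading of the distinction in HELIX-779, sketched with plain Java collections rather than Helix classes. The class and method names below are hypothetical, and the sketch assumes that the list fields map each partition name to its preference list and that what matters is whether the partition keys survive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the list fields of an ideal state:
// partition name -> preference list of instance names. Not Helix code.
class PreferenceLists {

    // The behaviour the ticket reports as a problem: the whole map is
    // replaced by an empty one, so every partition key disappears.
    static Map<String, List<String>> resetToEmptyMap(Map<String, List<String>> listFields) {
        return new HashMap<>();
    }

    // The behaviour the ticket asks for, as read here: every partition key
    // is kept and only its preference list is emptied.
    static Map<String, List<String>> clearEachList(Map<String, List<String>> listFields) {
        Map<String, List<String>> cleared = new HashMap<>();
        for (String partition : listFields.keySet()) {
            cleared.put(partition, new ArrayList<>());
        }
        return cleared;
    }
}
```

With the second variant the partitions remain listed, each with an empty preference list. Under the assumption above, that is the difference between the reported behaviour and the proposed one.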
[jira] [Created] (HELIX-780) Support get/add rest api for workflow/job/task user content
Harry Zhang created HELIX-780:

Summary: Support get/add rest api for workflow/job/task user content
Key: HELIX-780
URL: https://issues.apache.org/jira/browse/HELIX-780
Project: Apache Helix
Issue Type: Task
Reporter: Harry Zhang
Assignee: Harry Zhang

Need to support a get/add REST API for workflow/job/task user content

AC:
* finish implementation
* test code
[jira] [Created] (HELIX-753) record top state handoff finished in single cluster data cache refresh
Harry Zhang created HELIX-753:

Summary: record top state handoff finished in single cluster data cache refresh
Key: HELIX-753
URL: https://issues.apache.org/jira/browse/HELIX-753
Project: Apache Helix
Issue Type: Bug
Reporter: Harry Zhang
Assignee: Harry Zhang

Currently we calculate top state handoff duration as follows:
- record the missing top state when we see a top state go missing
- record the top state coming back when we see it come back
- report the top state handoff duration

This is perfectly fine for non-P2P state transitions, as the entire top state handoff process always takes >= 2 pipeline runs. However, for P2P-enabled clusters, top state handoffs are quick, and if a handoff is quicker than the cluster data refresh stage latency, we lose a lot of short top state handoffs, which makes the numbers look miserable on ingraph.

We need to revise the top state handoff metrics implementation so we don't lose data points statistically (i.e. we are losing all short handoffs now).

AC:
- revise impl so we catch those short top state hand-offs
- write new tests to catch the fix if needed
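The three steps described in HELIX-753 can be sketched roughly as below, to show why a handoff that starts and ends between two refreshes is never reported. This is a minimal illustration, not Helix code: the class `SnapshotHandoffTracker`, its methods, and the millisecond timestamps are all hypothetical, and it assumes the tracker sees the top state only once per pipeline run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Helix code: a tracker that learns about the
// top state only from periodic snapshots, one per pipeline run.
class SnapshotHandoffTracker {
    // Snapshot time at which the top state was first seen missing; -1 if present.
    private long missingSinceMs = -1;
    private final List<Long> reportedDurationsMs = new ArrayList<>();

    // Called once per pipeline run with the snapshot time and whether a
    // top-state replica was visible in that snapshot.
    void onSnapshot(long nowMs, boolean topStatePresent) {
        if (!topStatePresent && missingSinceMs < 0) {
            // Step 1: record the missing top state.
            missingSinceMs = nowMs;
        } else if (topStatePresent && missingSinceMs >= 0) {
            // Steps 2 and 3: the top state came back, report the duration.
            reportedDurationsMs.add(nowMs - missingSinceMs);
            missingSinceMs = -1;
        }
    }

    List<Long> reportedDurationsMs() {
        return reportedDurationsMs;
    }
}
```

If the top state is present in two consecutive snapshots, the tracker records nothing, even when a complete handoff happened in between. That is the lost data point the ticket describes. A handoff seen as missing in at least one snapshot is reported, but only at snapshot granularity.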
[jira] [Updated] (HELIX-753) Record top state handoff finished in single cluster data cache refresh
[ https://issues.apache.org/jira/browse/HELIX-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harry Zhang updated HELIX-753:

Summary: Record top state handoff finished in single cluster data cache refresh (was: record top state handoff finished in single cluster data cache refresh)

> Record top state handoff finished in single cluster data cache refresh
>
> Key: HELIX-753
> URL: https://issues.apache.org/jira/browse/HELIX-753
> Project: Apache Helix
> Issue Type: Bug
> Reporter: Harry Zhang
> Assignee: Harry Zhang
> Priority: Major
>
> Currently we calculate top state handoff duration as follows:
> - record the missing top state when we see a top state go missing
> - record the top state coming back when we see it come back
> - report the top state handoff duration
>
> This is perfectly fine for non-P2P state transitions, as the entire top state
> handoff process always takes >= 2 pipeline runs. However, for P2P-enabled
> clusters, top state handoffs are quick, and if a handoff is quicker than the
> cluster data refresh stage latency, we lose a lot of short top state
> handoffs, which makes the numbers look miserable on ingraph.
>
> We need to revise the top state handoff metrics implementation so we don't
> lose data points statistically (i.e. we are losing all short handoffs now).
>
> AC:
> - revise impl so we catch those short top state hand-offs
> - write new tests to catch the fix if needed