[GitHub] helix pull request #275: PR
GitHub user narendly opened a pull request:

https://github.com/apache/helix/pull/275

PR

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/narendly/helix master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/helix/pull/275.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #275

commit e7b960c22896c08337292d20f674f20a7f1391d0
Author: Hunter Lee
Date: 2018-10-27T01:32:16Z

[HELIX-762] TASK: Change LOG mode from info to debug

In production, it was observed that some users were running thousands of tasks. Since AssignableInstance logs a line for each task assigned or released, the volume of log output was excessive and too verbose.
Changelist:
1. Change the logging mode from info to debug in AssignableInstance and AssignableInstanceManager

commit e492d9f663d8edad0f344208cc8affc6828708a3
Author: Hunter Lee
Date: 2018-10-27T01:49:52Z

[HELIX-763] Task: Ignore tasks whose workflow and job are inactive

Manual testing revealed tasks in the INIT and RUNNING states that were occupying a thread count even though their parent job or workflow was in an inactive state (terminal or stopped). This happened when the capacities were being rebuilt from scratch, and it could have caused a thread leak.
Changelist:
1. Add a check in buildAssignableInstances() so that it ignores workflows and jobs whose states are inactive (that is, their tasks cannot be occupying a thread on Participants)

commit d33d9efea25fe9d29e84a4ce7614b544ef2d
Author: Hunter Lee
Date: 2018-10-27T02:03:47Z

[HELIX-764] TASK: Fix LiveInstanceCurrentState change flag

Previously, existsLiveInstanceOrCurrentStateChange was reset in ClusterDataCache whenever its getter was called. This was problematic because, with multiple jobs or workflows, only the first caller of the getter would get the correct flag value; subsequent callers would get false because the flag had already been reset. This RB fixes the bug by resetting the flag at the beginning of the refresh() call in ClusterDataCache, so that all callers during that pipeline get the same, correct value.
Changelist:
1. Change the getter so that it does not reset the flag; instead, reset the flag at the beginning of refresh()

commit 930a4b7ae7eb63be0a751a593ba630ae55fb2cfb
Author: Hunter Lee
Date: 2018-10-27T02:06:42Z

[HELIX-765] TASK: Build quota profile from scratch every rebalance

It has been reported that instances show a full quota despite no tasks existing in their CURRENTSTATES. The cause is not yet clear, so making ClusterDataCache trigger a refresh of all AssignableInstances ensures that there are no situations where it looks like there has been a thread leak. Optimizations will be implemented if necessary.
Changelist:
1. Make AssignableInstanceManager build all AssignableInstances from scratch every rebalance

commit 5033785c231af363953367f65f77513911b753f5
Author: Hunter Lee
Date: 2018-10-27T02:08:02Z

[HELIX-766] TASK: Add logging functionality in AssignableInstanceManager

To debug task-related inquiries and issues, we realized that it would be very helpful to have a log recording the current quota capacity of all AssignableInstances. This is for cases where we see jobs whose tasks are not getting assigned, so that we can quickly rule out bugs in quota-based scheduling.
Changelist:
1. Add a method that logs the current quota profile in JSON format, with an option flag to display it only when there are quota types whose capacities are full
2. Add info logs in AssignableInstanceManager

---
[jira] [Created] (HELIX-766) [TASK] Add logging functionality in AssignableInstanceManager
Hunter L created HELIX-766:

Summary: [TASK] Add logging functionality in AssignableInstanceManager
Key: HELIX-766
URL: https://issues.apache.org/jira/browse/HELIX-766
Project: Apache Helix
Issue Type: Improvement
Reporter: Hunter L
Assignee: Hunter L

To debug task-related inquiries and issues, we realized that it would be very helpful to have a log recording the current quota capacity of all AssignableInstances. This is for cases where we see jobs whose tasks are not getting assigned, so that we can quickly rule out bugs in quota-based scheduling.
Changelist:
1. Add a method that logs the current quota profile in JSON format, with an option flag to display it only when there are quota types whose capacities are full
2. Add info logs in AssignableInstanceManager

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
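The first changelist item can be illustrated with a minimal sketch. It is not the Helix implementation: the class name, method name, and the map layout (instance name to quota type to a {used, total} pair) are assumptions made for illustration, and the JSON is assembled by hand without string escaping to keep the example dependency-free.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch; none of these names are taken from Helix.
class QuotaProfileLogSketch {
    // usage: instance name -> quota type -> {used, total} (assumed layout).
    // Returns the JSON profile, or empty when onlyDisplayIfFull is set and
    // no quota type has reached its capacity.
    static Optional<String> toJson(Map<String, Map<String, int[]>> usage,
                                   boolean onlyDisplayIfFull) {
        boolean anyFull = false;
        StringBuilder sb = new StringBuilder("{");
        boolean firstInstance = true;
        // TreeMap gives a stable key order so log lines are comparable.
        for (Map.Entry<String, Map<String, int[]>> inst : new TreeMap<>(usage).entrySet()) {
            if (!firstInstance) {
                sb.append(',');
            }
            firstInstance = false;
            sb.append('"').append(inst.getKey()).append("\":{");
            boolean firstType = true;
            for (Map.Entry<String, int[]> quota : new TreeMap<>(inst.getValue()).entrySet()) {
                int used = quota.getValue()[0];
                int total = quota.getValue()[1];
                if (used >= total) {
                    anyFull = true;
                }
                if (!firstType) {
                    sb.append(',');
                }
                firstType = false;
                sb.append('"').append(quota.getKey()).append("\":\"")
                  .append(used).append('/').append(total).append('"');
            }
            sb.append('}');
        }
        sb.append('}');
        if (onlyDisplayIfFull && !anyFull) {
            return Optional.empty();
        }
        return Optional.of(sb.toString());
    }
}
```

A caller would log the returned string at info level only when the Optional is present, which keeps the log quiet unless some quota type is exhausted.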
[jira] [Created] (HELIX-765) [TASK] Build quota profile from scratch every rebalance
Hunter L created HELIX-765:

Summary: [TASK] Build quota profile from scratch every rebalance
Key: HELIX-765
URL: https://issues.apache.org/jira/browse/HELIX-765
Project: Apache Helix
Issue Type: Improvement
Reporter: Hunter L
Assignee: Hunter L

It has been reported that instances show a full quota despite no tasks existing in their CURRENTSTATES. The cause is not yet clear, so making ClusterDataCache trigger a refresh of all AssignableInstances ensures that there are no situations where it looks like there has been a thread leak. Optimizations will be implemented if necessary.
Changelist:
1. Make AssignableInstanceManager build all AssignableInstances from scratch every rebalance
[jira] [Created] (HELIX-764) [TASK] Fix LiveInstanceCurrentState change flag
Hunter L created HELIX-764:

Summary: [TASK] Fix LiveInstanceCurrentState change flag
Key: HELIX-764
URL: https://issues.apache.org/jira/browse/HELIX-764
Project: Apache Helix
Issue Type: Improvement
Reporter: Hunter L
Assignee: Hunter L

Previously, existsLiveInstanceOrCurrentStateChange was reset in ClusterDataCache whenever its getter was called. This was problematic because, with multiple jobs or workflows, only the first caller of the getter would get the correct flag value; subsequent callers would get false because the flag had already been reset. This RB fixes the bug by resetting the flag at the beginning of the refresh() call in ClusterDataCache, so that all callers during that pipeline get the same, correct value.
Changelist:
1. Change the getter so that it does not reset the flag; instead, reset the flag at the beginning of refresh()
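The pattern described in this issue can be sketched as follows. This is a simplified stand-in, not the actual ClusterDataCache: the class name and the changeDetected parameter are hypothetical, and only the flag name and refresh() come from the description above.

```java
// Hypothetical stand-in for the cache; only the flag name and refresh()
// are taken from the issue text.
class ChangeFlagSketch {
    private boolean existsLiveInstanceOrCurrentStateChange = false;

    // changeDetected is an assumed parameter standing in for whatever
    // change detection the real refresh performs.
    void refresh(boolean changeDetected) {
        // Reset once per pipeline run, before anything reads the flag.
        existsLiveInstanceOrCurrentStateChange = false;
        if (changeDetected) {
            existsLiveInstanceOrCurrentStateChange = true;
        }
    }

    // Side-effect-free getter: every caller in the same pipeline run
    // observes the same value.
    boolean getExistsLiveInstanceOrCurrentStateChange() {
        return existsLiveInstanceOrCurrentStateChange;
    }
}
```

With the reset moved out of the getter, a second workflow or job reading the flag in the same pipeline run sees the same value as the first.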
[jira] [Created] (HELIX-763) [TASK] Ignore tasks whose workflow and job are inactive
Hunter L created HELIX-763:

Summary: [TASK] Ignore tasks whose workflow and job are inactive
Key: HELIX-763
URL: https://issues.apache.org/jira/browse/HELIX-763
Project: Apache Helix
Issue Type: Improvement
Reporter: Hunter L
Assignee: Hunter L

Manual testing revealed tasks in the INIT and RUNNING states that were occupying a thread count even though their parent job or workflow was in an inactive state (terminal or stopped). This happened when the capacities were being rebuilt from scratch, and it could have caused a thread leak.
Changelist:
1. Add a check in buildAssignableInstances() so that it ignores workflows and jobs whose states are inactive (that is, their tasks cannot be occupying a thread on Participants)
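A rough sketch of the check follows. The enums, the TaskRecord type, and the method names are all hypothetical simplifications and are not Helix types; the only behavior taken from the issue is that INIT and RUNNING tasks are skipped when their parent workflow or job is inactive.

```java
import java.util.List;

// Hypothetical sketch; these types are illustrative, not Helix classes.
class InactiveParentFilterSketch {
    // Simplified parent states: anything other than IN_PROGRESS is
    // treated as inactive (terminal or stopped).
    enum ParentState { IN_PROGRESS, STOPPED, COMPLETED, FAILED }

    enum TaskState { INIT, RUNNING, COMPLETED }

    static final class TaskRecord {
        final ParentState workflowState;
        final ParentState jobState;
        final TaskState taskState;

        TaskRecord(ParentState workflowState, ParentState jobState, TaskState taskState) {
            this.workflowState = workflowState;
            this.jobState = jobState;
            this.taskState = taskState;
        }
    }

    static boolean isActive(ParentState state) {
        return state == ParentState.IN_PROGRESS;
    }

    // Counts the threads that should be considered occupied when capacity
    // is rebuilt from scratch.
    static int countOccupiedThreads(List<TaskRecord> tasks) {
        int occupied = 0;
        for (TaskRecord task : tasks) {
            // Tasks under an inactive workflow or job cannot hold a thread.
            if (!isActive(task.workflowState) || !isActive(task.jobState)) {
                continue;
            }
            if (task.taskState == TaskState.INIT || task.taskState == TaskState.RUNNING) {
                occupied++;
            }
        }
        return occupied;
    }
}
```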
[jira] [Created] (HELIX-762) [TASK] Change LOG mode from info to debug
Hunter L created HELIX-762:

Summary: [TASK] Change LOG mode from info to debug
Key: HELIX-762
URL: https://issues.apache.org/jira/browse/HELIX-762
Project: Apache Helix
Issue Type: Improvement
Reporter: Hunter L
Assignee: Hunter L

In production, it was observed that some users were running thousands of tasks. Since AssignableInstance logs a line for each task assigned or released, the volume of log output was excessive and too verbose.
Changelist:
1. Change the logging mode from info to debug in AssignableInstance and AssignableInstanceManager
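As a rough illustration of the effect of this change, the sketch below uses java.util.logging from the JDK so that it stays self-contained. That is a substitution for whatever logging framework Helix uses, with Level.FINE playing the role of debug; the class and method names are hypothetical.

```java
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging as a stand-in; Level.FINE
// is treated here as the counterpart of "debug".
class AssignLogSketch {
    static final Logger LOG = Logger.getLogger("AssignLogSketch");

    static void onTaskAssigned(String taskId, String instanceName) {
        // Logged at FINE instead of INFO: with the logger at INFO, the
        // per-task line is dropped and the message string is never built.
        LOG.fine(() -> "Assigned task " + taskId + " to " + instanceName);
    }
}
```

With thousands of tasks, the per-assignment line appears only when an operator lowers the logger level for troubleshooting.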
[jira] [Commented] (HELIX-756) TASK: Change LOG mode from info to debug
[ https://issues.apache.org/jira/browse/HELIX-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665820#comment-16665820 ]

ASF GitHub Bot commented on HELIX-756:

Github user narendly closed the pull request at:

https://github.com/apache/helix/pull/271

> TASK: Change LOG mode from info to debug
>
> Key: HELIX-756
> URL: https://issues.apache.org/jira/browse/HELIX-756
> Project: Apache Helix
> Issue Type: Improvement
> Reporter: Hunter L
> Assignee: Hunter L
> Priority: Major
>
> In production, it was observed that some users were running thousands of
> tasks, and since AssignableInstance leaves a line of log for each task
> assigned or released, the amount of log that was being generated was too
> much, and it was too verbose.
> Changelist:
> 1. Change the logging mode from info to debug in AssignableInstance and
> AssignableInstanceManager
[GitHub] helix pull request #271: [HELIX-756] TASK: Change LOG mode from info to debu...
Github user narendly closed the pull request at: https://github.com/apache/helix/pull/271 ---