NealSun96 commented on a change in pull request #1076:
URL: https://github.com/apache/helix/pull/1076#discussion_r437766166



##########
File path: 
helix-core/src/main/java/org/apache/helix/controller/stages/TaskGarbageCollectionStage.java
##########
@@ -23,34 +27,53 @@ public AsyncWorkerType getAsyncWorkerType() {
   }
 
   @Override
-  public void execute(ClusterEvent event) {
+  public void process(ClusterEvent event) throws Exception {
     WorkflowControllerDataProvider dataProvider =
         event.getAttribute(AttributeName.ControllerDataProvider.name());
+    event.addAttribute(AttributeName.WORKFLOW_CONFIG_MAP.name(),
+        dataProvider.getWorkflowConfigMap());

Review comment:
      I understand your proposal better now, and after looking at it 
carefully, we can ditch the ZK reads and do the lightweight logic in preprocess 
(or keep the ZK reads in the async thread, which is also okay). In the 
preprocess section of the stage, we can loop through all WorkflowContexts and 
figure out 1. which workflows need to be garbage collected, and 2. which jobs 
need to be purged. All of this work can be done based on the cache.
   
   For your point 1, you did mention that we can do the calculation in the 
"rebalance pipeline" (which I assume is a synonym for TaskSchedulingStage). I 
prefer doing the calculation in TaskGarbageCollectionStage for code clarity, 
and the overhead isn't much because it's one loop through the workflows using 
in-memory operations. What do you think?
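
   To make the idea concrete, here is a minimal sketch of that single 
in-memory pass. Note that `WorkflowInfo` and its fields (`terminated`, 
`finishTimeMs`, `expiryMs`, `completedJobs`) are hypothetical stand-ins for the 
cached WorkflowContext/WorkflowConfig data, not the real Helix API:

```java
import java.util.*;

public class GarbageCollectionSketch {
  // Hypothetical stand-in for a cached WorkflowContext/WorkflowConfig pair.
  static class WorkflowInfo {
    final String name;
    final boolean terminated;       // workflow reached a terminal state
    final long finishTimeMs;        // when it finished
    final long expiryMs;            // configured expiry period
    final Set<String> completedJobs; // jobs eligible for purging
    WorkflowInfo(String name, boolean terminated, long finishTimeMs,
        long expiryMs, Set<String> completedJobs) {
      this.name = name;
      this.terminated = terminated;
      this.finishTimeMs = finishTimeMs;
      this.expiryMs = expiryMs;
      this.completedJobs = completedJobs;
    }
  }

  // One pass over the cached workflows: fill workflowsToGc with expired
  // workflows and return a map of workflow -> jobs that need purging.
  static Map<String, Set<String>> classify(Collection<WorkflowInfo> workflows,
      long nowMs, Set<String> workflowsToGc) {
    Map<String, Set<String>> jobsToPurge = new HashMap<>();
    for (WorkflowInfo w : workflows) {
      if (w.terminated && nowMs - w.finishTimeMs >= w.expiryMs) {
        workflowsToGc.add(w.name); // whole workflow expired, GC it
      } else if (!w.completedJobs.isEmpty()) {
        jobsToPurge.put(w.name, w.completedJobs); // purge finished jobs only
      }
    }
    return jobsToPurge;
  }

  public static void main(String[] args) {
    long now = 10_000L;
    List<WorkflowInfo> cache = Arrays.asList(
        new WorkflowInfo("wf-expired", true, 1_000L, 5_000L,
            Collections.emptySet()),
        new WorkflowInfo("wf-running", false, 0L, 5_000L,
            new HashSet<>(Collections.singleton("job-1"))));
    Set<String> gc = new HashSet<>();
    Map<String, Set<String>> purge = classify(cache, now, gc);
    System.out.println(gc);             // [wf-expired]
    System.out.println(purge.keySet()); // [wf-running]
  }
}
```

   The actual ZK deletes for the collected names would then happen in the 
async worker, so the pipeline thread only pays for this one loop.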




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
