[GitHub] [helix] NealSun96 commented on a change in pull request #1076: Recover Workflow Garbage Collection Logic

GitBox Tue, 09 Jun 2020 14:38:49 -0700


NealSun96 commented on a change in pull request #1076:
URL: https://github.com/apache/helix/pull/1076#discussion_r437737196




##########
File path: 
helix-core/src/main/java/org/apache/helix/controller/stages/TaskGarbageCollectionStage.java
##########
@@ -23,34 +27,53 @@ public AsyncWorkerType getAsyncWorkerType() {
   }
 
   @Override
-  public void execute(ClusterEvent event) {
+  public void process(ClusterEvent event) throws Exception {
     WorkflowControllerDataProvider dataProvider =
         event.getAttribute(AttributeName.ControllerDataProvider.name());
+    event.addAttribute(AttributeName.WORKFLOW_CONFIG_MAP.name(),
+        dataProvider.getWorkflowConfigMap());
+    event.addAttribute(AttributeName.RESOURCE_CONTEXT_MAP.name(), 
dataProvider.getContexts());

Review comment:
       This is a very good point for reviewing. First, my mistake: this should 
be a deep copy. Now, with the concerns raised here, we have a couple options:
   
   1. We still do deep copies. I'm only copying the maps that are necessary for 
this stage to run, which is the 2 maps for WorkflowConfig and Context. 
   Pro: Doesn't change existing logic
   Con: Uses more memory. @narendly I'd like to know how concerned are you 
about the memory usage for them? 
   2. We do what @jiajunwang suggests; we only pass in the ones that need to be 
GC'ed. 
   Pro: Less memory usage.
   Con: More time spent in the main thread. It's actually not a fast thing to 
do because `getExpiredJobs()` contains Zk read operations. This could add delay 
to the main pipeline. 
   3. @dasahcc proposed another approach: the event is like a trigger, and the 
garbage collection stage figures out what to GC by itself. The stage performs 2 
Zk reads to get WorkflowConfigs and WorkflowContexts. Imo 2 Zk reads are not 
many comparing to the existing Zk reads this stage does (per job level). 
   
   Feel free to let me know what you guys think. 
   
   @jiajunwang About the test comment: I agree with what you said 100% - there 
should have been a test. The race condition was introduced due to cache 
refresh, but whatever approach we take, this stage will now be completely 
unrelated to the cache. Should we still write a test that involves cache 
refresh()? I don't think it would make sense NOW because this stage is no 
longer related to it; it would make sense before. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [helix] NealSun96 commented on a change in pull request #1076: Recover Workflow Garbage Collection Logic

Reply via email to