NealSun96 commented on a change in pull request #1076:
URL: https://github.com/apache/helix/pull/1076#discussion_r437737196
##########
File path:
helix-core/src/main/java/org/apache/helix/controller/stages/TaskGarbageCollectionStage.java
##########
@@ -23,34 +27,53 @@ public AsyncWorkerType getAsyncWorkerType() {
}
@Override
- public void execute(ClusterEvent event) {
+ public void process(ClusterEvent event) throws Exception {
WorkflowControllerDataProvider dataProvider =
event.getAttribute(AttributeName.ControllerDataProvider.name());
+ event.addAttribute(AttributeName.WORKFLOW_CONFIG_MAP.name(),
+ dataProvider.getWorkflowConfigMap());
+ event.addAttribute(AttributeName.RESOURCE_CONTEXT_MAP.name(),
dataProvider.getContexts());
Review comment:
This is a very good point for reviewing. First, my mistake: this should
be a deep copy. Now, with the concerns raised here, we have a couple options:
1. We still do deep copies. I'm only copying the maps that are necessary for
this stage to run, which is the 2 maps for WorkflowConfig and Context.
Pro: Doesn't change existing logic
Con: Uses more memory. @narendly I'd like to know how concerned are you
about the memory usage for them?
2. We do what @jiajunwang suggests; we only pass in the ones that need to be
GC'ed.
Pro: Less memory usage.
Con: More time spent in the main thread. It's actually not a fast thing to
do because `getExpiredJobs()` contains Zk read operations. This could add delay
to the main pipeline.
3. @dasahcc proposed another approach: the event is like a trigger, and the
garbage collection stage figures out what to GC by itself. The stage performs 2
Zk reads to get WorkflowConfigs and WorkflowContexts. Imo 2 Zk reads are not
many comparing to the existing Zk reads this stage does (per job level).
Feel free to let me know what you guys think.
@jiajunwang About the test comment: I agree with what you said 100% - there
should have been a test. The race condition was introduced due to cache
refresh, but whatever approach we take, this stage will now be completely
unrelated to the cache. Should we still write a test that involves cache
refresh()? I don't think it would make sense NOW because this stage is no
longer related to it; it would make sense before.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]