alirezazamani commented on a change in pull request #741: Fix ConcurrentModification exception in Workflow Garbage Collection
URL: https://github.com/apache/helix/pull/741#discussion_r377433026
##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskUtil.java
##########
@@ -1043,23 +1043,33 @@ public static void purgeExpiredJobs(String workflow, WorkflowConfig workflowConf
    * @param dataProvider
    * @param manager
    */
-  public static void workflowGarbageCollection(WorkflowControllerDataProvider dataProvider,
+  public static void workflowGarbageCollection(final WorkflowControllerDataProvider dataProvider,
       final HelixManager manager) {
     // Garbage collections for conditions where workflow context exists but config is missing.
-    Map<String, ZNRecord> contexts = dataProvider.getContexts();
-    HelixDataAccessor accessor = manager.getHelixDataAccessor();
-    HelixPropertyStore<ZNRecord> propertyStore = manager.getHelixPropertyStore();
 
+    // toBeDeletedWorkflows is a set that contains the name of the workflows that their contexts
+    // should be deleted.
     Set<String> toBeDeletedWorkflows = new HashSet<>();
-    for (Map.Entry<String, ZNRecord> entry : contexts.entrySet()) {
-      if (entry.getValue() != null
-          && entry.getValue().getId().equals(TaskUtil.WORKFLOW_CONTEXT_KW)) {
-        if (dataProvider.getWorkflowConfig(entry.getKey()) == null) {
-          toBeDeletedWorkflows.add(entry.getKey());
+    try {
+      Set<String> existingWorkflowContexts = new HashSet<>(dataProvider.getContexts().keySet());
+      for (String entry : existingWorkflowContexts) {
+        if (entry != null) {
+          WorkflowConfig cfg = dataProvider.getWorkflowConfig(entry);
+          WorkflowContext ctx = dataProvider.getWorkflowContext(entry);
+          if (ctx != null && ctx.getId().equals(TaskUtil.WORKFLOW_CONTEXT_KW) && cfg == null) {
+            toBeDeletedWorkflows.add(entry);
+          }
         }
       }
+    } catch (Exception e) {
+      LOG.warn(
+          "Exception occurred while creating a list of all existing contexts with missing config!",
+          e);
     }

Review comment:
Yes. I tried several scenarios, and for each one I used Jiajun's script, which runs the test 50 times. The most effective solution is the one I proposed in this PR.
Please have a look at this line:
- Set<String> existingWorkflowContexts = new HashSet<>(dataProvider.getContexts().keySet());

In a small number of cases (about 2 out of 50 runs), the line above is the only one that can throw a ConcurrentModificationException, which I now catch with the try-catch block. The reason is that while we are copying the names of all existing contexts, the context map in the cache can be modified by other threads, so the copy fails with a ConcurrentModificationException. Please note that this part of the code runs asynchronously.

@narendly I don't have a strong preference about this method, and I would be happy if you can propose a different way to get all of the contexts without hitting the ConcurrentModificationException.
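For discussion, here is a minimal sketch of two possible ways to take the key snapshot. It is not taken from the Helix code base: the class `ContextSnapshotSketch` and both method names are hypothetical, and the second variant assumes the cached context map could be a `ConcurrentHashMap`, which nothing in this thread confirms. The first variant keeps a plain map and retries the copy a bounded number of times; the second relies on the weakly consistent iterators of `ConcurrentHashMap`, which do not throw `ConcurrentModificationException`.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not part of Helix.
class ContextSnapshotSketch {

  /**
   * Copies the key set of a plain map, retrying a bounded number of times
   * if another thread structurally modifies the map during the copy.
   */
  public static Set<String> snapshotKeysWithRetry(Map<String, ?> contexts, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return new HashSet<>(contexts.keySet());
      } catch (ConcurrentModificationException e) {
        // The map changed while it was being copied; try again.
      }
    }
    // Give up for this round; a later garbage collection pass can retry.
    return new HashSet<>();
  }

  /**
   * Copies the key set of a ConcurrentHashMap. Its iterators are weakly
   * consistent, so this copy does not throw ConcurrentModificationException
   * even if other threads add or remove entries at the same time.
   */
  public static Set<String> snapshotKeys(ConcurrentHashMap<String, ?> contexts) {
    return new HashSet<>(contexts.keySet());
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, String> contexts = new ConcurrentHashMap<>();
    contexts.put("workflowA", "context");
    contexts.put("workflowB", "context");

    Set<String> snapshot = snapshotKeys(contexts);
    contexts.remove("workflowA"); // later changes do not affect the snapshot
    System.out.println(snapshot.size()); // prints 2
  }
}
```

Note that the retry variant is still best effort, because a plain `HashMap` detects concurrent modification only on a fail-fast, non-guaranteed basis. The `ConcurrentHashMap` variant avoids the exception entirely, but the snapshot may or may not include entries added or removed while the copy is in progress.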