[ https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670955#comment-16670955 ]
Hudson commented on HELIX-778:
------------------------------

FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/])
[HELIX-778] TASK: Fix a race condition in (hulee: rev ceba1a55ae351090144c001324f908f2364212a4)
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java

> TASK: Fix a race condition in updatePreviousAssignedTasksStatus
> ---------------------------------------------------------------
>
>                 Key: HELIX-778
>                 URL: https://issues.apache.org/jira/browse/HELIX-778
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Assignee: Hunter L
>            Priority: Major
>
> It was observed that TestUnregisteredCommand is very unstable. The cause was
> identified as a race condition: when a task fails, a pending message for that
> task (from INIT to RUNNING) is sometimes not cleaned up in time, so
> AbstractTaskDispatcher's updatePreviousAssignedTasksStatus tries to process
> that message and skips the status update for that task (such as updating its
> status and the NUM_ATTEMPTS field in JobContext).
> A short-term fix is to call markPartitionError() before checking for the
> pending message, but in the long run we would need to revisit the design of
> the task status update to avoid this type of race condition.
> Changelist:
> 1. Move markPartitionError() up before checking for a pending message on the
> task
> 2. Fix TestUnregisteredCommand's instability

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
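
The reordering described in the issue can be illustrated with a small, self-contained Java sketch. This is not the Helix code: the class TaskStatusSketch, its Context type, the State enum and both update methods are hypothetical names made up for illustration, and markPartitionError() here is only a stand-in that shares its name with the method mentioned in the issue. The sketch assumes that a stale pending message causes an early return from the status update, which is how the issue describes the skipped update.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the ordering problem; not taken from Helix.
class TaskStatusSketch {
    enum State { INIT, RUNNING, ERROR }

    // Hypothetical stand-in for a job context: state and attempt count per partition.
    static final class Context {
        final Map<Integer, State> states = new HashMap<>();
        final Map<Integer, Integer> numAttempts = new HashMap<>();
    }

    // Stand-in for markPartitionError(): records the error and counts the attempt.
    static void markPartitionError(Context ctx, int partition) {
        ctx.states.put(partition, State.ERROR);
        ctx.numAttempts.merge(partition, 1, Integer::sum);
    }

    // Old ordering (assumed): a stale pending message triggers an early return,
    // so a failed task is never recorded as failed.
    static void updateCheckingPendingFirst(Context ctx, Set<Integer> pending,
                                           int partition, boolean failed) {
        if (pending.contains(partition)) {
            return; // status update skipped
        }
        if (failed) {
            markPartitionError(ctx, partition);
        }
    }

    // Reordered: the failure is recorded before the pending-message check.
    static void updateMarkingErrorFirst(Context ctx, Set<Integer> pending,
                                        int partition, boolean failed) {
        if (failed) {
            markPartitionError(ctx, partition);
        }
        if (pending.contains(partition)) {
            return; // remaining updates are still skipped, but the error is kept
        }
    }

    public static void main(String[] args) {
        Set<Integer> pending = new HashSet<>();
        pending.add(0); // stale INIT -> RUNNING message for partition 0

        Context before = new Context();
        updateCheckingPendingFirst(before, pending, 0, true);
        System.out.println("pending check first: state=" + before.states.get(0)
            + ", attempts=" + before.numAttempts.get(0));

        Context after = new Context();
        updateMarkingErrorFirst(after, pending, 0, true);
        System.out.println("error marked first:  state=" + after.states.get(0)
            + ", attempts=" + after.numAttempts.get(0));
    }
}
```

In this sketch the reordered method still returns early when a message is pending, so it only guarantees that the failure and the attempt count are recorded. That matches the issue's description of the change as a temporary fix rather than a redesign.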