[ 
https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670955#comment-16670955
 ] 

Hudson commented on HELIX-778:
------------------------------

FAILURE: Integrated in Jenkins build helix #1561 (See 
[https://builds.apache.org/job/helix/1561/])
[HELIX-778] TASK: Fix a race condition in (hulee: rev 
ceba1a55ae351090144c001324f908f2364212a4)
* (edit) 
helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java
* (edit) 
helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java


> TASK: Fix a race condition in updatePreviousAssignedTasksStatus
> ---------------------------------------------------------------
>
>                 Key: HELIX-778
>                 URL: https://issues.apache.org/jira/browse/HELIX-778
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Assignee: Hunter L
>            Priority: Major
>
> TestUnregisteredCommand was observed to be very unstable. The cause was 
> identified as a race condition: when a task fails, a pending message for 
> that task (from INIT to RUNNING) was sometimes not cleaned up in time, so 
> AbstractTaskDispatcher's updatePreviousAssignedTasksStatus would try to 
> process that message and skip the task's status update (for example, 
> updating its status and the NUM_ATTEMPTS field in JobContext).
> A short-term fix is to call markPartitionError() before checking for the 
> pending message, but in the long run the design of the task status update 
> needs to be revisited to avoid this type of race condition.
> Changelist:
> 1. Move markPartitionError() up before checking for a pending message on the 
> task
> 2. Fix TestUnregisteredCommand's instability
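
The ordering change in item 1 can be illustrated with a toy model. This is only a sketch, not the Helix code: ToyJobContext, updateBuggy, updateFixed, and the boolean hasPendingMessage flag are hypothetical stand-ins, and the assumption that markPartitionError() both sets the error state and bumps the attempt count is made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

public class TaskStatusSketch {
    enum State { INIT, RUNNING, TASK_ERROR }

    /** Toy stand-in for per-job bookkeeping; NOT Helix's JobContext. */
    static class ToyJobContext {
        final Map<Integer, State> states = new HashMap<>();
        final Map<Integer, Integer> numAttempts = new HashMap<>();

        // Assumption for this sketch: marking an error records the state
        // and increments the attempt counter for that partition.
        void markPartitionError(int partition) {
            states.put(partition, State.TASK_ERROR);
            numAttempts.merge(partition, 1, Integer::sum);
        }
    }

    /** Old ordering: a stale pending message causes an early return. */
    static void updateBuggy(ToyJobContext ctx, int partition,
                            State reported, boolean hasPendingMessage) {
        if (hasPendingMessage) {
            return; // the failure is never recorded
        }
        if (reported == State.TASK_ERROR) {
            ctx.markPartitionError(partition);
        }
    }

    /** New ordering: record the failure first, then check the message. */
    static void updateFixed(ToyJobContext ctx, int partition,
                            State reported, boolean hasPendingMessage) {
        if (reported == State.TASK_ERROR) {
            ctx.markPartitionError(partition);
        }
        if (hasPendingMessage) {
            return; // remaining processing is skipped, but the error is kept
        }
    }

    public static void main(String[] args) {
        // Scenario from the description: the task failed, but the
        // INIT -> RUNNING message has not been cleaned up yet.
        ToyJobContext buggy = new ToyJobContext();
        updateBuggy(buggy, 0, State.TASK_ERROR, true);
        System.out.println("buggy attempts: " + buggy.numAttempts.getOrDefault(0, 0));

        ToyJobContext fixed = new ToyJobContext();
        updateFixed(fixed, 0, State.TASK_ERROR, true);
        System.out.println("fixed attempts: " + fixed.numAttempts.getOrDefault(0, 0));
    }
}
```

With the stale message present, the old ordering leaves the attempt count at 0 while the new ordering records 1. As the description notes, this only narrows the window; it does not remove the underlying race.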



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
