workflow action allow user auto retry
-------------------------------------

                 Key: OOZIE-10
                 URL: https://issues.apache.org/jira/browse/OOZIE-10
             Project: Apache Oozie (Incubating)
          Issue Type: New Feature
            Reporter: Angelo K. Huang
            Assignee: Angelo K. Huang


Currently, a workflow action is retried automatically only on transient errors.
Users often want to control retries at the action level, for example by
defining a custom retry count for each action. For a FAILED action, the
possible causes include startData or endData not being set, or an EL exception.
The one case likely worth retrying is when Oozie is unable to get the running
job for a Hadoop ID. For an ERROR action, most errors come from the job
application, such as a failure to parse the action conf, a buffer overflow in
the ssh executor, or a file that does not exist in the fs action executor.

The solution is to define a 0.3 workflow schema with new action-level
attributes for the user-defined retry settings, and to add default Oozie
configuration for the system-level maximum user retry. Example:

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.3" name="test-wf">
    <action name="a" retry-max="2" retry-interval="1">
        <!-- action body omitted -->
    </action>
    <!-- remaining nodes omitted -->
</workflow-app>

oozie-default.xml

   <!-- Workflow Action Automatic Retry -->

    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.max</name>
        <value>3</value>
        <description>
            Maximum automatic retry count for a workflow action. The default is 3.
        </description>
    </property>
   
    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.inteval</name>
        <value>10</value>
        <description>
            Automatic retry interval for a workflow action, in minutes.
            The default is 10 minutes.
        </description>
    </property>
   
    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code</name>
        <value>
            JA017
        </value>
        <description>
            Automatic retry for a workflow action is performed for these
            specified error codes.
        </description>
    </property>
   
    <property>
        <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code.ext</name>
        <value> </value>
        <description>
            Automatic retry for a workflow action is performed for these
            specified extra error codes.
        </description>
    </property>
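
The interaction between the action-level attributes and the system-level
properties can be sketched as follows. This is only an illustration under
stated assumptions, not the Oozie implementation: it assumes that the
action's retry-max is capped by the system-level user.retry.max, that the
error.code and error.code.ext lists are combined, and that multiple codes
would be comma-separated. All function and variable names are hypothetical.

```python
# Sketch of the proposed user-retry decision (assumed semantics, not Oozie code).

SYSTEM_RETRY_MAX = 3  # value of ...user.retry.max in the example above


def parse_codes(raw: str) -> set:
    """Parse a property value into a set of error codes.

    Assumption: codes are comma-separated; surrounding whitespace is ignored.
    """
    return {code.strip() for code in raw.split(",") if code.strip()}


# Values taken from the example oozie-default.xml entries.
RETRY_CODES = parse_codes("\n            JA017\n        ")
RETRY_CODES_EXT = parse_codes(" ")


def effective_retry_max(action_retry_max: int,
                        system_max: int = SYSTEM_RETRY_MAX) -> int:
    """Assumption: the action-level retry-max is capped by the system maximum."""
    return max(0, min(action_retry_max, system_max))


def should_retry(error_code: str, retries_so_far: int,
                 action_retry_max: int) -> bool:
    """Retry only for listed error codes, while retries remain."""
    allowed = RETRY_CODES | RETRY_CODES_EXT
    return (error_code in allowed
            and retries_so_far < effective_retry_max(action_retry_max))


if __name__ == "__main__":
    # Action "a" from the workflow.xml example: retry-max="2".
    print(should_retry("JA017", 0, 2))
    print(should_retry("JA017", 2, 2))
```

Under these assumptions, an action that asks for more retries than the
system-level maximum would simply be limited to that maximum, and an error
code outside both lists would never trigger a user retry.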





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
