[ https://issues.apache.org/jira/browse/UIMA-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Eckart de Castilho resolved UIMA-4434.
----------------------------------------------
    Resolution: Abandoned

DUCC has been retired.

> DUCC Orchestrator (OR) job:node blacklisting
> --------------------------------------------
>
>                 Key: UIMA-4434
>                 URL: https://issues.apache.org/jira/browse/UIMA-4434
>             Project: UIMA
>          Issue Type: Improvement
>          Components: DUCC
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>            Priority: Major
>             Fix For: future-DUCC
>
>
> A submitted Job may have shares allocated on some nodes where the JP works 
> and some nodes where the JP fails.
> With respect to initialization, the OR should enforce a limit on the number
> of initialization failures on a node before that node is blacklisted for the
> Job. The OR should communicate the blacklisted nodes for each Job to the RM,
> which should then not allocate any shares on those nodes for those Jobs.
> An example failure situation is as follows:
> 1. Node X does not have Filesystem F mounted
> 2. Job 1 is submitted and is allocated to Node X
> 3. Job 1's JP on Node X fails initialization (missing files!)
> 4. RM allocates the next JP for Job 1 to the same Node X, repeatedly, until
> the max init failures limit is reached
> 5. Job 1 is prevented from expanding because of a single "bad" Node
> If Node X had been blacklisted, then the RM would have allocated Node Y to 
> Job 1 and expansion could have occurred.
> Other types of JP failure (process crash and work item failure/timeout)
> will not be considered for blacklisting at present.
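
The proposal quoted above can be sketched roughly as follows. This is only an
illustration of the idea, not DUCC code: the issue was abandoned, and every
name here (InitFailureTracker, record_init_failure, blacklist, pick_node, the
max_init_failures_per_node limit) is hypothetical. It assumes the OR counts
initialization failures per (Job, node) pair and that the RM skips blacklisted
nodes when choosing where to place the next JP for that Job.

```python
from collections import defaultdict


class InitFailureTracker:
    """Hypothetical OR-side bookkeeping: JP init failures per (job, node)."""

    def __init__(self, max_init_failures_per_node=1):
        # Assumed per-node limit; the issue does not specify a value.
        self.max_init_failures_per_node = max_init_failures_per_node
        self._failures = defaultdict(int)  # (job_id, node) -> failure count

    def record_init_failure(self, job_id, node):
        """Count one JP initialization failure for this job on this node."""
        self._failures[(job_id, node)] += 1

    def blacklist(self, job_id):
        """Nodes that reached the limit for this job (what the OR would send to the RM)."""
        return {
            node
            for (jid, node), count in self._failures.items()
            if jid == job_id and count >= self.max_init_failures_per_node
        }


def pick_node(job_id, candidate_nodes, tracker):
    """Hypothetical RM-side choice: first candidate not blacklisted for the job."""
    banned = tracker.blacklist(job_id)
    for node in candidate_nodes:
        if node not in banned:
            return node
    return None  # every candidate is blacklisted for this job


# Usage: Node X lacks the filesystem, so Job 1's JP fails to initialize there.
tracker = InitFailureTracker(max_init_failures_per_node=2)
tracker.record_init_failure("job-1", "X")
tracker.record_init_failure("job-1", "X")
next_node = pick_node("job-1", ["X", "Y"], tracker)
```

Because the blacklist is keyed by Job, a node that is bad for one Job (for
example, because of a missing mount) stays available to other Jobs that do not
depend on the missing resource.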



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
