[jira] [Comment Edited] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706284#comment-17706284 ] János Makai edited comment on OOZIE-3715 at 3/29/23 9:08 AM: - Hello [~chenhd], Thank you for your patience in waiting for my feedback. Based on your initial statement and the test code, it is my understanding that the test case ({*}org.apache.oozie.command.wf.TestSignalXCommand#testForkSubmitFail{*}) should fail without any changes made to {*}SignalXCommand{*}. However, despite having commented out the new line in *SignalXCommand* on my local computer, the test case still managed to succeed. Please inform me if my comprehension is accurate. Considering the perspective of clean code, my observations are: * Please do not use star import ({*}{color:#0747a6}import org.apache.oozie.service.*;{color}{*}) * Fix the 1 line that is longer than 132 characters ({*}org.apache.oozie.command.wf.TestSignalXCommand#504{*}) * Kindly furnish a comprehensive Javadoc description for the new test case. * The new test case contains unnecessary code duplications. Specifically, in {*}org.apache.oozie.command.wf.TestSignalXCommand#_testForkSubmitFail{*}, the two implemented scenarios differ only in the value assigned to {*}org.apache.oozie.command.wf.SignalXCommand#FORK_PARALLEL_JOBSUBMISSION{*}. To prevent duplication, this test case could be parameterized based on this difference. * {+}Question{+}: Why the extra *{color:#0747a6}Thread.sleep(500);{color}* is needed in {*}org.apache.oozie.command.wf.TestSignalXCommand#493{*}, after the while loop? * Please ensure consistent spacing throughout the code. For instance, in the following line: {code:java} while (!WorkflowJob.Status.FAILED.equals( engine.getJob(jobId).getStatus())){ {code} there is an extraneous space before "engine," and no space before the closing bracket. Please apply this same consideration to every instance in the code. was (Author: jmakai): Hello [~chenhd], Thank you for your patience in waiting for my feedback. Based on your initial statement and the test code, it is my understanding that the test case ({*}org.apache.oozie.command.wf.TestSignalXCommand#testForkSubmitFail{*}) should fail without any changes made to {*}SignalXCommand{*}. However, despite having commented out the new line in *SignalXCommand* on my local computer, the test case still managed to succeed. Please inform me if my comprehension is accurate. Considering the perspective of clean code, my observations are: * Please do not use star import ({*}{color:#0747a6}import org.apache.oozie.service.*;{color}{*}) * Fix the 1 line that is longer than 132 characters ({*}org.apache.oozie.command.wf.TestSignalXCommand#504{*}) * Kindly furnish a comprehensive Javadoc description for the new test case. * The new test case contains unnecessary code duplications. Specifically, in {*}org.apache.oozie.command.wf.TestSignalXCommand#_testForkSubmitFail{*}, the two implemented scenarios differ only in the value assigned to {*}org.apache.oozie.command.wf.SignalXCommand#FORK_PARALLEL_JOBSUBMISSION{*}. To prevent duplication, this test case could be parameterized based on this difference. * {+}Question{+}: Why the extra *{color:#0747a6}Thread.sleep(500);{color}* is needed in {*}org.apache.oozie.command.wf.TestSignalXCommand#493{*}, after the while loop? * Please ensure consistent spacing throughout the code. For instance, in the following line: {code:java} while (!WorkflowJob.Status.FAILED.equals( engine.getJob(jobId).getStatus())){ {code} there is an extraneous space before "engine," and no space before the closing bracket. Please apply this same consideration to every instance in the code. * {+}Question{+}: Why the *{color:#0747a6}assertNotNull(failAction);{color}* and *{color:#0747a6}assertNotNull(failAction);{color}* needed at the end of the test? > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch > > > When I fork 2 transitions( A and B) to submit , when A transition failed , B > transition still Running , because can't execute KillXCommand. > SignalXCommand.startForkedActions, when one transition submit fail will > create a new ActionStartXCommand and invoke failJob, failJob will add > WorkflowNotificationXComma
[jira] [Comment Edited] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704132#comment-17704132 ] János Makai edited comment on OOZIE-3715 at 3/23/23 12:47 PM: -- [~chenhd] looks like the format of the patch is OK now, but please fix the following two issues before moving forward: * repair the line too long issue which is indicated by the raw patch analysis * modify/create unit test case(s) to validate the purpose of this change Thank you was (Author: jmakai): [~chenhd] looks like the format of the patch is OK now, but please fix the following two issues before moving forward: * repair the line too long issue which is indicated by the raw patch analysis * modify/create test case(s) to validate the purpose of this change Thank you > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch > > > When I fork 2 transitions( A and B) to submit , when A transition failed , B > transition still Running , because can't execute KillXCommand. > SignalXCommand.startForkedActions, when one transition submit fail will > create a new ActionStartXCommand and invoke failJob, failJob will add > WorkflowNotificationXCommand and KillXCommand to > {color:#ff}*commandQueue*{color} , and callback at XCommand.call method , > but we add WorkflowNotificationXCommand and KillXCommand to > ActionStartXCommand‘s {color:#ff}*commandQueue*{color} , but not > SignalXCommand , so can't execute KillXCommand. > The code is as follows : > > {code:java} > public void startForkedActions(List > workflowActionBeanListForForked) throws CommandException { > .. > for (Future result : futures) { > .. > if (context.getJobStatus() != null && > context.getJobStatus().equals(Job.Status.FAILED)) { > new ActionStartXCommand(context.getAction().getId(), > null).failJob(context); > .. > } >.. > } > {code} > > {code:java} > public void failJob(ActionExecutor.Context context, WorkflowActionBean > action) throws CommandException { > WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow(); > if (!handleUserRetry(context, action)) { > incrActionErrorCounter(action.getType(), "failed", 1); > LOG.warn("Failing Job due to failed action [{0}]", > action.getName()); > try { > workflow.getWorkflowInstance().fail(action.getName()); > WorkflowInstance wfInstance = workflow.getWorkflowInstance(); > ((LiteWorkflowInstance) > wfInstance).setStatus(WorkflowInstance.Status.FAILED); > workflow.setWorkflowInstance(wfInstance); > workflow.setStatus(WorkflowJob.Status.FAILED); > action.setStatus(WorkflowAction.Status.FAILED); > action.resetPending(); > queue(new WorkflowNotificationXCommand(workflow, action)); > queue(new KillXCommand(workflow.getId())); > InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, > getInstrumentation()); > } > catch (WorkflowException ex) { > throw new CommandException(ex); > } > } > } > {code} > > {code:java} > public final T call() throws CommandException { > if (commandQueue != null) { > for (Map.Entry>> entry : > commandQueue.entrySet()) { > LOG.debug("Queuing [{0}] commands with delay [{1}]ms", > entry.getValue().size(), entry.getKey()); > if (!callableQueueService.queueSerial(entry.getValue(), > entry.getKey())) { > LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, > queue full", entry.getValue() > .size(), entry.getKey()); > } > } > } > } > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702971#comment-17702971 ] chenhaodan edited comment on OOZIE-3715 at 3/21/23 4:20 AM: [~dionusos] Yes. That's strange , I've created the change on top of master branch. And I don't know why patch error? The error message said that "No such file or directory" and "error: while searching for:". was (Author: chenhd): [~dionusos] Yes. That's strange , I've created your change on top of master branch. And I don't know why patch error? The error message said that "No such file or directory" and "error: while searching for:". > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Attachments: OOZIE-3715-1-1.patch, OOZIE-3715-1.patch > > > When I fork 2 transitions( A and B) to submit , when A transition failed , B > transition still Running , because can't execute KillXCommand. > SignalXCommand.startForkedActions, when one transition submit fail will > create a new ActionStartXCommand and invoke failJob, failJob will add > WorkflowNotificationXCommand and KillXCommand to > {color:#ff}*commandQueue*{color} , and callback at XCommand.call method , > but we add WorkflowNotificationXCommand and KillXCommand to > ActionStartXCommand‘s {color:#ff}*commandQueue*{color} , but not > SignalXCommand , so can't execute KillXCommand. > The code is as follows : > > {code:java} > public void startForkedActions(List > workflowActionBeanListForForked) throws CommandException { > .. > for (Future result : futures) { > .. > if (context.getJobStatus() != null && > context.getJobStatus().equals(Job.Status.FAILED)) { > new ActionStartXCommand(context.getAction().getId(), > null).failJob(context); > .. > } >.. > } > {code} > > {code:java} > public void failJob(ActionExecutor.Context context, WorkflowActionBean > action) throws CommandException { > WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow(); > if (!handleUserRetry(context, action)) { > incrActionErrorCounter(action.getType(), "failed", 1); > LOG.warn("Failing Job due to failed action [{0}]", > action.getName()); > try { > workflow.getWorkflowInstance().fail(action.getName()); > WorkflowInstance wfInstance = workflow.getWorkflowInstance(); > ((LiteWorkflowInstance) > wfInstance).setStatus(WorkflowInstance.Status.FAILED); > workflow.setWorkflowInstance(wfInstance); > workflow.setStatus(WorkflowJob.Status.FAILED); > action.setStatus(WorkflowAction.Status.FAILED); > action.resetPending(); > queue(new WorkflowNotificationXCommand(workflow, action)); > queue(new KillXCommand(workflow.getId())); > InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, > getInstrumentation()); > } > catch (WorkflowException ex) { > throw new CommandException(ex); > } > } > } > {code} > > {code:java} > public final T call() throws CommandException { > if (commandQueue != null) { > for (Map.Entry>> entry : > commandQueue.entrySet()) { > LOG.debug("Queuing [{0}] commands with delay [{1}]ms", > entry.getValue().size(), entry.getKey()); > if (!callableQueueService.queueSerial(entry.getValue(), > entry.getKey())) { > LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, > queue full", entry.getValue() > .size(), entry.getKey()); > } > } > } > } > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)