[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641186#comment-17641186 ]

Janos Makai commented on OOZIE-3670:

I sincerely appreciate your help [~dionusos]

> Actions can stuck while running in a Fork-Join workflow
> -------------------------------------------------------
>
>                 Key: OOZIE-3670
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3670
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: Janos Makai
>            Assignee: Janos Makai
>            Priority: Major
>             Fix For: 5.3.0
>
>         Attachments: OOZIE-3670-001.patch, forkjoin.xml, helloworld.sh, job.properties
>
>
> A fork node splits one path of execution into multiple concurrent paths of
> execution, and the join node waits until every concurrent execution path of
> the preceding fork node arrives at it. When one of the paths [action] fails
> for some unusual reason - in our case (see attachments) with an EL error -
> the workflow job itself fails as well. However, the other actions running in
> parallel under the same workflow job get stuck in RUNNING state until they
> are purged, which can slow Oozie down in extreme cases.
> This behaviour can be reproduced using the attached
> [forkjoin.xml|https://jira.cloudera.com/secure/attachment/531918/531918_forkjoin.xml],
> [job.properties|https://jira.cloudera.com/secure/attachment/531916/531916_job.properties],
> and
> [helloworld.sh|https://jira.cloudera.com/secure/attachment/531917/531917_helloworld.sh].
> In the above workflow, [action2] fails with an ELError because
> {code:java}
> ${variableThatWillCauseELError} {code}
> cannot be evaluated. At the same time, [action1] tries to complete but
> remains in RUNNING state.
> We have examined the situation at surface level, but we need a deeper
> understanding of the mechanism of fork-join workflows to proceed further.
> Suspected classes, as a starting point:
> - org.apache.oozie.workflow.lite.LiteWorkflowInstance
> - org.apache.oozie.command.wf.ActionCheckXCommand
> - What if we do not throw an exception in
> org.apache.oozie.command.wf.ActionCheckXCommand#verifyPrecondition?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
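
The stuck state described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Oozie code and not the committed patch: `ToyWorkflow`, `Status`, `run_scenario` and the `kill_siblings_on_failure` flag are hypothetical names invented for illustration. The model assumes that a completion report from an action is rejected once the parent job is no longer RUNNING (loosely mirroring the precondition check the reporter suspects), and shows one possible remedy, which is to move the still-running sibling actions to a terminal state when the job fails.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "RUNNING"
    OK = "OK"
    FAILED = "FAILED"
    KILLED = "KILLED"


class ToyWorkflow:
    """Hypothetical stand-in for a workflow job with one fork and one join."""

    def __init__(self, action_names, kill_siblings_on_failure):
        self.status = Status.RUNNING
        self.actions = {name: Status.RUNNING for name in action_names}
        # Assumption: a switch for one possible remedy, not an Oozie setting.
        self.kill_siblings_on_failure = kill_siblings_on_failure

    def fail_action(self, name):
        """One forked path fails, e.g. an EL expression cannot be evaluated."""
        self.actions[name] = Status.FAILED
        self.status = Status.FAILED
        if self.kill_siblings_on_failure:
            for other in list(self.actions):
                if self.actions[other] is Status.RUNNING:
                    self.actions[other] = Status.KILLED

    def complete_action(self, name):
        """A forked path reports completion.

        The update is rejected when the parent job is no longer RUNNING,
        so the action keeps whatever state it had before.
        """
        if self.status is not Status.RUNNING:
            return False
        self.actions[name] = Status.OK
        return True


def run_scenario(kill_siblings_on_failure):
    """action2 fails first, then action1 tries to complete."""
    wf = ToyWorkflow(["action1", "action2"], kill_siblings_on_failure)
    wf.fail_action("action2")
    accepted = wf.complete_action("action1")
    return wf.status, wf.actions["action1"], accepted
```

Calling `run_scenario(False)` leaves `action1` in RUNNING although the job is FAILED, which is the symptom reported above. With `run_scenario(True)` the sibling ends in KILLED instead. Whether the real fix works this way is not stated in this thread.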
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641183#comment-17641183 ]

ASF subversion and git services commented on OOZIE-3670:

Commit 13837d9b16c793eea2d9ce40052d956417102450 in oozie's branch refs/heads/master from Denes Bodo
[ https://gitbox.apache.org/repos/asf?p=oozie.git;h=13837d9b1 ]

OOZIE-3670 Actions can stuck while running in a Fork-Join workflow (jmakai via dionusos)
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641170#comment-17641170 ]

Janos Makai commented on OOZIE-3670:

Hello [~dionusos], could you have a look at the latest patch set? The SpotBugs discovered errors are unrelated to this change.
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640866#comment-17640866 ]

Hadoop QA commented on OOZIE-3670:
----------------------------------

Testing JIRA OOZIE-3670

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [6] new bugs found below threshold in total that must be fixed.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:red}-1{color} There are [6] new bugs found below threshold in [core] that must be fixed, listing only the first [5] ones.
.You can find the SpotBugs diff here (look for the red and orange ones): core/findbugs-new.html
.The top [5] most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At BulkJPAExecutor.java:[line 111]: At BulkJPAExecutor.java:[line 127]
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3246
.{color:orange}Tests failed at first run:{color}
TestCoordActionsKillXCommand#testActionKillCommandActionNumbers
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:green}+1 MODERNIZER{color}

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/148/
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640785#comment-17640785 ]

Hadoop QA commented on OOZIE-3670:
----------------------------------

PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640660#comment-17640660 ]

Hadoop QA commented on OOZIE-3670:
----------------------------------

Testing JIRA OOZIE-3670

Cleaning local git workspace

{color:red}-1{color} Patch failed to apply to head of branch
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640649#comment-17640649 ]

Hadoop QA commented on OOZIE-3670:
----------------------------------

PreCommit-OOZIE-Build started
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640646#comment-17640646 ]

Janos Makai commented on OOZIE-3670:

Done
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640632#comment-17640632 ]

Dénes Bodó commented on OOZIE-3670:
-----------------------------------

[~jmakai] sure, I will. But before that can you please attach the files you mentioned in the description that they are attached?

Thank you
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640452#comment-17640452 ]

Janos Makai commented on OOZIE-3670:

Hello [~dionusos], could you please check the latest patch set? The SpotBugs bugs are unrelated. Thanks in advance.
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633284#comment-17633284 ]

Hadoop QA commented on OOZIE-3670:
----------------------------------

Testing JIRA OOZIE-3670

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [6] new bugs found below threshold in total that must be fixed.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:orange}0{color} There are [4] new bugs found in [server] that would be nice to have fixed.
.You can find the SpotBugs diff here: server/findbugs-new.html
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:red}-1{color} There are [6] new bugs found below threshold in [core] that must be fixed, listing only the first [5] ones.
.You can find the SpotBugs diff here (look for the red and orange ones): core/findbugs-new.html
.The top [5] most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At BulkJPAExecutor.java:[line 111]: At BulkJPAExecutor.java:[line 127]
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3228
.{color:orange}Tests failed at first run:{color}
TestBlockingInputStream#testFastWritingBlockingInputStream
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:green}+1 MODERNIZER{color}

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/103/
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633260#comment-17633260 ]

Hadoop QA commented on OOZIE-3670:
--

PreCommit-OOZIE-Build started

> Actions can stuck while running in a Fork-Join workflow
> ---
>
> Key: OOZIE-3670
> URL: https://issues.apache.org/jira/browse/OOZIE-3670
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: Janos Makai
> Assignee: Janos Makai
> Priority: Major
> Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch
>
>
> A fork node splits one path of execution into multiple concurrent paths of
> execution, and the join node waits until every concurrent execution path of
> the preceding fork node arrives at it. If one of the paths [action] fails for
> some unusual reason (in our case, see the attachments, with an EL error), the
> workflow job itself fails as well. However, the other actions running in
> parallel under the same workflow job get stuck in RUNNING state until they
> are purged, which can slow Oozie down in extreme cases.
> This behaviour can be reproduced using the attached
> [forkjoin.xml|https://jira.cloudera.com/secure/attachment/531918/531918_forkjoin.xml],
> [job.properties|https://jira.cloudera.com/secure/attachment/531916/531916_job.properties],
> and
> [helloworld.sh|https://jira.cloudera.com/secure/attachment/531917/531917_helloworld.sh].
> In the above workflow, [action2] fails with an ELError because
> {code:java}
> ${variableThatWillCauseELError} {code}
> cannot be evaluated, while at the same time [action1] tries to complete
> but remains in RUNNING state.
> We have examined the situation at surface level, but we need a deeper
> understanding of the mechanism of fork-join workflows to proceed further.
> Suspected classes, as a starting point:
> - org.apache.oozie.workflow.lite.LiteWorkflowInstance
> - org.apache.oozie.command.wf.ActionCheckXCommand
> - what if we do not throw an Exception in
> org.apache.oozie.command.wf.ActionCheckXCommand#verifyPrecondition ?
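
The symptom described in the issue can be pictured with a small toy model. This is only a sketch and not Oozie code: the class ForkJoinSketch, its Status enum and all of its methods are hypothetical names made up for illustration. It also assumes, following the reporter's suspicion about verifyPrecondition, that a completion arriving after the job has left RUNNING is rejected before the action's own status is updated. The actual cause and the actual fix are in the attached patches, not here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of the symptom reported in OOZIE-3670. This is NOT Oozie code:
 * every class, enum and method name below is hypothetical.
 */
public class ForkJoinSketch {

    public enum Status { RUNNING, OK, FAILED }

    private final Map<String, Status> actions = new LinkedHashMap<>();
    private Status workflow = Status.RUNNING;

    /** Fork: every named action starts on its own concurrent path. */
    public void fork(String... names) {
        for (String name : names) {
            actions.put(name, Status.RUNNING);
        }
    }

    /** Called when an action finishes, successfully or not. */
    public void complete(String name, boolean success) {
        if (workflow != Status.RUNNING) {
            // Assumption: a precondition check rejects the update once the
            // job has left RUNNING, so the action's status is never written.
            return;
        }
        actions.put(name, success ? Status.OK : Status.FAILED);
        if (!success) {
            workflow = Status.FAILED;   // one failed path fails the whole job
        } else if (joinReached()) {
            workflow = Status.OK;       // join: every forked path has arrived
        }
    }

    /** Join condition: all forked actions ended with OK. */
    private boolean joinReached() {
        return actions.values().stream().allMatch(s -> s == Status.OK);
    }

    public Status workflowStatus() { return workflow; }

    public Status statusOf(String name) { return actions.get(name); }

    public static void main(String[] args) {
        ForkJoinSketch wf = new ForkJoinSketch();
        wf.fork("action1", "action2");
        wf.complete("action2", false); // e.g. the EL error
        wf.complete("action1", true);  // arrives after the job has failed
        System.out.println("workflow=" + wf.workflowStatus()
                + " action1=" + wf.statusOf("action1")
                + " action2=" + wf.statusOf("action2"));
    }
}
```

In this model the failed path fails the job, and the sibling's later completion is dropped, so action1 is left in RUNNING even though the job is already FAILED. That matches the state the report describes for [action1], under the assumption stated above.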
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633258#comment-17633258 ]

Janos Makai commented on OOZIE-3670:

Fixed the javadoc related issues in my latest patch.

> Actions can stuck while running in a Fork-Join workflow
> ---
>
> Key: OOZIE-3670
> URL: https://issues.apache.org/jira/browse/OOZIE-3670
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: Janos Makai
> Assignee: Janos Makai
> Priority: Major
> Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632499#comment-17632499 ]

Hadoop QA commented on OOZIE-3670:
--

Testing JIRA OOZIE-3670

Cleaning local git workspace

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 3 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:red}-1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:red}-1{color} the patch seems to introduce 3 new Javadoc warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [7] new bugs found below threshold in total that must be fixed.
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in [fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:red}-1{color} There are [7] new bugs found below threshold in [core] that must be fixed, listing only the first [5] ones.
.You can find the SpotBugs diff here (look for the red and orange ones): core/findbugs-new.html
.The top [5] most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At BulkJPAExecutor.java:[line 111]: At BulkJPAExecutor.java:[line 127]
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [examples].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3226
.{color:orange}Tests failed at first run:{color}
TestCoordActionInputCheckXCommand#testNone
TestBlockingInputStream#testLimitedWritingBlockingInputStream
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch
{color:green}+1 MODERNIZER{color}

{color:red}*-1 Overall result, please check the reported -1(s)*{color}

The full output of the test-patch run is available at
. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/100/

> Actions can stuck while running in a Fork-Join workflow
> ---
>
> Key: OOZIE-3670
> URL: https://issues.apache.org/jira/browse/OOZIE-3670
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: Janos Makai
> Assignee: Janos Makai
> Priority: Major
> Attachments: OOZIE-3670-001.patch
>
>
> Fork node splits one path of exe
[jira] [Commented] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow
[ https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632446#comment-17632446 ]

Hadoop QA commented on OOZIE-3670:
--

PreCommit-OOZIE-Build started

> Actions can stuck while running in a Fork-Join workflow
> ---
>
> Key: OOZIE-3670
> URL: https://issues.apache.org/jira/browse/OOZIE-3670
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: Janos Makai
> Assignee: Janos Makai
> Priority: Major
> Attachments: OOZIE-3670-001.patch