[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-29 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: (was: OOZIE-3670-001.patch)

> Actions can stuck while running in a Fork-Join workflow
> ---
>
> Key: OOZIE-3670
> URL: https://issues.apache.org/jira/browse/OOZIE-3670
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: Janos Makai
> Assignee: Janos Makai
> Priority: Major
> Attachments: forkjoin.xml, helloworld.sh, job.properties
>
>
> A fork node splits one path of execution into multiple concurrent paths, and 
> the join node waits until every concurrent path of the preceding fork node 
> arrives at it. If one of the paths [action] fails for an unusual reason (in 
> our case, see the attachments, with an EL error), the workflow job itself 
> fails as well. However, the other actions running in parallel under the same 
> workflow job remain stuck in RUNNING state until they are purged, which can 
> slow Oozie down in extreme cases.
> This behaviour can be reproduced using the attached 
> [forkjoin.xml|https://jira.cloudera.com/secure/attachment/531918/531918_forkjoin.xml],
> [job.properties|https://jira.cloudera.com/secure/attachment/531916/531916_job.properties],
> and 
> [helloworld.sh|https://jira.cloudera.com/secure/attachment/531917/531917_helloworld.sh].
> In the above workflow, [action2] fails with an ELError because
> {code:java}
> ${variableThatWillCauseELError} {code}
> cannot be evaluated. Meanwhile, [action1] tries to complete but remains in 
> RUNNING state.
> We have examined the situation only at a surface level and need a deeper 
> understanding of how fork-join workflows work before proceeding further.
> Suspected classes, as a starting point:
>  - org.apache.oozie.workflow.lite.LiteWorkflowInstance
>  - org.apache.oozie.command.wf.ActionCheckXCommand
>  - What if we do not throw an exception in 
> org.apache.oozie.command.wf.ActionCheckXCommand#verifyPrecondition?
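
For readers without access to the attachments, a fork-join workflow of the kind described above might look roughly like the sketch below. This is an illustrative assumption, not the attached forkjoin.xml: the node names fork1, join1 and fail, the schema versions, the ${jobTracker} and ${nameNode} properties, and the way ${variableThatWillCauseELError} is passed as a shell argument are all hypothetical. Only action1, action2, helloworld.sh and the variable name come from the report.

```xml
<!-- Hypothetical sketch of a fork-join workflow; NOT the attached forkjoin.xml. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="forkjoin-sketch">
    <start to="fork1"/>

    <!-- The fork starts action1 and action2 concurrently. -->
    <fork name="fork1">
        <path start="action1"/>
        <path start="action2"/>
    </fork>

    <!-- Well-formed path: expected to reach the join. -->
    <action name="action1">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>helloworld.sh</exec>
            <file>helloworld.sh</file>
        </shell>
        <ok to="join1"/>
        <error to="fail"/>
    </action>

    <!-- Failing path: the variable below is assumed to be undefined in
         job.properties, so it cannot be evaluated. -->
    <action name="action2">
        <shell xmlns="uri:oozie:shell-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>helloworld.sh</exec>
            <argument>${variableThatWillCauseELError}</argument>
            <file>helloworld.sh</file>
        </shell>
        <ok to="join1"/>
        <error to="fail"/>
    </action>

    <!-- The join waits for every path started by fork1. -->
    <join name="join1" to="end"/>

    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

In a layout like this, the symptom reported above would be that the workflow job fails because of action2 while action1 is still listed as RUNNING.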



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-29 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: OOZIE-3670-001.patch

> Attachments: OOZIE-3670-001.patch, forkjoin.xml, helloworld.sh, 
> job.properties


[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-29 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: (was: OOZIE-3670-002.patch)

> Attachments: forkjoin.xml, helloworld.sh, job.properties


[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-29 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: helloworld.sh

> Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch, 
> forkjoin.xml, helloworld.sh, job.properties


[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-29 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: job.properties

> Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch, 
> forkjoin.xml, helloworld.sh, job.properties


[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-29 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: forkjoin.xml

> Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch, forkjoin.xml


[jira] [Updated] (OOZIE-3670) Actions can stuck while running in a Fork-Join workflow

2022-11-13 Thread Janos Makai (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Makai updated OOZIE-3670:
---
Attachment: OOZIE-3670-002.patch

> Attachments: OOZIE-3670-001.patch, OOZIE-3670-002.patch