[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-09-19 Thread Abhishek Bafna (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Bafna updated OOZIE-1978:
--
Priority: Blocker  (was: Major)

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Priority: Blocker
> Fix For: 4.3.0
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978-005.patch, OOZIE-1978-006.patch, OOZIE-1978_wip.001.patch, 
> workflow.xml
>
>
> We've had a few users who have run into problems where submitting a workflow 
> appears to hang (in the case of a subworkflow, it's similar but stuck in 
> PREP).  It turns out that if you wait long enough, it will actually go 
> through and the workflow will run normally.  The problem is that the forkjoin 
> validation code is taking a really long time.
> The attached example has a series of 20 forks where each fork has 6 actions 
> (it's based on an actual workflow, but all of the names were changed and the 
> actions were all replaced by simple shell actions).  One of our support guys 
> said it took 1-2 hours, but on my computer it was taking {color:red}*15+ 
> hours*{color} (I had to cancel it).
> While this example doesn't have any nested forks, those can take a long 
> time too.
> It's easy to verify that it's the forkjoin validation code that's taking so 
> long by looking at a jstack of the Oozie server and seeing deep recursive 
> calls to 
> {{org.apache.oozie.workflow.lite.LiteWorkflowAppParser.validateForkJoin}}.  I 
> also noticed a lot of time being spent in calls to {{LinkedList.contains}}.  
> I think we have 3 options:
> # See if we can make the existing code faster somehow.  Perhaps there's a way 
> to parallelize it?  Maybe there's some redundant checking that we can 
> identify and skip?  Change some data structures?
> # See if we can write a new way to do this validation.  I had originally 
> completely rewritten this code a while ago, and we've since made a few fixes 
> to catch edge cases.  Perhaps it needs another rewrite?
> # Try to identify when it's taking a long time and at least let the user know 
> what's happening.  Right now, it just appears that the Oozie CLI 
> has hung and the job doesn't show up in the Oozie server.  Most users aren't 
> going to wait more than a minute or two.
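
To make the "redundant checking" and "change some data structures" ideas in option 1 concrete, here is a minimal sketch on a synthetic graph shaped like the attached example (fork/join blocks chained in series). It is not Oozie's validation code and not any of the attached patches; the class name, method names, and node names are all hypothetical. It compares the number of node visits when every path is walked separately with the number when each node is visited once and remembered in a HashSet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ForkJoinTraversalSketch {

    // Builds start -> (fork_i -> action_i_j -> join_i), repeated in series, -> end.
    static Map<String, List<String>> buildChain(int forks, int actionsPerFork) {
        Map<String, List<String>> graph = new HashMap<>();
        String previous = "start";
        for (int i = 0; i < forks; i++) {
            String fork = "fork" + i;
            String join = "join" + i;
            graph.computeIfAbsent(previous, k -> new ArrayList<>()).add(fork);
            for (int j = 0; j < actionsPerFork; j++) {
                String action = "action" + i + "_" + j;
                graph.computeIfAbsent(fork, k -> new ArrayList<>()).add(action);
                graph.computeIfAbsent(action, k -> new ArrayList<>()).add(join);
            }
            previous = join;
        }
        graph.computeIfAbsent(previous, k -> new ArrayList<>()).add("end");
        graph.put("end", new ArrayList<>());
        return graph;
    }

    // Walks every start-to-end path separately and counts node visits.
    // Nodes after a join are revisited once per path that reaches them.
    static long visitsPerPath(Map<String, List<String>> graph, String node) {
        long visits = 1;
        for (String next : graph.get(node)) {
            visits += visitsPerPath(graph, next);
        }
        return visits;
    }

    // Visits each node once; the HashSet gives expected O(1) membership checks,
    // where LinkedList.contains would scan the whole list on every call.
    static long visitsWithVisitedSet(Map<String, List<String>> graph, String start) {
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        long visits = 0;
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) {
                continue; // already handled via another branch
            }
            visits++;
            for (String next : graph.get(node)) {
                stack.push(next);
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        // Kept small for the per-path walk, which grows exponentially with the fork count.
        Map<String, List<String>> small = buildChain(3, 6);
        System.out.println("per-path visits, 3 forks: " + visitsPerPath(small, "start"));
        System.out.println("visited-set visits, 3 forks: " + visitsWithVisitedSet(small, "start"));

        // The shape from the description: 20 forks in series, 6 actions each.
        Map<String, List<String>> large = buildChain(20, 6);
        System.out.println("visited-set visits, 20 forks: " + visitsWithVisitedSet(large, "start"));
    }
}
```

A real validator also has to track which fork each node belongs to, so a plain visited set is not a drop-in fix. The sketch only shows why revisiting shared nodes and scanning lists for membership are the first things worth measuring.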



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-08-31 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-006.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: 4.3.0
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978-005.patch, OOZIE-1978-006.patch, OOZIE-1978_wip.001.patch, 
> workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-08-03 Thread abhishek bafna (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

abhishek bafna updated OOZIE-1978:
--
Fix Version/s: 4.3.0  (was: trunk)

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: 4.3.0
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978-005.patch, OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-23 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-002.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978-005.patch, OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-23 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-005.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978-005.patch, OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-23 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: (was: OOZIE-1978-002.patch)

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978-005.patch, OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-23 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-002.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-002.patch, OOZIE-1978-003.patch, OOZIE-1978-004.patch, 
> OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-13 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-004.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-003.patch, OOZIE-1978-004.patch, OOZIE-1978_wip.001.patch, 
> workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-13 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-003.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978-003.patch, OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-13 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-002.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978-002.patch, 
> OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-13 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated OOZIE-1978:

Attachment: OOZIE-1978-001.patch

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Peter Bacsko
> Fix For: trunk
>
> Attachments: OOZIE-1978-001.patch, OOZIE-1978_wip.001.patch, 
> workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2016-07-11 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated OOZIE-1978:
-
Attachment: OOZIE-1978_wip.001.patch

That's a fancy ASCII diagram [~pbacsko].  I knew before that there were a ton of 
paths, but I didn't realize there were that many (I never did the math 
here).

Anyway, I have a work-in-progress patch that I had been meaning to upload for 
quite some time, but never got around to it.  I'll attach it now.  Feel free to 
use it or steal things from it, but if you already have something, that's fine 
too.  It's been so long since I looked at it, and this whole thing gets 
complicated, so I don't remember how it works or what else I was planning to 
do, though there are some comments.
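
For a rough sense of "that many", the count can be worked out for the shape given in the issue description: 20 forks in series with 6 actions per fork. Each fork multiplies the number of distinct start-to-end paths by 6. This is a sketch of the arithmetic only; the class and method names are made up for illustration, and it says nothing about how many of those paths validateForkJoin actually walks.

```java
import java.math.BigInteger;

final class ForkJoinPathCount {

    // Number of distinct start-to-end paths through `forks` fork/join blocks
    // chained in series, where each fork fans out to `actionsPerFork` actions.
    static BigInteger countPaths(int forks, int actionsPerFork) {
        return BigInteger.valueOf(actionsPerFork).pow(forks);
    }

    public static void main(String[] args) {
        // Shape from the issue description: 20 forks in series, 6 actions each.
        System.out.println(countPaths(20, 6)); // 6^20 = 3656158440062976
    }
}
```

That is roughly 3.7 * 10^15 paths for a workflow with only 120 actions, so any approach whose cost grows with the number of paths rather than the number of nodes will struggle on this input.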

> Forkjoin validation code is ridiculously slow in some cases
> ---
>
> Key: OOZIE-1978
> URL: https://issues.apache.org/jira/browse/OOZIE-1978
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.0.1
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Fix For: trunk
>
> Attachments: OOZIE-1978_wip.001.patch, workflow.xml


[jira] [Updated] (OOZIE-1978) Forkjoin validation code is ridiculously slow in some cases

2014-08-22 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated OOZIE-1978:
-

Attachment: workflow.xml

 Forkjoin validation code is ridiculously slow in some cases
 ---

 Key: OOZIE-1978
 URL: https://issues.apache.org/jira/browse/OOZIE-1978
 Project: Oozie
 Issue Type: Bug
 Components: core
 Affects Versions: trunk, 4.0.1
 Reporter: Robert Kanter
 Fix For: trunk

 Attachments: workflow.xml





--
This message was sent by Atlassian JIRA
(v6.2#6252)