[
https://issues.apache.org/jira/browse/OOZIE-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170546#comment-13170546
]
[email protected] commented on OOZIE-591:
-----------------------------------------------------
bq. On 2011-12-12 20:44:14, Alejandro Abdelnur wrote:
bq. > The proposed approach seems overkilling when the following change would
achieve the same without introducing new concurrent logic:
bq. >
bq. > * The CallableQueueService would define a PriorityQueue with 4
priorities instead of 3.
bq. > * The queue() method would ensure there no command being queued has
priority 4 (forbidding direct use of the the highest priority).
bq. > * The CallableQueueService would a new protected method 'Set<Class>
getInterruptCommandClasses()' that would return the commands that are
'interrupt' commands.
bq. > * On Queuing, within the queue() command any command which class is in
the above Set will be bumped to priority 4.
bq. >
bq. > This achieves exactly the same behavior with the current well tested
logic.
bq. >
bq.
bq. Mohamed Battisha wrote:
bq. I do not see it overkilling. It is exactly the same behavior you
proposed but instead of using a high priority queue, we are using a per-job
interrupt map. I tried as much as possible to implement it within the same
theme. [I am mapping the poposed behavior with the current design which is
almost 1:1]
bq. - * The CallableQueueService would define a PriorityQueue with 4
priorities instead of 3.
bq. - Using a InterruptMap within the CallableQueueService.
bq.
bq. * The queue() method would ensure there no command being queued has
priority 4 (forbidding direct use of the the highest priority).
bq. - No change here are we are not going to use a special queue. instead
while polling the data from the queue, we will check the map first
bq.
bq. * The CallableQueueService would a new protected method 'Set<Class>
getInterruptCommandClasses()' that would return the commands that are
'interrupt' commands.
bq. * The interrupt commands are stored in a local set within
CallableQueueService
bq.
bq. * On Queuing, within the queue() command any command which class is in
the above Set will be bumped to priority 4.
bq. On Queuing, within the queue command, if the command type belongs to
the InterruptSet, it will inserted to Interrupt Map
bq.
bq.
bq. The major advantage of the map approach over queue approach is we are
a avoiding job starvation. Specially, in large deployment in which hundreds of
thousands of commands are running from thousands of jobs. In this case a few
jobs with interrupt commands may take full control of the thread poll causing
other jobs to starve. The per-job map will avoid such a behavior
bq.
bq. Alejandro Abdelnur wrote:
bq. Mohamed,
bq.
bq. The QueueCommandService has anti-starvation logic already, handled by
the callable.concurrency settings. This ensures that never a single command
takes over all threads. Because of this mechanism, what you point out as an
issue, it is not.
bq.
bq. So again, the alternate solution I'm suggesting restricts the changes
to one place (the CallableQueueService) and it uses the existing, well proven
priority and concurrency mechanism handled by the CllableQueueService.
bq.
bq. Thanks.
bq.
bq. Alejandro
bq.
bq.
bq. Mohamed Battisha wrote:
bq. Alejandro
bq. I am assumed from your solution, the anti-starvation will not push
other commands to the higher priority queue which will be only restricted for
the interrupted commands
bq. If the anti-starvation will push other commands to the higher priority
queue, so we are facing the same exact issue again. we are only using four
queues instead of three.
bq.
bq. The changes I made in mainly at the CallableQueueService with the same
exact logic you proposed [using a map instead of a higher priority queue].
During my tests, i figured out there are some commands that are not checking
for null ref in GetLockKey(). I added these checks in these commands to avoid a
null-ref exceptions in some rare condition which give an impression of a bigger
change :)
bq.
bq. Thanks!
bq. Mohamed
bq.
bq. Alejandro Abdelnur wrote:
bq. Mohamed,
bq.
bq. Your are correct, then antiStarvation() method in the
PriorityDelayQueue should be modified to stop the priority rising one priority
less this means that line 488 should do:
bq.
bq. for (int i = 0; i < queues.length - 2; i++) {
bq.
bq. Good catch.
bq.
bq. Thxs.
I did some experiment to compare the two approached [Interrupt Map Vs. the
Hi-Pri Queue]
The goal of the experiment is to figure out how the Hi-Pri Queue reacted to a
long locking time for a few jobs.
In the experiment, I had 10 running jobs on a Thread poll of 5 threads. Out of
the 10 jobs, 3 of them have a long running commands that required the job to be
locked for 5 minutes.
At the same time, I generated three different interrupt commands for the given
three jobs.
Here what will happen based on the Hi-Pri Queue approach:
- The interrupt commands will be solely reside in the Hi-Pri Queue with no
other commands from other jobs [no command will be promoted to the Hi-Pri
queue].
- 3 of the 5 threads will be mainly allocated to the long running commands.
- The remaining 2 threads will be running on Hi-Pri queue. Mainly trying to
acquire the lock, failed and re-instered again on the Hi-Pri Queue [simply
doing no thing]
Based on this, the system will be mainly locked for 5 minutes, and the
remaining 7 jobs will be simply stalled.
As I mentioned before, the problem will be exaggerated when the number of jobs
increased and its probability and so its frequency will increase.
As well, if some of the commands faces a locking issues which erroneously did
not release the lock, it can cause the whole system to be stalled which should
be avoided [in shared queue we can not let an erroneous jobs impact the whole
queue. our design should isolate is as much as possible]
Using, the interrupt map will simply avoid such a locking scheme, isolating the
erroneous jobs while achieving the same goal of giving some commands an
immediate execution.
Again, as I mentioned before, the change for the interrupt map is pretty
simple and we should not be afraid of it. As well, I run a full test cases to
cover different scenario and things are pretty solid.
Thanks!
Mohamed
- Mohamed
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3030/#review3851
-----------------------------------------------------------
On 2011-12-09 18:58:56, Mohamed Battisha wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3030/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-12-09 18:58:56)
bq.
bq.
bq. Review request for oozie.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Oozie continues to materialize new actions after end date modification.
The main issue is related how Oozie materialized the actions based on a FIFO
priority queue. Changing the status of a bundle/coordinator job should take
higher priority than executing this job.
bq.
bq. The main idea is to enable the queue to handle interruptions. Mainly, once
you finished what you are working on currently, you should focus on executing
this next action.
bq.
bq. The issue can be illustrated as follow:
bq.
bq. 1. Configure a pipeline to run for 1 hour
bq. 2. Start the pipeline
bq. 3. After it starts materializing new actions, change the end time (in my
example - to 10 minutes after the pipeline
bq. starts)
bq. 4. Monitor the coordinator apps - they will continue to materialize new
actions past the end time.
bq.
bq.
bq. This addresses bug OOZIE-591.
bq. https://issues.apache.org/jira/browse/OOZIE-591
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/Command.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/XCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/bundle/BundleJobResumeXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/bundle/BundlePauseXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/bundle/BundleRerunXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/bundle/BundleStartXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/bundle/BundleStatusUpdateXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/bundle/BundleUnpauseXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordActionInputCheckXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordActionNotificationXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordActionReadyXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordActionStartXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordActionTimeOutXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordActionUpdateXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordPauseXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordRerunXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordResumeXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordSuspendXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/coord/CoordUnpauseXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/wf/ResumeXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/command/wf/WfEndXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/service/CallableQueueService.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/service/RecoveryService.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/util/PollablePriorityDelayQueue.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/java/org/apache/oozie/util/XCallable.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/main/resources/oozie-default.xml
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/test/java/org/apache/oozie/command/TestCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/test/java/org/apache/oozie/command/coord/TestCoordActionInputCheckXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/test/java/org/apache/oozie/command/coord/TestCoordActionStartXCommand.java
1209829
bq.
http://svn.apache.org/repos/asf/incubator/oozie/trunk/core/src/test/java/org/apache/oozie/service/TestCallableQueueService.java
1209829
bq.
bq. Diff: https://reviews.apache.org/r/3030/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Regression for all the Commands in addition for specific testing for the
Interrupt Driven Map
bq.
bq.
bq. Thanks,
bq.
bq. Mohamed
bq.
bq.
> Oozie continues to materialize new actions after end date modification
> ----------------------------------------------------------------------
>
> Key: OOZIE-591
> URL: https://issues.apache.org/jira/browse/OOZIE-591
> Project: Oozie
> Issue Type: New Feature
> Reporter: Mohamed Battisha
> Labels: patch
> Attachments: Interrupt Driven Commands.docx
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Oozie continues to materialize new actions after end date modification. The
> main issue is related how Oozie materialized the actions based on a FIFO
> priority queue. Changing the status of a bundle/coordinator job should take
> higher priority than executing this job.
> The main idea is to enable the queue to handle interruptions. Mainly, once
> you finished what you are working on currently, you should focus on executing
> this next action.
> The issue can be illustrated as follow:
> 1. Configure a pipeline to run for 1 hour
> 2. Start the pipeline
> 3. After it starts materializing new actions, change the end time (in my
> example - to 10 minutes after the pipeline
> starts)
> 4. Monitor the coordinator apps - they will continue to materialize new
> actions past the end time.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira