Hey Mike,

I'm not sure if I ever created an issue for this, but this issue is why I always set the resource manager's queue size 1 greater than what the wengine was told it was. You will see this issue appear quite often if one or more of the expected batch stubs are never brought up. There is also another issue related to this one: if a job qualifies to go to more than one batch stub, and one of those batch stubs is down, and that down batch stub is selected for the job, then the job will either get lost (the bug you noticed) or be put at the back of the list of jobs, when really the job should be scheduled with one of the other batch stubs it qualified to be sent to (this is the other bug I believe to be related to the bug you noticed). I'm not sure that this stuff is really wiki data; it really should be in an issue, and the current work-around until the issue is fixed should be put in that issue.
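
To illustrate the behavior I would expect for the second bug, here is a minimal sketch. It is not the Resource Manager's scheduler code; the class StubSelector, the method pickStub, and the up/down map are all hypothetical names made up for this example. The idea is only that a down batch stub gets skipped in favor of another stub the job qualifies for:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of OODT: picks a batch stub for a job.
class StubSelector {
    // Returns the first qualifying stub that is reported as up,
    // or Optional.empty() if every qualifying stub is down.
    static Optional<String> pickStub(List<String> qualifyingStubs,
                                     Map<String, Boolean> isUp) {
        for (String stub : qualifyingStubs) {
            // A stub with no known status is treated as down (an assumption).
            if (isUp.getOrDefault(stub, false)) {
                return Optional.of(stub);
            }
        }
        return Optional.empty();
    }
}
```

If every qualifying stub is down, the caller would still have to decide whether to re-queue the job; the sketch leaves that open.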

-brian

On Apr 09, 2012, at 09:12 AM, "Cayanan, Michael D (388J)" <[email protected]> wrote:

Hey Chris,

Comments are below.

On 4/6/12 9:01 PM, "Mattmann, Chris A (388J)"
<[email protected]> wrote:

>Hi Mike,
>
>Thanks, what a great page!
>
>I noticed this comment in the page:
>
>"At the time of this writing, jobs that cannot be added to the queue
>disappear...."
>
>I think we should be more clear than "disappear". They don't disappear.
>The Scheduler will try and send a Job to the BatchMgr, and if there is an
>exception, it tries to re-queue the Job back onto the JobStack. If it's
>unable to do that, then there is an issue, but it at the very least tries
>to re-queue the job if there was an issue.

The reason this blurb was put into the wiki was because when Gabe and I
were looking through the Resource Manager code, this is what looks to be
happening. Check out the piece of code that tries to add a job:

In the JobStack.java:

public String addJob(JobSpec spec) throws JobQueueException {
    String jobId = safeAddJob(spec);
    if (queue.size() != maxQueueSize) {
        LOG.log(Level.INFO, "Added Job: [" + spec.getJob().getId() + "] to queue");
        queue.add(spec);
        spec.getJob().setStatus(JobStatus.QUEUED);
        safeUpdateJob(spec);
        return jobId;
    } else {
        throw new JobQueueException("Reached max queue size: [" + maxQueueSize
            + "]: Unable to add job: [" + spec.getJob().getId() + "]");
    }
}


When the JobQueueException gets thrown, the Resource Manager throws a
SchedulerException:

In the XmlRpcResourceManager.java:

private String genericHandleJob(Hashtable jobHash, Object jobIn)
        throws SchedulerException {

    ...

    try {
        jobId = scheduler.getJobQueue().addJob(spec);
    } catch (JobQueueException e) {
        LOG.log(Level.WARNING, "JobQueue exception adding job: Message: "
            + e.getMessage());
        throw new SchedulerException(e.getMessage());
    }
    return jobId;
}


From here, I can't see where the job gets re-queued if the max queue size
is reached. If this is true, I can certainly file a JIRA issue.
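
To make the concern concrete, here is a minimal standalone sketch of the pattern. It is not the OODT code: BoundedJobQueue and Submitter are hypothetical names, jobs are plain strings, and IllegalStateException stands in for JobQueueException. It also uses >= for the size check where the snippet above uses !=. The point it shows is that a rejected job is held nowhere unless the caller keeps it:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a bounded job queue; not the OODT JobStack.
class BoundedJobQueue {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int maxQueueSize;

    BoundedJobQueue(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
    }

    // Throws when the queue is full; the job is not stored anywhere.
    void addJob(String jobId) {
        if (queue.size() >= maxQueueSize) {
            throw new IllegalStateException("Reached max queue size: ["
                + maxQueueSize + "]: Unable to add job: [" + jobId + "]");
        }
        queue.add(jobId);
    }

    int size() {
        return queue.size();
    }
}

// Hypothetical caller that remembers rejected jobs so they can be retried.
class Submitter {
    final List<String> rejected = new ArrayList<>();

    // Returns true if the job was queued, false if it was rejected.
    boolean submit(BoundedJobQueue q, String jobId) {
        try {
            q.addJob(jobId);
            return true;
        } catch (IllegalStateException e) {
            // Without this line the rejected job would simply be gone.
            rejected.add(jobId);
            return false;
        }
    }
}
```

In this sketch the only thing that keeps the second job alive is the caller's own rejected list, which is the step I could not find in the Resource Manager code.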

>
>Also, in general, you will have as many jobs queued in Resource Manager
>land as the size of that job stack. So we should probably note that.
>
>Great resource here, thanks for putting it together!
>
>Cheers,
>Chris

Okay, I've noted this in the wiki as well.

Cheers,
Mike

>
>On Apr 5, 2012, at 8:43 AM, Cayanan, Michael D (388J) wrote:
>
>> Hi all,
>>
>> I recently added a page to the OODT wiki:
>>
>> https://cwiki.apache.org/confluence/display/OODT/Workflow+Manager+Help
>>
>> I ran into some issues with the Workflow hanging up and also Workflow
>> jobs being lost when trying to send them off to the Resource Manager and
>> just wanted to share what I learned and what to do if you run into these
>> issues as well.
>>
>> Cheers,
>> Mike
>
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Senior Computer Scientist
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 171-266B, Mailstop: 171-246
>Email: [email protected]
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Assistant Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
