On 02/25/2014 04:42 PM, Ralph Castain wrote:
>>> I'm curious whether this could be changed with a setting to
>>> disregard the expected start time of higher priority jobs.
>>> 
>>> Given that giving/estimating completion times of jobs is akin to
>>> sorcery in many cases, it would be beneficial in my case to
>>> always under-estimate the time limit.
>>> 
>>> I'm wondering if anybody is running with an overly-conservative
>>> TimeLimit for jobs, and abusing OverTimeLimit [very high value]
>>> to achieve this.
>>> 
>>> I know I would definitely use an EstimatedTimeLimit parameter for
>>> improved backfilling and give an absolute ceiling with TimeLimit
>>> (if I could).
>> 
>> I haven't had time to work on this, but one idea would be to estimate
>> a job's run time based upon historic data and use that as a basis
>> for backfill scheduling. I suspect the results would be better
>> responsiveness and higher utilization than when basing scheduling
>> decisions upon the user's time limit.

I can give pretty accurate estimates most of the time.

What I cannot do, however, is set a conservative timelimit, because that
would kill the job prematurely. As such, I need to give a timelimit which
is within a 5-10x ballpark of the actual figure.

If you think that some --estimated-time would be used by people who are
able to do this, then you could use estimated-time for backfilling, and
use timelimit as a hard limit. You could still default to
estimatedtime=timelimit when not specified, and get the current behavior.
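
To make the idea concrete, here is a rough sketch of the backfill check I
have in mind. This is plain Python, not Slurm code: the estimated_time
field, the Job class and can_backfill() are all hypothetical names, and a
real scheduler would also have to look at node availability, which I
ignore here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    time_limit: int                       # hard ceiling in minutes; job is killed past this
    estimated_time: Optional[int] = None  # hypothetical user-supplied estimate in minutes

    def planning_time(self) -> int:
        """Duration the backfill scheduler would plan with."""
        # No estimate given: fall back to the hard limit (current behavior).
        if self.estimated_time is None:
            return self.time_limit
        # An estimate above the hard limit makes no sense, so clamp it.
        return min(self.estimated_time, self.time_limit)


def can_backfill(job: Job, now: int, reserved_start: int) -> bool:
    """True if the job is expected to end before the expected start
    time of the higher priority job (all times in minutes)."""
    return now + job.planning_time() <= reserved_start


with_estimate = Job("sim", time_limit=600, estimated_time=90)
without_estimate = Job("legacy", time_limit=600)

# A 120 minute gap before the higher priority job is expected to start.
print(can_backfill(with_estimate, now=0, reserved_start=120))
print(can_backfill(without_estimate, now=0, reserved_start=120))
```

The job with an estimate fits into the gap, while the same job without
one is planned with its full timelimit and does not. If the estimate
turns out to be wrong, the job is still only killed at timelimit, at the
cost of delaying the higher priority job.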

> FWIW: that has worked very poorly in the past. The problem is that
> the workload depends heavily upon the data set, and so past
> performance is a very poor indicator of future behavior except in
> rare circumstances (e.g., a nightly weather forecast where the data
> is consistent night after night).

I can confirm that. Run time is dataset/parameter dependent.
