On 02/25/2014 04:42 PM, Ralph Castain wrote:
>>> I'm curious whether this could be changed with a setting to
>>> disregard the expected start time of higher priority jobs.
>>>
>>> Given that giving/estimating completion times of jobs is akin to
>>> sorcery in many cases, it would be beneficial in my case to
>>> always under-estimate the time limit.
>>>
>>> I'm wondering if anybody is running with a overly-conservative
>>> TimeLimit for jobs, and abusing OverTimeLimit [very high value]
>>> to achieve this.
>>>
>>> I know I would definitely use a EstimatedTimeLimit parameter for
>>> improved backfilling and give an absolute ceiling with TimeLimit
>>> (if I could).
>>
>> I haven't had time to work on this, but one idea would be estimate
>> a job's run time based upon historic data and use that as a basis
>> for backfill scheduling. I suspect the results would be better
>> responsiveness and higher utilization than when basing scheduling
>> decisions upon the user's time limit.

I can give pretty accurate estimates most of the time. What I cannot do, however, is set a conservative timelimit, because that would kill the job prematurely. As such, I need to give a timelimit that is within a 5-10x ballpark of the actual figure.

If you think that an --estimated-time option would be used by people who are able to do this, then you could use the estimated time for backfilling and the timelimit as a hard limit. When it is not specified, you could still default to estimated-time=timelimit and get the current behavior.

> FWIW: that has worked very poorly in the past. The problem is that
> the workload depends heavily upon the data set, and so past
> performance is a very poor indicator of future behavior except in
> rare circumstances (e.g., a nightly weather forecast where the data
> is consistent night after night).

I can confirm that. Run time is dataset/parameter dependent.
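
To make the --estimated-time proposal concrete, here is a minimal Python sketch of the decision it implies. This is not Slurm code and does not mirror the backfill plugin. The `Job` class, its `estimated_time` field, and `can_backfill` are hypothetical names made up for illustration. The only assumptions are the ones stated above: the estimate is used for backfill planning, the timelimit stays the hard ceiling, and a missing estimate falls back to the timelimit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    time_limit: int                        # hard ceiling in minutes; job is killed past this
    estimated_time: Optional[int] = None   # hypothetical planning hint, in minutes

    def planning_time(self) -> int:
        """Run time the scheduler would plan with for this job."""
        # No estimate given: plan with the hard limit (the current behavior).
        if self.estimated_time is None:
            return self.time_limit
        # An estimate above the hard limit is meaningless, so clamp it.
        return min(self.estimated_time, self.time_limit)


def can_backfill(job: Job, minutes_until_reserved_start: int) -> bool:
    """True if the job is expected to finish before the reserved start
    of the higher priority job that currently holds the resources."""
    return job.planning_time() <= minutes_until_reserved_start


# A job with a 10 hour ceiling that is expected to take 90 minutes,
# and a 2 hour gap before the higher priority job is due to start.
with_estimate = Job("sim-a", time_limit=600, estimated_time=90)
without_estimate = Job("sim-b", time_limit=600)

print(can_backfill(with_estimate, 120))     # planned with the 90 minute estimate
print(can_backfill(without_estimate, 120))  # planned with the 600 minute limit
```

With the estimate, the job fits into the 2 hour gap; without it, the 10 hour timelimit keeps it out. The trade-off in this sketch is that a job which overruns its estimate can still run up to its timelimit, and would then delay the higher priority job.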
