Hi guys,
This seems to be a familiar subject, but I still don't have a handle on it,
alas. I'm clearly misunderstanding something here.

In our case we use Tez and submit many different jobs to our queue called
"batch_sql". This works great until a Tez job finishes (100% complete).
Instead of dropping out of the queue, it seems to hang around for hours,
holding onto one container and taking up a slot in our queue.

As you can imagine, given our queue width is 15, after 15 Tez jobs we're
log-jammed. So we wrote a script that looks for Tez jobs that are 100% complete
and runs yarn kill commands on them. So, yeah, certainly not ideal.
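A minimal Python sketch of that kind of workaround script follows. It is not our actual script: the function names, the assumed tab-separated column order of `yarn application -list` output, and the exact queue string are illustrative assumptions, so check the report format on your own cluster first. It only prints the kill commands unless dry_run=False is passed.

```python
import subprocess

# ASSUMPTION: column order of the rows printed by `yarn application -list`.
# Verify against your Hadoop version before trusting this.
COLUMNS = ["id", "name", "type", "user", "queue",
           "state", "final_state", "progress", "tracking_url"]


def find_finished_tez_apps(report: str, queue: str = "batch_sql") -> list:
    """Return IDs of TEZ applications in `queue` that report 100% progress."""
    app_ids = []
    for line in report.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        # Skip the summary line, the header row, and any malformed lines.
        if len(fields) < len(COLUMNS) or not fields[0].startswith("application_"):
            continue
        row = dict(zip(COLUMNS, fields))
        # ASSUMPTION: the queue is shown as "batch_sql"; some schedulers
        # may show a prefixed name such as "root.batch_sql".
        if row["type"] == "TEZ" and row["queue"] == queue and row["progress"] == "100%":
            app_ids.append(row["id"])
    return app_ids


def kill_commands(app_ids: list) -> list:
    """Build one `yarn application -kill <id>` command per application."""
    return [["yarn", "application", "-kill", app_id] for app_id in app_ids]


def main(dry_run: bool = True) -> None:
    """List running apps, then print (or run) kill commands for finished Tez apps."""
    report = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "RUNNING"],
        capture_output=True, text=True, check=True,
    ).stdout
    for cmd in kill_commands(find_finished_tez_apps(report)):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling main() from cron with the default dry_run=True is a safe way to see what would be killed before letting it actually run the kills.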

Looking at the docs, I thought this config setting
(tez.session.am.dag.submit.timeout.secs) would stop those Tez jobs from
hanging around, but testing proved otherwise. It didn't seem to have any effect.

So I ask: how can we get those Tez jobs to exit on their own? Or is there
something else I'm missing?

Thanks,
Stephen
