Hi,

We have an Oozie workflow that imports data table by table from an RDBMS
using Sqoop, with one action per table. The Sqoop commands use --split-by
on a column and spread each import over a number of mappers.

We fork all the actions, so basically all Sqoop jobs are launched at once.

The RDBMS can only accept a fixed number of connections. If this limit is
exceeded, the Sqoop action fails and eventually the whole Oozie workflow
fails.

We use the YARN capacity scheduler (2.6.0) and have set up a dedicated queue
for this job to throttle the maximum number of concurrent containers.
However, this setup is hard to manage because all capacity scheduler
settings are relative to the total number of vcores in the cluster. As we
add machines or otherwise tune the cluster, the actual number of containers
granted to the Oozie job changes, and at times we hit the connection limit.
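
To illustrate the drift, here is a rough sketch of the arithmetic. The
cluster sizes, the 15% queue cap and the limit of 40 connections are made-up
numbers, not our real configuration. It also only counts vcores and treats
every container as a potential database connection, which overestimates
somewhat since not every container necessarily opens one.

```python
import math


def max_containers(cluster_vcores, queue_max_capacity_pct, vcores_per_container=1):
    """Rough upper bound on concurrent containers for a queue.

    Assumes the queue cap is a percentage of the cluster's total vcores
    and that every container requests the same number of vcores.
    """
    queue_vcores = math.floor(cluster_vcores * queue_max_capacity_pct / 100)
    return queue_vcores // vcores_per_container


# Hypothetical connection limit of the RDBMS.
DB_CONNECTION_LIMIT = 40

# Same queue percentage, growing cluster (hypothetical vcore totals).
for cluster_vcores in (200, 240, 320):
    containers = max_containers(cluster_vcores, queue_max_capacity_pct=15)
    status = "ok" if containers <= DB_CONNECTION_LIMIT else "over the limit"
    print(f"{cluster_vcores} vcores -> up to {containers} containers ({status})")
```

With the percentage left untouched, the queue's ceiling grows with the
cluster, so a setting that was safe before an expansion can exceed the
database's connection limit afterwards.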

So, is there another way to throttle the number of concurrent containers
for an Oozie job? I guess you would have to be able to throttle both the
launchers and the MapReduce containers?

Best regards
/Pelle


-- 

*Per Ullberg*
Tech Lead
Odin - Uppsala

Klarna AB
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
