Hi,

We have an Oozie workflow that imports data table by table from an RDBMS using Sqoop, with one action per table. The Sqoop commands use --split-by on a column, so each import is spread across a number of mappers.
We fork all the actions, so essentially all Sqoop jobs are launched at once. The RDBMS accepts only a fixed number of connections; if that limit is exceeded, the Sqoop action fails and eventually the whole Oozie workflow fails.

We use the YARN capacity scheduler (2.6.0) and have set up a dedicated queue for this job to throttle the maximum number of concurrent containers. However, this setup is hard to manage, because all capacity scheduler settings are relative to the total number of vcores in the cluster. As we add machines or otherwise tune the cluster, the actual number of containers granted to the Oozie job changes, and at times we hit the connection ceiling.

So, is there another way to throttle the number of concurrent containers for an Oozie job? I guess it would have to throttle both the launcher containers and the MapReduce containers?

Best regards,
/Pelle

--
*Per Ullberg*
Tech Lead Odin - Uppsala
Klarna AB
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
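
P.S. To make the drift concrete, here is a rough sketch of the arithmetic behind the problem. It is only an illustration under stated assumptions: the queue cap is a percentage of the cluster's total vcores, every container uses one vcore, and each map container holds one database connection. The function name and all numbers (cluster sizes, the 10% cap, the limit of 40 connections) are made up for the example and are not our real settings.

```python
def max_concurrent_containers(cluster_vcores, queue_max_capacity_pct,
                              vcores_per_container=1):
    """Upper bound on containers a queue can run at once when its cap
    is expressed as a percentage of the cluster's total vcores."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return (cluster_vcores * queue_max_capacity_pct) // (100 * vcores_per_container)


def exceeds_connection_limit(cluster_vcores, queue_max_capacity_pct,
                             db_connection_limit):
    """Worst case: every container in the queue is a mapper holding one
    connection. Launcher containers also count against the queue cap,
    so the real number of connections can be lower than this bound."""
    cap = max_concurrent_containers(cluster_vcores, queue_max_capacity_pct)
    return cap > db_connection_limit


# Hypothetical values, for illustration only.
QUEUE_MAX_CAPACITY_PCT = 10
DB_CONNECTION_LIMIT = 40

if __name__ == "__main__":
    # The same percentage yields a different absolute cap as the cluster grows.
    for cluster_vcores in (320, 400, 480):
        cap = max_concurrent_containers(cluster_vcores, QUEUE_MAX_CAPACITY_PCT)
        risky = exceeds_connection_limit(
            cluster_vcores, QUEUE_MAX_CAPACITY_PCT, DB_CONNECTION_LIMIT)
        print(f"{cluster_vcores} vcores -> up to {cap} containers, "
              f"over DB limit: {risky}")
```

With a percentage-based cap, adding machines silently raises the absolute number of containers, which is why we are looking for a limit expressed as a fixed count instead.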
