Hi,

If you want to run only one Sqoop action at a time, why not simply remove the fork and run the Sqoop actions sequentially?
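
A minimal sketch of what the sequential layout could look like, assuming two hypothetical tables (TABLE_A, TABLE_B), hypothetical properties (${jobTracker}, ${nameNode}, ${jdbcUrl}), made-up target directories, and schema versions that may differ on your install. Each action's ok transition points at the next action rather than at a join, so only one Sqoop job talks to the database at a time, and the mapper count on each command then roughly bounds the connections that job opens.

```xml
<workflow-app name="import-tables-sequential" xmlns="uri:oozie:workflow:0.5">
    <!-- No fork/join: start goes straight to the first import. -->
    <start to="import-table-a"/>

    <!-- Hypothetical first table; 4 mappers is an illustrative value. -->
    <action name="import-table-a">
        <sqoop xmlns="uri:oozie:sqoop-action:0.4">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect ${jdbcUrl} --table TABLE_A --split-by ID --num-mappers 4 --target-dir /data/table_a</command>
        </sqoop>
        <!-- On success, chain to the next table instead of a join node. -->
        <ok to="import-table-b"/>
        <error to="fail"/>
    </action>

    <!-- Hypothetical second table; runs only after the first has finished. -->
    <action name="import-table-b">
        <sqoop xmlns="uri:oozie:sqoop-action:0.4">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect ${jdbcUrl} --table TABLE_B --split-by ID --num-mappers 4 --target-dir /data/table_b</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop import failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The trade-off is total runtime: the imports no longer overlap, so the workflow takes about as long as the sum of the individual imports.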
- Robert

On Tue, May 3, 2016 at 12:15 AM, Per Ullberg <[email protected]> wrote:

> Hi,
>
> We have an Oozie workflow that imports data table by table from an RDBMS
> using Sqoop, one action per table. The Sqoop commands use "split by column"
> and spread out over a number of mappers.
>
> We fork all the actions, so basically all Sqoop jobs are launched at once.
>
> The RDBMS can only accept a fixed number of connections. If this is
> exceeded, the Sqoop action will fail and eventually the whole Oozie
> workflow will fail.
>
> We use the YARN capacity scheduler (2.6.0) and have set up a specific queue
> for this job to throttle the maximum number of concurrent containers.
> However, this setup is hard to manage because all configurations in the
> capacity scheduler are relative to the max amount of vcores of the cluster.
> As we add machines or otherwise tune the cluster, the actual number of
> containers granted to the Oozie job changes, and at times we hit the
> connection limit.
>
> So, is there another way to throttle the number of concurrent containers
> for an Oozie job? I guess you would have to be able to throttle both
> launchers and map-reduce containers?
>
> best regards
> /Pelle
>
> --
>
> *Per Ullberg*
> Tech Lead
> Odin - Uppsala
>
> Klarna AB
> Sveavägen 46, 111 34 Stockholm
> Tel: +46 8 120 120 00
> Reg no: 556737-0431
> klarna.com
