Hi,

If you only want to run one Sqoop action at a time, why not simply
remove the fork and run the Sqoop actions sequentially?
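
As a rough illustration of what "sequential" means here, below is a minimal
Python sketch that generates a workflow in which each Sqoop action transitions
to the next one on success, with no fork/join. Everything specific in it is an
assumption for illustration, not taken from your workflow: the table names,
the `id` split column, the `${jdbcUrl}` property, the node names, and the
schema versions in the two namespace URIs.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions; use whatever your Oozie installation supports.
WF_NS = "uri:oozie:workflow:0.5"
SQOOP_NS = "uri:oozie:sqoop-action:0.4"


def build_sequential_workflow(tables, mappers=4):
    """Return workflow XML where each Sqoop action starts only after the
    previous one has succeeded (no fork/join)."""
    names = [f"import-{t}" for t in tables]
    root = ET.Element(
        "workflow-app", {"name": "sequential-sqoop-import", "xmlns": WF_NS}
    )
    # Start at the first action, or go straight to the end if there are none.
    ET.SubElement(root, "start", {"to": names[0] if names else "end"})

    for i, (table, name) in enumerate(zip(tables, names)):
        # On success, move to the next table's action; the last one goes to "end".
        next_node = names[i + 1] if i + 1 < len(names) else "end"
        action = ET.SubElement(root, "action", {"name": name})
        sqoop = ET.SubElement(action, "sqoop", {"xmlns": SQOOP_NS})
        ET.SubElement(sqoop, "job-tracker").text = "${jobTracker}"
        ET.SubElement(sqoop, "name-node").text = "${nameNode}"
        # Hypothetical command: "id" as split column and ${jdbcUrl} are assumptions.
        ET.SubElement(sqoop, "command").text = (
            f"import --connect ${{jdbcUrl}} --table {table} "
            f"--split-by id --num-mappers {mappers}"
        )
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Sqoop import failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_sequential_workflow(["customers", "orders", "payments"]))
```

With a chain like this, only one import runs at a time, so the number of
database connections should be bounded by the per-action mapper count rather
than by how many containers the queue happens to grant.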

- Robert

On Tue, May 3, 2016 at 12:15 AM, Per Ullberg <[email protected]> wrote:

> Hi,
>
> We have an Oozie workflow that imports data table by table from an RDBMS
> using Sqoop, one action per table. The Sqoop commands use "split by column"
> and spread out over a number of mappers.
>
> We fork all the actions, so basically all Sqoop jobs are launched at once.
>
> The RDBMS can only accept a fixed number of connections. If this limit is
> exceeded, the Sqoop action fails and eventually the whole Oozie
> workflow fails.
>
> We use the YARN Capacity Scheduler (2.6.0) and have set up a specific queue
> for this job to throttle the maximum number of concurrent containers.
> However, this setup is hard to manage because all configurations in the
> Capacity Scheduler are relative to the total number of vcores in the cluster.
> As we add machines or otherwise tune the cluster, the actual number of
> containers granted to the Oozie job changes, and at times we hit the
> connection limit.
>
> So, is there another way to throttle the number of concurrent containers
> for an Oozie job? I guess you would have to be able to throttle both the
> launchers and the MapReduce containers?
>
> best regards
> /Pelle
>
>
> --
>
> *Per Ullberg*
> Tech Lead
> Odin - Uppsala
>
> Klarna AB
> Sveavägen 46, 111 34 Stockholm
> Tel: +46 8 120 120 00
> Reg no: 556737-0431
> klarna.com
>
