The fair scheduler would solve this issue, but we need the capacity
scheduler for other reasons. Would it be possible to run multiple schedulers
in parallel?
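
To illustrate why the capacity scheduler's relative limits drift as the
cluster grows (the problem described in the original message quoted below),
here is a minimal Python sketch. The function name, the 10% queue cap, the
one-vcore-per-container sizing and the 40-connection database limit are all
hypothetical values chosen for illustration, not numbers from our cluster. It
also assumes every container holds one database connection, which overstates
the load somewhat, since launcher and application master containers do not.

```python
def max_concurrent_containers(cluster_vcores: int,
                              queue_max_pct: float,
                              vcores_per_container: int = 1) -> int:
    """Upper bound on containers a queue can run at once.

    The queue cap is a percentage of the cluster's total vcores, so the
    absolute number of containers moves whenever the cluster size changes.
    """
    queue_vcores = int(cluster_vcores * queue_max_pct / 100)
    return queue_vcores // vcores_per_container


DB_CONNECTION_LIMIT = 40   # hypothetical RDBMS connection limit
QUEUE_MAX_PCT = 10.0       # hypothetical queue cap, in percent of the cluster

# Same queue configuration, three hypothetical cluster sizes.
for cluster_vcores in (320, 400, 480):
    containers = max_concurrent_containers(cluster_vcores, QUEUE_MAX_PCT)
    status = "ok" if containers <= DB_CONNECTION_LIMIT else "over the limit"
    print(f"{cluster_vcores} vcores -> {containers} containers ({status})")
```

With an unchanged percentage, adding machines eventually pushes the container
count past the connection limit, which is why an absolute cap would be easier
to manage.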

/Pelle

On Thursday, May 26, 2016, David Morel <[email protected]> wrote:

> On May 26, 2016 at 9:04 AM, "Per Ullberg" <[email protected]> wrote:
> >
> > The split is skewed. Just running one sqoop action will cause some
> > containers to finish early and others to finish late. If we run the
> > actions sequentially, the early finishers will be idle until all
> > containers for that action are done and the next action can commence. By
> > running the actions in parallel, we will finish earlier in total and also
> > utilize our cluster resources better.
>
> I used the FairScheduler for exactly this scenario at my previous job.
>
> David
>
> > regards
> > /Pelle
> >
> > On Thu, May 26, 2016 at 3:09 AM, Robert Kanter <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > If you want to only run one of the Sqoop Actions at a time, why not
> > > simply remove the fork and run the Sqoop Actions sequentially?
> > >
> > > - Robert
> > >
> > > On Tue, May 3, 2016 at 12:15 AM, Per Ullberg <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > We have an oozie workflow that imports data table by table from an
> > > > RDBMS using sqoop, one action per table. The sqoop commands use
> > > > "split by column" and spread out over a number of mappers.
> > > >
> > > > We fork all the actions, so basically all sqoop jobs are launched at
> > > > once.
> > > >
> > > > The RDBMS can only accept a fixed number of connections, and if this
> > > > is exceeded, the sqoop action will fail and eventually the whole oozie
> > > > workflow will fail.
> > > >
> > > > We use the yarn capacity scheduler (2.6.0) and have set up a specific
> > > > queue for this job to throttle the maximum number of concurrent
> > > > containers. However, this setup is hard to manage because all
> > > > configurations in the capacity scheduler are relative to the total
> > > > number of vcores in the cluster. As we add machines or otherwise tune
> > > > the cluster, the actual number of containers granted to the oozie job
> > > > changes, and at times we hit the connection limit.
> > > >
> > > > So, is there another way to throttle the number of concurrent
> > > > containers for an oozie job? I guess you would have to be able to
> > > > throttle both launchers and map-reduce containers?
> > > >
> > > > best regards
> > > > /Pelle
> > > >
> > > >
> > > > --
> > > >
> > > > *Per Ullberg*
> > > > Tech Lead
> > > > Odin - Uppsala
> > > >
> > > > Klarna AB
> > > > Sveavägen 46, 111 34 Stockholm
> > > > Tel: +46 8 120 120 00
> > > > Reg no: 556737-0431
> > > > klarna.com
> > > >
> > >
> >
>


