A shared connection pool in sqoop would do the trick for me, but looking through that codebase, it does not look like it has a pluggable connection pool API. Maybe a connection proxy would work, but that assumes sqoop actions could wait indefinitely for a connection. I have not delved further into that subject yet.
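To sketch the proxy idea (hypothetical, untested code, not anything in the sqoop codebase; the class and parameter names are made up, and it would only help if every mapper obtained its connections through a single shared broker):

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Proxy;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.util.concurrent.Semaphore;
    import java.util.concurrent.TimeUnit;
    import javax.sql.DataSource;

    // Gates a real DataSource behind a semaphore sized to what the RDBMS
    // tolerates; the permit is returned when the Connection is closed.
    public class ThrottlingConnectionBroker {

        private final DataSource delegate;
        private final Semaphore permits;
        private final long maxWaitSeconds;

        public ThrottlingConnectionBroker(DataSource delegate,
                                          int maxConnections,
                                          long maxWaitSeconds) {
            this.delegate = delegate;
            this.permits = new Semaphore(maxConnections, true); // fair: FIFO waiters
            this.maxWaitSeconds = maxWaitSeconds;
        }

        public Connection getConnection() throws SQLException {
            try {
                // Bounded wait instead of waiting forever: a starved
                // caller fails fast rather than hanging its action.
                if (!permits.tryAcquire(maxWaitSeconds, TimeUnit.SECONDS)) {
                    throw new SQLException("Timed out waiting for a connection permit");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new SQLException("Interrupted while waiting for a permit", e);
            }
            try {
                return releaseOnClose(delegate.getConnection());
            } catch (SQLException | RuntimeException e) {
                permits.release(); // do not leak the permit on failure
                throw e;
            }
        }

        // Dynamic proxy around the real Connection so that the first
        // close() releases the permit back to waiting callers.
        private Connection releaseOnClose(final Connection real) {
            return (Connection) Proxy.newProxyInstance(
                    Connection.class.getClassLoader(),
                    new Class<?>[] {Connection.class},
                    (proxy, method, args) -> {
                        boolean firstClose =
                                "close".equals(method.getName()) && !real.isClosed();
                        try {
                            Object result = method.invoke(real, args);
                            if (firstClose) {
                                permits.release();
                            }
                            return result;
                        } catch (InvocationTargetException e) {
                            throw e.getCause();
                        }
                    });
        }
    }

The bounded tryAcquire at least replaces the wait-forever assumption with a timeout, at the cost of failing the action when the wait runs out.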
/Pelle

On Wed, Jun 1, 2016 at 11:31 AM, Harsh J <[email protected]> wrote:

> Given each job is an independent application on YARN, there's no way to
> do that outside of a Scheduler-level config.
>
> On Wed, 1 Jun 2016 at 14:49 Per Ullberg <[email protected]> wrote:
>
> > If I understand that feature correctly, it would only limit one sqoop
> > job to a certain number of mappers. I want to cap multiple concurrent
> > sqoop jobs to a total number of mappers.
> >
> > regards
> > /Pelle
> >
> > On Fri, May 27, 2016 at 8:08 AM, Harsh J <[email protected]> wrote:
> >
> > > Perhaps the feature of
> > > https://issues.apache.org/jira/browse/MAPREDUCE-5583 is what you are
> > > looking for.
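(If I recall that JIRA correctly, it adds per-job caps on simultaneously running tasks, mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit, which a sqoop invocation could set through Hadoop's generic -D options; everything below other than those two property names is an illustrative placeholder:

    sqoop import -Dmapreduce.job.running.map.limit=10 \
        --connect <jdbc-url> --table <table> --split-by <column> \
        --num-mappers 40

As the reply above notes, though, this caps each job separately rather than the total across concurrent jobs.)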
> > > On Fri, 27 May 2016 at 00:04 Per Ullberg <[email protected]> wrote:
> > >
> > > > The fair scheduler would solve this issue, but we need the capacity
> > > > scheduler for other reasons. Would it be possible to run multiple
> > > > schedulers in parallel?
> > > >
> > > > /Pelle
> > > >
> > > > On Thursday, May 26, 2016, David Morel <[email protected]> wrote:
> > > >
> > > > > On 26 May 2016 at 9:04 AM, Per Ullberg <[email protected]> wrote:
> > > > >
> > > > > > The split is skewed. Just running one sqoop action will cause
> > > > > > some containers to finish early and others to finish late. If we
> > > > > > run the actions sequentially, the early finishers will be idle
> > > > > > until all containers for that action are done and the next
> > > > > > action can commence. By running the actions in parallel, we will
> > > > > > finish earlier in total and also utilize our cluster resources
> > > > > > better.
> > > > >
> > > > > I used the FairScheduler for exactly this scenario at my previous
> > > > > job.
> > > > >
> > > > > David
> > > > >
> > > > > > regards
> > > > > > /Pelle
> > > > > >
> > > > > > On Thu, May 26, 2016 at 3:09 AM, Robert Kanter
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > If you want to only run one of the Sqoop Actions at a time,
> > > > > > > why not simply remove the fork and run the Sqoop Actions
> > > > > > > sequentially?
> > > > > > >
> > > > > > > - Robert
> > > > > > >
> > > > > > > On Tue, May 3, 2016 at 12:15 AM, Per Ullberg
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We have an oozie workflow that imports data table by table
> > > > > > > > from an RDBMS using sqoop, one action per table. The sqoop
> > > > > > > > commands use "split by column" and spread out over a number
> > > > > > > > of mappers.
> > > > > > > >
> > > > > > > > We fork all the actions, so basically all sqoop jobs are
> > > > > > > > launched at once.
> > > > > > > >
> > > > > > > > The RDBMS can only accept a fixed number of connections, and
> > > > > > > > if this is exceeded, the sqoop action will fail and
> > > > > > > > eventually the whole oozie workflow will fail.
> > > > > > > >
> > > > > > > > We use the yarn capacity scheduler (2.6.0) and have set up a
> > > > > > > > specific queue for this job to throttle the maximum number
> > > > > > > > of concurrent containers. However, this setup is hard to
> > > > > > > > manage, because all configurations in the capacity scheduler
> > > > > > > > are relative to the max amount of vcores of the cluster, and
> > > > > > > > as we add machines or otherwise tune the cluster, the actual
> > > > > > > > number of containers granted to the oozie job changes and at
> > > > > > > > times we hit the connection ceiling.
> > > > > > > >
> > > > > > > > So, is there another way to throttle the number of
> > > > > > > > concurrent containers for an oozie job? I guess you would
> > > > > > > > have to be able to throttle both launchers and map-reduce
> > > > > > > > containers?
> > > > > > > >
> > > > > > > > best regards
> > > > > > > > /Pelle

--

*Per Ullberg*
Tech Lead
Odin - Uppsala

Klarna AB
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
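(For concreteness, the relative capacity scheduler settings the original question describes look roughly like this in capacity-scheduler.xml; the queue name and values are illustrative:

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>default,sqoop</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.sqoop.capacity</name>
      <value>10</value> <!-- percent of the parent queue, not a container count -->
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.sqoop.maximum-capacity</name>
      <value>10</value> <!-- hard ceiling, still a percentage -->
    </property>

Because these values are percentages, the absolute number of containers the queue can run grows whenever the cluster does, which is exactly the drift described above.)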
