Re: JDBC parallel

Martijn Lenderink Tue, 19 Mar 2013 00:50:13 -0700

Hello,

Thanks for the response.
I found out the problem was an error in my YARN config, not in Crunch.
It's fixed now.


Greetings,
Martijn Lenderink

2013/3/18 Matthias Friedrich <[email protected]>

> Hi,
>
> IIRC, the code in Crunch is inherently sequential and meant for
> small(ish) amounts of data. After all, a distributed read with Hadoop
> from an RDBMS is often considered a DDoS attack :)
>
> Regards,
>   Matthias
>
> On Monday, 2013-03-18, Josh Wills wrote:
> > Hey Martijn,
> >
> > I don't have any intuition on this one-- is this code that you could post
> > as a gist or something so I could play with it and see if I see anything
> > amiss? The trick will be figuring out if the problem is in Crunch, the
> > underlying DB library, or the config.
> >
> > J
> >
> >
> > On Mon, Mar 18, 2013 at 6:50 AM, Martijn Lenderink
> > <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I have a working JDBC connection to get data from an MSSQL source.
> > > It all works great, except that my cluster only opens one connection to the
> > > MSSQL server.
> > >
> > > I have multiple nodes running, but the data gets pulled from only one node,
> > > and then the data gets sent to the other nodes for processing.
> > >
> > > I'm using code similar to the following:
> > >
> > >
> > > https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java
> > >
> > > The only difference is that I'm using the DataDrivenDBInputFormat.
> > >
> > > When I debug the source code, the query gets split into multiple queries,
> > > but they only get executed on one machine.
> > > Why isn't this executed in parallel, with multiple connections to the
> > > MSSQL server?
> > >
> > > Greetings,
> > > Martijn Lenderink
> > >
> > >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
>



-- 

Kind regards,
Martijn Lenderink
