Hello,

Thanks for the response. I found out the problem was an error in my YARN config, not in Crunch. It's fixed now.

Greetings,
Martijn Lenderink

2013/3/18 Matthias Friedrich <[email protected]>

> Hi,
>
> IIRC, the code in Crunch is inherently sequential and meant for
> small(ish) amounts of data. After all, distributed read with Hadoop
> from a RDBMS is often considered a DDoS attack :)
>
> Regards,
> Matthias
>
> On Monday, 2013-03-18, Josh Wills wrote:
> > Hey Martijn,
> >
> > I don't have any intuition on this one-- is this code that you could post
> > as a gist or something so I could play with it and see if I see anything
> > amiss? The trick will be figuring out if the problem is in Crunch, the
> > underlying DB library, or the config.
> >
> > J
> >
> >
> > On Mon, Mar 18, 2013 at 6:50 AM, Martijn Lenderink
> > <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I have a working JDBC connection to get data from an MSSQL source.
> > > It all works great, except my cluster only opens one connection to the
> > > MSSQL server.
> > >
> > > I have multiple nodes running, but the data gets pulled from only one
> > > node and then gets sent to the other nodes for processing.
> > >
> > > I am using code similar to the following:
> > >
> > > https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java
> > >
> > > The only difference is that I am using the DataDrivenDBInputFormat.
> > >
> > > When I debug the source code, the query gets split into multiple queries,
> > > but they only get executed on one machine.
> > > Why isn't this executed in parallel with multiple connections to the
> > > MSSQL server?
> > >
> > > Greetings,
> > > Martijn Lenderink
> > >
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
>

--
Kind regards,

Martijn Lenderink
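
For readers following the thread, here is a minimal sketch of the idea behind "the query gets split into multiple queries": a numeric key range is divided into contiguous sub-ranges, each expressed as its own WHERE clause. This is an illustration only, not the actual Hadoop or Crunch code. The class name RangeSplitSketch, the method split, and the column name "id" are hypothetical. In Hadoop, each such sub-query would normally back one input split and so one map task; whether those tasks run on different nodes at the same time depends on how the cluster schedules them, which fits the finding above that the cause was in the YARN config.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: NOT the Hadoop DataDrivenDBInputFormat implementation.
class RangeSplitSketch {

    // Hypothetical helper: divides the inclusive range [min, max] into at most
    // numSplits contiguous sub-ranges and returns one WHERE clause per sub-range.
    static List<String> split(String column, long min, long max, int numSplits) {
        if (numSplits < 1 || max < min) {
            throw new IllegalArgumentException("need numSplits >= 1 and max >= min");
        }
        List<String> clauses = new ArrayList<>();
        long total = max - min + 1;
        // Ceiling division, so the whole range is covered by at most numSplits pieces.
        long step = Math.max(1, (total + numSplits - 1) / numSplits);
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step - 1, max);
            clauses.add(column + " >= " + lo + " AND " + column + " <= " + hi);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Each printed clause could be appended to the base query of one task.
        for (String clause : split("id", 1, 100, 4)) {
            System.out.println(clause);
        }
    }
}
```

The sub-ranges are inclusive on both ends and do not overlap, so every row in the range is read exactly once across all sub-queries.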
