Hey Martijn, I don't have any intuition on this one. Is this code that you could post as a gist or something, so I could play with it and see if anything looks amiss? The trick will be figuring out whether the problem is in Crunch, the underlying DB library, or the config.
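For reference while narrowing that down, here is a rough sketch of the kind of range splitting a data-driven input format performs on an integer column. It is not the Hadoop implementation: the class `RangeSplitSketch`, the helper `splitConditions`, the table `my_table`, and the column `id` are all made up for illustration. It only shows what "the query gets split into multiple queries" looks like. Seeing several such queries in a debugger confirms the split step, but it doesn't say where each one ends up running.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT the Hadoop DataDrivenDBInputFormat code.
final class RangeSplitSketch {

    // Hypothetical helper: divides the inclusive range [min, max] of an
    // integer column into at most numSplits contiguous WHERE conditions.
    static List<String> splitConditions(String column, long min, long max, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        if (max < min) {
            throw new IllegalArgumentException("max must be >= min");
        }
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;           // number of distinct values covered
        long base = span / numSplits;        // minimum size of each range
        long remainder = span % numSplits;   // first 'remainder' ranges get one extra value
        long lower = min;
        for (int i = 0; i < numSplits && lower <= max; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long upper = lower + size - 1;
            conditions.add(column + " >= " + lower + " AND " + column + " <= " + upper);
            lower = upper + 1;
        }
        return conditions;
    }

    public static void main(String[] args) {
        // my_table and id are assumed names, not taken from the original code.
        for (String condition : splitConditions("id", 1, 100, 4)) {
            System.out.println("SELECT * FROM my_table WHERE " + condition);
        }
    }
}
```

Each generated condition corresponds to one input split. For the database to see multiple connections, those splits have to be handed to separate tasks on separate nodes, so that is the part worth checking.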
J

On Mon, Mar 18, 2013 at 6:50 AM, Martijn Lenderink <[email protected]> wrote:

> Hello,
>
> I have a working JDBC connection to get data from an MSSQL source.
> It all works great, except my cluster only opens one connection to the
> MSSQL server.
>
> I have multiple nodes running, but the data gets pulled only from one node
> and then gets sent to the other nodes for processing.
>
> I am using code similar to the following:
>
> https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java
>
> The only difference is that I am using the DataDrivenDBInputFormat.
>
> When I debug the source code, the query gets split into multiple queries,
> but they only get executed on one machine.
> Why isn't this executed in parallel with multiple connections to the MSSQL
> server?
>
> Greetings,
> Martijn Lenderink
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
