Hello,

I have a working JDBC connection that pulls data from an MSSQL source. It all works great, except that my cluster opens only one connection to the MSSQL server.
I have multiple nodes running, but the data is pulled by only one node and then sent to the other nodes for processing.

I am using code similar to the following: https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java

The only difference is that I am using the DataDrivenDBInputFormat. When I debug the source code, the query gets split into multiple queries, but these are executed on only one machine. Why isn't this executed in parallel, with multiple connections to the MSSQL server?

Greetings,
Martijn Lenderink
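For readers unfamiliar with the splitting mentioned above, here is a rough plain-Java sketch of how a range-based split can turn one query into several bounded queries. It is a simplified illustration, not Hadoop's actual implementation and not the code from this job. The class name `RangeSplitSketch`, the table `my_table`, the integer split column `id`, and the bounds are all hypothetical. The expectation in question is that each generated query would be run by a separate task over its own connection.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of range-based query splitting.
// All names here (my_table, id) are hypothetical placeholders.
public class RangeSplitSketch {

    // Builds one bounded SELECT per split over the inclusive range [min, max].
    static List<String> splitQueries(String table, String column,
                                     long min, long max, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<String> queries = new ArrayList<>();
        long span = max - min + 1;
        // Ceiling division so that at most numSplits ranges are produced.
        long step = Math.max(1, (span + numSplits - 1) / numSplits);
        for (long lo = min; lo <= max; lo += step) {
            long hi = lo + step; // exclusive upper bound of this range
            // The final range is closed with <= so that max itself is included.
            String upper = (hi > max)
                    ? column + " <= " + max
                    : column + " < " + hi;
            queries.add("SELECT * FROM " + table
                    + " WHERE " + column + " >= " + lo + " AND " + upper);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : splitQueries("my_table", "id", 1, 100, 4)) {
            System.out.println(q);
        }
    }
}
```

With the hypothetical bounds 1 to 100 and four splits, this yields four queries covering ids 1-25, 26-50, 51-75, and 76-100. Seeing several such queries in the debugger while only one connection reaches the server matches the behaviour described in the question.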
