Hello,

I have a working JDBC connection that pulls data from an MSSQL source.
Everything works well, except that my cluster opens only one connection to
the MSSQL server.

I have multiple nodes running, but the data is pulled by only one node
and is then sent to the other nodes for processing.

I am using code similar to the following:
https://github.com/apache/incubator-crunch/blob/master/crunch-contrib/src/it/java/org/apache/crunch/contrib/io/jdbc/DataBaseSourceIT.java

The only difference is that I am using the DataDrivenDBInputFormat.

When I debug the source code, I can see that the query gets split into
multiple queries, but they are all executed on one machine.
Why aren't they executed in parallel, with multiple connections to the MSSQL
server?
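
To make clear what I mean by "split into multiple queries", here is a
simplified sketch. It is not the actual Hadoop or Crunch code, only an
illustration of range-based splitting on a numeric column. The class name
RangeSplitSketch, the method splitConditions, the table my_table and the
column id are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: divides a numeric key range into
// contiguous sub-ranges and builds one WHERE condition per sub-range.
public class RangeSplitSketch {

    // Returns at most numSplits conditions that together cover [min, max].
    // Assumes numSplits > 0 and min <= max.
    static List<String> splitConditions(String column, long min, long max, int numSplits) {
        List<String> conditions = new ArrayList<>();
        long total = max - min + 1;
        // Ceiling division so that the sub-ranges cover the whole interval.
        long step = Math.max(1, (total + numSplits - 1) / numSplits);
        for (long lo = min; lo <= max; lo += step) {
            long hi = Math.min(lo + step - 1, max);
            conditions.add(column + " >= " + lo + " AND " + column + " <= " + hi);
        }
        return conditions;
    }

    public static void main(String[] args) {
        // Each printed query could, in principle, run on its own connection.
        for (String condition : splitConditions("id", 1, 100, 4)) {
            System.out.println("SELECT * FROM my_table WHERE " + condition);
        }
    }
}
```

I would expect each of these sub-queries to be run by a separate task with
its own connection, which is not what I observe.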

Greetings,
Martijn Lenderink
