Partitioning in Spark while reading from RDBMS via JDBC

Devender Yadav Fri, 31 Mar 2017 15:53:32 -0700

Hi All,


I am running Spark in cluster mode and reading data from an RDBMS via JDBC.

As per the Spark docs
<http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>,
these parameters describe how to partition the table when reading in parallel
from multiple workers:

partitionColumn,
lowerBound,
upperBound,
numPartitions
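
To make the question concrete, here is a minimal sketch in plain Python of how I
understand these four parameters to be turned into one range predicate per
partition. The helper name partition_predicates and the column name id are made
up for illustration, and this is a simplified approximation of the behaviour
described in the docs, not Spark's actual code:

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE-clause string per partition (hypothetical helper)."""
    if lower_bound > upper_bound:
        raise ValueError("lower_bound must not exceed upper_bound")
    # Guard used in this sketch: never create more partitions than the
    # width of the range, so that the stride is always at least 1.
    num_partitions = min(num_partitions, upper_bound - lower_bound)
    if num_partitions <= 1:
        return [None]  # no predicate: one partition covers the whole table

    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper = f"{column} < {current}" if i != num_partitions - 1 else None
        if lower is None:
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


if __name__ == "__main__":
    for clause in partition_predicates("id", 0, 100, 4):
        print(clause)
```

With partition_predicates("id", 0, 100, 4) the stride is 25. The first range
also picks up NULLs and any rows below 0, and the last range picks up everything
from 75 upward, so as far as I can tell the bounds only shape the stride and do
not filter rows.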


These parameters are optional.

What happens if I don't specify them?

  *   Does only one worker read the whole table?
  *   If the read is still parallel, how does Spark partition the data?



Regards,
Devender

