I'm reading the Spark documentation
(https://spark.apache.org/docs/latest/sql-data-sources-parquet.html), but it
doesn't mention how to read a Parquet file in parallel with SparkSession.
Would --num-executors alone just work, or do additional parameters need to be
set on the SparkSession as well?
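For context, this is roughly how I'm reading the file now (the path and app
name are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-read")
  .getOrCreate()

// Submitted with e.g.: spark-submit --num-executors 8 ...
// Does the Parquet read get split across those executors automatically?
val mydf = spark.read.parquet("/path/to/data.parquet")

// Inspect how many partitions (and hence parallel tasks) the read produced
println(mydf.rdd.getNumPartitions)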
Also, if I want to write data to a database in parallel, are the options
'numPartitions' and 'batchsize' enough to improve write performance? For
example:
mydf.format("jdbc").
option("driver", "org.postgresql.Driver").
option("url", url).
option("dbtable", table_name).
option("user", username).
option("password", password).
option("numPartitions", N) .
option("batchsize", M)
save
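My understanding from the docs is that numPartitions caps the number of
parallel JDBC connections (Spark coalesces the DataFrame down to that count
if it has more partitions) and batchsize controls how many rows go into each
INSERT round trip, but I'm not sure that's the whole story. For instance,
would I also need to repartition explicitly before the write, along these
lines (reusing the same placeholder names)?

// Sketch of an explicit repartition before the JDBC write; whether this is
// needed on top of numPartitions is exactly my question.
mydf.repartition(N)
  .write.format("jdbc")
  .options(Map(
    "driver"    -> "org.postgresql.Driver",
    "url"       -> url,
    "dbtable"   -> table_name,
    "user"      -> username,
    "password"  -> password,
    "batchsize" -> M.toString))
  .save()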
From the Spark docs
(https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#jdbc-to-other-databases),
these are the only two parameters I can find that affect database write
performance.
I appreciate any suggestions.