Hello list,

The method *insertIntoJDBC(url: String, table: String, overwrite: Boolean)*
provided by Spark DataFrame allows us to copy a DataFrame into a JDBC
database table. Similar functionality is provided by the *createJDBCTable(url:
String, table: String, allowExisting: Boolean)* method. But if you look at
the docs, they say that *createJDBCTable* runs a *bunch of INSERT statements*
in order to copy the data, while the docs for *insertIntoJDBC* don't have
any such statement.

Could someone please shed some light on this? How exactly does data get
inserted by the *insertIntoJDBC* method?

And if it works the same way as *createJDBCTable*, then what exactly does a *bunch
of INSERT statements* mean? What criteria decide the number of
*inserts per bunch*? How are these bunches generated?

*An example* could be creating a DataFrame by reading all the files stored
in a given directory. If I just call *DataFrame.save()*, it'll create the
same number of output files as there were input files. What will happen if I
call *DataFrame.insertIntoJDBC()* instead?
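To make the example concrete, here is a minimal sketch of what I mean (assuming the Spark 1.3-era DataFrame API; the JDBC URL, paths, and table name are just placeholders):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

// Build one DataFrame from all the (JSON) files in a directory.
val df = sqlContext.jsonFile("/path/to/input/dir")

// Saving back to the filesystem gives one output file per partition,
// i.e. roughly as many output files as there were input files.
df.save("/path/to/output/dir")

// Copying the same DataFrame into an existing JDBC table instead —
// how does the data actually get inserted here?
df.insertIntoJDBC("jdbc:mysql://host:3306/mydb", "my_table", false) // overwrite = false
```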

I'm really sorry to pester you with all these questions, but I could not
get much help by Googling this.

Thank you so much for your valuable time. I really appreciate it.

Tariq, Mohammad
about.me/mti
