Hello everyone,
I use Spark Streaming to receive data from Kafka and need to store the data
into Hive. I found the following ways to insert data into Hive on the Internet:
1.use tmp_table
TmpDF = spark.createDataFrame(rdd, schema)
TmpDF.createOrReplaceTempView('TmpData')
spark.sql('insert overwrite table tmp_table select * from TmpData')
2.use DataFrameWriter.insertInto
3.use DataFrameWriter.saveAsTable
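As I understand them so far, the three options would look roughly like this (a sketch only; `spark` is a SparkSession with Hive support, `df` is the DataFrame built from the streaming RDD, and the table name `mydb.events` is just a placeholder):

```python
def write_via_temp_view(spark, df):
    """Approach 1: register a temp view and run INSERT OVERWRITE via SQL."""
    df.createOrReplaceTempView('TmpData')
    spark.sql('insert overwrite table mydb.events select * from TmpData')

def write_via_insert_into(df):
    """Approach 2: DataFrameWriter.insertInto -- the target table must
    already exist; columns are matched by position, not by name."""
    df.write.mode('append').insertInto('mydb.events')

def write_via_save_as_table(df):
    """Approach 3: DataFrameWriter.saveAsTable -- creates the table from
    the DataFrame schema if it does not exist; columns match by name."""
    df.write.mode('append').saveAsTable('mydb.events')
```

My impression is that insertInto appends by column position into an existing table, while saveAsTable can create the table itself, but I'm not sure which is preferred for a streaming job.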
I didn't find many examples, and I don't know whether there is any difference
between them or whether there is a better way to write into Hive. Please give
me some help.
Thank you