On 3 Oct 2017 at 20:08, Nicolas Paris wrote:
> I wonder about the differences between accessing Hive tables in two different ways:
> - with JDBC access
> - with sparkContext
Well, there is also a third way to access Hive data from Spark:
- with direct file access (here, ORC format)

For example:

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    // Enable ORC predicate pushdown so filters are applied by the ORC reader
    sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
    // Read the ORC files directly from HDFS instead of going through the Hive table
    val people = sqlContext.read.format("orc").load("hdfs://cluster//orc_people")
    people.createOrReplaceTempView("people")
    sqlContext.sql("SELECT count(1) FROM people WHERE ...").show()

This method looks much faster than both of the others:
- with JDBC access
- with sparkContext

Does anyone have experience with that?