Re: Hive From Spark: Jdbc VS sparkContext

Gourav Sengupta Sun, 15 Oct 2017 06:35:40 -0700

Hi Nicolas,

What if the table has partitions and sub-partitions, and you do not want to
read the entire dataset?
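
To make the question concrete, here is a minimal sketch of what I would try
with direct file access. It assumes the ORC files sit in a Hive-style layout
such as <base>/year=2017/month=10/; the base path, the object name and the
partition columns year and month are all hypothetical and not taken from your
table. With that layout, Spark's partition discovery should expose year and
month as columns, and a filter on them should restrict which directories are
actually read. I have not measured this, so treat it as something to verify
with explain() rather than as a result.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedOrcRead {
  def main(args: Array[String]): Unit = {
    // Hive support mirrors the HiveContext used in the quoted example;
    // older Spark 2.x releases need it for the ORC data source.
    val spark = SparkSession.builder()
      .appName("partitioned-orc-read")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Same setting as in the quoted example.
    spark.conf.set("spark.sql.orc.filterPushdown", "true")

    // Hypothetical base directory with year=<y>/month=<m> sub-directories.
    val basePath = "hdfs://cluster/path/to/orc_people_partitioned"

    // Loading the base directory lets Spark infer year and month
    // from the directory names (partition discovery).
    val people = spark.read.format("orc").load(basePath)

    // Filter on the partition columns only; matching directories
    // are the only ones that need to be scanned.
    val subset = people.where($"year" === 2017 && $"month" === 10)

    // Inspect the physical plan to check which filters were applied
    // at the partition level before trusting any timing.
    subset.explain()
    println(subset.count())

    spark.stop()
  }
}
```

If the directories are not named key=value, the partition values would not be
inferred this way, and going through the Hive metastore may be the simpler
option.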



Regards,
Gourav

On Sun, Oct 15, 2017 at 12:55 PM, Nicolas Paris <nipari...@gmail.com> wrote:

> On 03 Oct 2017 at 20:08, Nicolas Paris wrote:
> > I wonder about the differences between accessing Hive tables in two ways:
> > - with jdbc access
> > - with sparkContext
>
> Well, there is also a third way to access Hive data from Spark:
> - with direct file access (here, ORC format)
>
>
> For example:
>
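> // HiveContext still works in Spark 2.x but is deprecated in favour of SparkSession
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> // Push WHERE predicates down to the ORC reader
> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> // Read the ORC files directly from HDFS, without going through the Hive table
> val people = sqlContext.read.format("orc").load("hdfs://cluster//orc_people")
> people.createOrReplaceTempView("people")
> sqlContext.sql("SELECT count(1) FROM people WHERE ...").show()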
>
>
> This method looks much faster than both:
> - with jdbc access
> - with sparkContext
>
> Does anyone have experience with this?
>
>
