Hi,
A question on using Spark SQL.
Can someone give an example of creating a table from a directory
containing Parquet files in HDFS, rather than from a single Parquet file?
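A minimal sketch of one way to do this, assuming Spark 1.4+ with a SQLContext; the HDFS URI and table name here are placeholders, not paths from this thread. `read.parquet` accepts a directory, and Spark discovers the part files (and any partition subdirectories) under it:

```python
# Sketch: register a directory of Parquet files in HDFS as a queryable table.
# The app name, namenode address, path, and table name are all hypothetical.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="parquet-dir-table")
sqlContext = SQLContext(sc)

# Point read.parquet at the directory, not at an individual part file;
# Spark scans every Parquet part file beneath it.
df = sqlContext.read.parquet("hdfs://namenode:8020/warehouse/my_table/")
df.registerTempTable("my_table")

sqlContext.sql("SELECT * FROM my_table LIMIT 10").show()
```

The same thing can be expressed purely in SQL with `CREATE TEMPORARY TABLE my_table USING parquet OPTIONS (path "hdfs://...")`, which avoids going through a DataFrame handle first.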
Thanks,
Ron
On 07/21/2015 01:59 PM, Brandon White wrote:
A few questions about caching a table in Spark SQL.
1) Is there any difference between caching the dataframe and the table?
df.cache() vs sqlContext.cacheTable("tableName")
2) Do you need to "warm up" the cache before seeing the performance
benefits? Is the cache LRU? Do you need to run some queries on the
table before it is cached in memory?
3) Is caching the table much faster than .saveAsTable? I am only
seeing a 10% to 20% performance increase.
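For questions 1 and 2, a short sketch of the two caching calls, assuming a Spark 1.4-era SQLContext; the path and table name are placeholders. Both calls mark the same in-memory columnar cache and both are lazy, so the cache is only materialized by the first action that scans the data (which is why the first query after caching is not faster):

```python
# Sketch: df.cache() vs sqlContext.cacheTable() on the same data.
# App name, HDFS path, and table name are hypothetical.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="cache-demo")
sqlContext = SQLContext(sc)

df = sqlContext.read.parquet("hdfs:///data/events/")
df.registerTempTable("events")

# Option 1: cache through the DataFrame handle.
df.cache()

# Option 2: cache through the catalog by table name.
sqlContext.cacheTable("events")

# Either way, nothing is cached yet; this first action populates
# ("warms up") the in-memory cache, and later scans read from it.
sqlContext.sql("SELECT COUNT(*) FROM events").collect()
```

Note that caching and `.saveAsTable` are different operations: caching keeps the data in executor memory for repeated queries within the application, while `.saveAsTable` persists it back to storage, so a direct speed comparison between them mostly measures how much of the workload is re-scanning the same data.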
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org