That's an unfortunate documentation bug in the programming guide... We
failed to update it after making the change.
Cheng
On 2/28/15 8:13 AM, Deborah Siegel wrote:
Hi Michael,
Would you help me understand the apparent difference here?
The Spark 1.2.1 programming guide indicates:
"Note that if you call schemaRDD.cache() rather than
sqlContext.cacheTable(...), tables will *not* be cached using the in-memory
columnar format, and therefore sqlContext.cacheTable(...) is strongly
recommended for this use case."
>
> From Zhan Zhang's reply, yes, I still get Parquet's advantage.
>
You will need to at least use SQL or the DataFrame API (coming in Spark
1.3) to specify the columns that you want in order to get the Parquet
benefits. The rest of your operations can be standard Spark.
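As a rough Spark 1.2-style sketch (the file, column names, and types are made up for illustration): selecting the columns up front lets the Parquet reader prune the unread columns, and everything after the select is ordinary Spark code.

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._  // brings in the Symbol-based DSL for select()

// Assumed Parquet file and columns; select() narrows the scan so the
// Parquet reader only materializes these two columns.
val events = sqlContext.parquetFile("events.parquet")
val pairs = events
  .select('userId, 'ts)
  .map(row => (row.getString(0), row.getLong(1)))  // plain Spark from here on
```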
My next question is, does Spark use the in-memory columnar store when the
table is cached using cacheTable()?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-function-on-parquet-without-sql-tp21833p21850.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
will it be as fast as it would have been if I had used SQL?
>
> Please advise.
>
> Thanks & Regards
> Tridib
>
Thanks & Regards
Tridib
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-function-on-parquet-without-sql-tp21833.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.