Hi Yong,

But every time val tabdf = sqlContext.table(tablename) is called, tabdf.rdd has a new id, which can be checked by calling tabdf.rdd.id. And, per
https://github.com/apache/spark/blob/b6de0c98c70960a97b07615b0b08fbd8f900fbe7/core/src/main/scala/org/apache/spark/SparkContext.scala#L268
Spark maintains a Map of [RDD_ID, RDD]. As the RDD id changes, will Spark cache the same data again and again?

For example:

val tabdf = sqlContext.table("employee")
tabdf.cache()
tabdf.someTransformation.someAction
println(tabdf.rdd.id)

val tabdf1 = sqlContext.table("employee")
tabdf1.cache() <= *Will Spark again go to disk, read and load the data into memory, or look into the cache?*
tabdf1.someTransformation.someAction
println(tabdf1.rdd.id)

Regards,
R Banerjee

On Fri, Nov 18, 2016 at 9:14 PM, Yong Zhang <java8...@hotmail.com> wrote:

> That's correct, as long as you don't change the StorageLevel.
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L166
>
> Yong
>
> ------------------------------
> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
> *Sent:* Friday, November 18, 2016 10:36 AM
> *To:* user; Mich Talebzadeh; Tathagata Das
> *Subject:* Will spark cache table once even if I call read/cache on the
> same table multiple times
>
> Hi All,
>
> I am working on a project where the code is divided into multiple
> reusable modules. I am not able to understand Spark persist/cache in
> that context.
>
> My question is: will Spark cache a table once, even if I call read/cache
> on the same table multiple times?
>
> Sample code:
>
> TableReader:
>
> def getTableDF(tablename: String, persist: Boolean = false): DataFrame = {
>   val tabdf = sqlContext.table(tablename)
>   if (persist) {
>     tabdf.cache()
>   }
>   tabdf
> }
>
> Now
>
> Module1:
> val emp = TableReader.getTableDF("employee")
> emp.someTransformation.someAction
>
> Module2:
> val emp = TableReader.getTableDF("employee")
> emp.someTransformation.someAction
>
> ....
>
> ModuleN:
> val emp = TableReader.getTableDF("employee")
> emp.someTransformation.someAction
>
> Will Spark cache the emp table once, or will it cache it every time I
> call? Shall I maintain a global hashmap to handle that? Something like
> Map[String, DataFrame]?
> Regards,
> Rabin Banerjee
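For what it's worth, the global map proposed at the end of the question can be sketched as below. This is a minimal, Spark-free model of the memoization pattern only: FakeDataFrame and loadTable are made-up stand-ins for DataFrame and sqlContext.table, so the sketch runs without a SparkContext and just shows that cache() gets invoked at most once per table name, however many modules call getTableDF:

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for Spark's DataFrame so the sketch runs without a SparkContext;
// cache() here only records how often it was invoked.
final class FakeDataFrame(val tableName: String) {
  var cacheCalls = 0
  def cache(): this.type = { cacheCalls += 1; this }
}

object TableReader {
  // Global map keyed by table name, as proposed in the question:
  // each table is loaded (and cached) at most once.
  private val loaded = TrieMap.empty[String, FakeDataFrame]

  // Hypothetical loader standing in for sqlContext.table(tablename).
  private def loadTable(tablename: String): FakeDataFrame =
    new FakeDataFrame(tablename)

  def getTableDF(tablename: String, persist: Boolean = false): FakeDataFrame =
    loaded.getOrElseUpdate(tablename, {
      val tabdf = loadTable(tablename)
      if (persist) tabdf.cache()
      tabdf
    })
}
```

With this, Module1 through ModuleN all receive the same instance, so the load and cache() happen once regardless of how many modules ask for "employee".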