Hi Yong,

But every time val tabdf = sqlContext.table(tablename) is called, tabdf.rdd has a new id, which can be checked by calling tabdf.rdd.id. And, per
https://github.com/apache/spark/blob/b6de0c98c70960a97b07615b0b08fbd8f900fbe7/core/src/main/scala/org/apache/spark/SparkContext.scala#L268
Spark maintains a Map of [RDD_ID, RDD]. As the RDD id changes, will Spark cache the same data again and again?

For example:

val tabdf = sqlContext.table("employee")
tabdf.cache()
tabdf.someTransformation.someAction
println(tabdf.rdd.id)

val tabdf1 = sqlContext.table("employee")
tabdf1.cache() <= *Will Spark again go to disk, read and load the data into memory, or look into the cache?*
tabdf1.someTransformation.someAction
println(tabdf1.rdd.id)

Regards,
R Banerjee

On Fri, Nov 18, 2016 at 9:14 PM, Yong Zhang <java8...@hotmail.com> wrote:

> That's correct, as long as you don't change the StorageLevel.
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L166
>
> Yong
>
> ------------------------------
> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
> *Sent:* Friday, November 18, 2016 10:36 AM
> *To:* user; Mich Talebzadeh; Tathagata Das
> *Subject:* Will spark cache table once even if I call read/cache on the
> same table multiple times
>
> Hi All,
>
> I am working on a project where the code is divided into multiple
> reusable modules. I am not able to understand Spark persist/cache in
> that context.
>
> My question is: will Spark cache a table once, even if I call read/cache
> on the same table multiple times?
>
> Sample code:
>
> TableReader:
>
> def getTableDF(tablename: String, persist: Boolean = false): DataFrame = {
>   val tabdf = sqlContext.table(tablename)
>   if (persist) {
>     tabdf.cache()
>   }
>   tabdf
> }
>
> Now
>
> Module1:
> val emp = TableReader.getTableDF("employee")
> emp.someTransformation.someAction
>
> Module2:
> val emp = TableReader.getTableDF("employee")
> emp.someTransformation.someAction
>
> ....
>
> ModuleN:
> val emp = TableReader.getTableDF("employee")
> emp.someTransformation.someAction
>
> Will Spark cache the emp table once, or will it cache it every time I
> call? Shall I maintain a global hashmap to handle that? Something like
> Map[String, DataFrame]?
> Regards,
> Rabin Banerjee
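For what it's worth, the global map proposed at the end of the question can be sketched as below. This is a minimal, Spark-free model of the memoization pattern only: FakeDataFrame and loadTable are made-up stand-ins for DataFrame and sqlContext.table, so the sketch runs without a SparkContext and just shows that cache() gets invoked at most once per table name, however many modules call getTableDF:

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for Spark's DataFrame so the sketch runs without a SparkContext;
// cache() here only records how often it was invoked.
final class FakeDataFrame(val tableName: String) {
  var cacheCalls = 0
  def cache(): this.type = { cacheCalls += 1; this }
}

object TableReader {
  // Global map keyed by table name, as proposed in the question:
  // each table is loaded (and cached) at most once.
  private val loaded = TrieMap.empty[String, FakeDataFrame]

  // Hypothetical loader standing in for sqlContext.table(tablename).
  private def loadTable(tablename: String): FakeDataFrame =
    new FakeDataFrame(tablename)

  def getTableDF(tablename: String, persist: Boolean = false): FakeDataFrame =
    loaded.getOrElseUpdate(tablename, {
      val tabdf = loadTable(tablename)
      if (persist) tabdf.cache()
      tabdf
    })
}
```

With this, Module1 through ModuleN all receive the same instance, so the load and cache() happen once regardless of how many modules ask for "employee".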