Hi Weichen,

Thank you for the reply.
My understanding was that the DataFrame API uses the old RDD implementation under the covers, even though it presents a different API, and that calling df.rdd simply gives access to that underlying RDD. Is this assumption wrong? I would appreciate it if you could shed more light on this issue or point me to documentation where I can learn more.

Thank you in advance.

On Fri, Oct 13, 2017 at 3:19 AM, Weichen Xu <weichen...@databricks.com> wrote:

> You should use `df.cache()`.
> `df.rdd.cache()` won't work, because `df.rdd` generates a new RDD from the
> original `df`, and it is that new RDD that gets cached.
>
> On Fri, Oct 13, 2017 at 3:35 PM, Supun Nakandala <
> supun.nakand...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have been experimenting with the cache/persist/unpersist methods in
>> both the DataFrame and RDD APIs. However, I am seeing different
>> behavior in the DataFrame API compared to the RDD API: for example,
>> DataFrames are not getting cached when count() is called.
>>
>> Is there a difference in how these operations behave in the DataFrame
>> and RDD APIs?
>>
>> Thank you.
>> -Supun
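
For readers following the thread, the distinction in Weichen's reply can be illustrated with a toy model. This is only a sketch in plain Python: `ToyDataFrame` and `ToyRDD` are hypothetical names invented for illustration and are not Spark classes. The one assumption it encodes comes from the reply above: the RDD obtained from a DataFrame is a new object derived from it, so marking that RDD as cached leaves the DataFrame's own cache state untouched.

```python
# Toy model only: these classes are hypothetical and are NOT Spark's API.
# They mimic one idea from the thread: cache state lives on the object
# that cache() was called on.


class ToyRDD:
    """Stands in for an RDD; caching is tracked per RDD object."""

    def __init__(self, rows):
        self.rows = rows
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self


class ToyDataFrame:
    """Stands in for a DataFrame with its own, separate cache flag."""

    def __init__(self, rows):
        self.rows = rows
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self

    @property
    def rdd(self):
        # Assumption taken from the reply: the RDD is derived from the
        # DataFrame, so it is a separate object with separate cache state.
        return ToyRDD(self.rows)


df = ToyDataFrame([1, 2, 3])

derived = df.rdd.cache()  # marks only the derived RDD as cached
rdd_only = (derived.is_cached, df.is_cached)

df.cache()  # marks the DataFrame itself as cached
after_df_cache = df.is_cached

print(rdd_only, after_df_cache)
```

In this model, caching through `df.rdd` never changes `df.is_cached`; only calling `cache()` on the DataFrame does, which matches the advice to use `df.cache()`.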