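It sounds like your underlying data set is in the OS page cache.  Quitting
Spark frees Spark's own cache, but the kernel keeps recently read file
blocks in memory, so the second load is served largely from RAM.  If you
want a test that reads purely from disk, run this as root on each node
before you re-cache the same table: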

echo 3 > /proc/sys/vm/drop_caches
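
If you want something slightly more careful, here is a minimal sketch that
wraps the same command in a function and runs sync first, since
drop_caches only frees clean pages.  The function name drop_page_cache and
the DROP_CACHES_FILE override are my own illustration (the override exists
only so you can dry-run it against a scratch file); neither is part of
Spark or the kernel.

```shell
#!/bin/sh
# Sketch only: flush dirty pages, then ask the kernel to drop clean caches.
# DROP_CACHES_FILE is a hypothetical override for dry runs; the real
# control file is /proc/sys/vm/drop_caches and writing to it needs root.
drop_page_cache() {
    target="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"
    # Write dirty pages to disk first so they become clean and droppable.
    sync
    # 3 = free the page cache plus dentries and inodes.
    echo 3 > "$target"
}
```

Call drop_page_cache from a root shell on every node.  Note that
"sudo echo 3 > /proc/sys/vm/drop_caches" will not work, because the
redirect is done by your unprivileged shell; use
"sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'" instead.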



On Tue, Feb 4, 2014 at 7:44 AM, Mskh <[email protected]> wrote:

> Hi,
>
> When I cache a table in memory for the first time in Spark (version 0.8.0),
> it usually takes 10 mins. If I were to quit Spark and restart it then
> re-cache the same table in memory, the operation would take 4 mins. I had
> the assumption that quitting the Spark session will un-cache the table from
> memory. Does any OS caching take place, since re-caching the table takes
> half the original time?
>
> Thanks
> Mskh
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-disk-cache-tp1180.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>



-- 

Woody Christy
Solutions Architect | Partner Engineering | Cloudera Inc
@woodychristy
