Spark keeps intermediate data in memory by default, which explains the kind of
performance gains you are seeing. Additionally, depending on your query, Spark
runs multiple stages, and at any point Spark's code behind the scenes may issue
an explicit cache. If you hit such a scenario, you will find those cached
objects in the UI under the Storage tab. Note that if caching is done by Spark
itself, it may be transient.
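As a quick way to check from the driver, something like this sketch (Spark 1.x
Scala API; the path and table name are made up for illustration) lists anything
that ended up persisted, whether you cached it or Spark did:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// sc: an existing SparkContext in your application
val sqlContext = new SQLContext(sc)

// Hypothetical Parquet file and temp table name -- substitute your own.
val df = sqlContext.parquetFile("/path/to/data.parquet")
df.registerTempTable("events")

// Was the temp table cached explicitly via cacheTable / CACHE TABLE?
println(sqlContext.isCached("events"))

// Any RDDs persisted behind the scenes (by you or by Spark) show up here,
// mirroring what the Storage tab in the UI displays.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"RDD $id: ${rdd.name} at ${rdd.getStorageLevel}")
}
```

If `getPersistentRDDs` comes back empty, the speedup is more likely OS page
cache and JVM warm-up than anything Spark persisted.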
On 28 Apr 2015 08:00, "Wenlei Xie" <wenlei....@gmail.com> wrote:

> Hi,
>
> I am trying to answer a simple query with SparkSQL over the Parquet file.
> When I execute the query several times, the first run takes about 2s
> while later runs take <0.1s.
>
> By looking at the log file, it seems the later runs don't load the data
> from disk. However, I didn't enable any cache explicitly. Is there any
> automatic cache used by SparkSQL? Is there any way to check this?
>
> Thank you!
>
> Best,
> Wenlei
>
>
>
