Hi,

I have a Hive table with a few thousand partitions (based on date and time).
The first query against it takes a long time to run; subsequent queries
are fast.

Is there a way to store the cache of partition lookups so that every time I
start a new Spark instance (I cannot keep my personal server running
continuously), I can immediately restore the temp table in hiveContext
without making it look up and cache the partitions all over again?

Currently it takes around 1.5 hours just to cache the partition
information, and only after that do I see the job get queued in the Spark
UI.

Regards,
Gourav
