I meant using |saveAsParquetFile|. As for the number of partitions, you
can always control it with the |spark.sql.shuffle.partitions| property.
Cheng
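For readers following the thread, here is a minimal sketch of the persist-then-cache idea discussed above. It assumes the Spark 1.2/1.3-era SQLContext API; the output path, the table names, and the toy input data are hypothetical placeholders, not taken from the original poster's job.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PersistThenCache {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-then-cache").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Number of partitions Spark SQL uses after a shuffle (joins, GROUP BY).
    // The value 8 is an arbitrary choice for this sketch.
    sqlContext.setConf("spark.sql.shuffle.partitions", "8")

    // Hypothetical location for the persisted result table.
    val resultPath = "/tmp/processed_result.parquet"
    val fs = FileSystem.get(sc.hadoopConfiguration)

    // Compute and persist only if no earlier run has written the result.
    if (!fs.exists(new Path(resultPath))) {
      // Toy input standing in for the real, expensive computation.
      val raw = sqlContext.jsonRDD(sc.parallelize(Seq(
        """{"name": "a", "amount": 1}""",
        """{"name": "a", "amount": 2}""",
        """{"name": "b", "amount": 5}""")))
      raw.registerTempTable("raw")

      val processed = sqlContext.sql(
        "SELECT name, SUM(amount) AS total FROM raw GROUP BY name")
      processed.saveAsParquetFile(resultPath)
    }

    // After a restart, this is the only work needed: reload and cache.
    val result = sqlContext.parquetFile(resultPath)
    result.registerTempTable("result")
    sqlContext.cacheTable("result")

    sqlContext.sql("SELECT * FROM result").collect().foreach(println)
    sc.stop()
  }
}
```

On the first run the aggregation is computed and written to Parquet; later runs skip straight to loading and caching. The existence check is deliberately simple and would treat a partially written directory as complete, so a real service would likely need a stronger completion marker.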
On 2/23/15 1:38 PM, nitin wrote:
I believe calling processedSchemaRdd.persist(DISK) and
processedSchemaRdd.checkpoint() only persists data and I will lose
How about persisting the computed result table before caching it? That
way, after restarting your service, you only need to cache the result
table instead of recomputing it. It is somewhat like checkpointing.
Cheng
On 2/22/15 12:55 AM, nitin wrote:
Hi All,
I intend to build a long-running Spark