RE: Possible long lineage issue when using DStream to update a normal RDD

2015-05-08 Thread Shao, Saisai
I think you could use checkpoint to cut the lineage of `MyRDD`. I have a similar scenario and use checkpoint to work around this problem :) Thanks, Jerry -Original Message- From: yaochunnan [mailto:yaochun...@gmail.com] Sent: Friday, May 8, 2015 1:57 PM To: user@spark.apache.org
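
To illustrate why periodic checkpointing helps here, the following is a minimal sketch in plain Python, not Spark code. `ToyRDD`, `update`, and `run` are hypothetical names invented for this illustration, and the way `MyRDD` is actually updated per batch is not shown in the thread, so `update` is only a stand-in. The sketch models each per-batch update as a new dataset that points back to its parent, and models checkpoint as dropping that parent link once the data is saved.

```python
# Toy model of RDD lineage. ToyRDD is a hypothetical class, not Spark's API.
class ToyRDD:
    def __init__(self, parent=None):
        self.parent = parent  # the dataset this one was derived from

    def update(self):
        # Stand-in for one per-batch update: the result depends on self.
        return ToyRDD(parent=self)

    def checkpoint(self):
        # Stand-in for saving the data to reliable storage,
        # after which the parent chain is no longer needed.
        self.parent = None
        return self

    def lineage_depth(self):
        # Count how many ancestors must be tracked to recompute this dataset.
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth


def run(batches, checkpoint_every=None):
    """Apply one update per batch and return the final lineage depth."""
    rdd = ToyRDD()
    for i in range(1, batches + 1):
        rdd = rdd.update()
        if checkpoint_every and i % checkpoint_every == 0:
            rdd.checkpoint()
    return rdd.lineage_depth()


if __name__ == "__main__":
    print("no checkpoint:", run(100))
    print("checkpoint every 10 batches:", run(100, checkpoint_every=10))
```

Without checkpointing, the depth in this model grows by one with every batch. With a checkpoint every N batches, it never exceeds N. In real Spark, the checkpoint directory has to be set with `SparkContext.setCheckpointDir` before calling `RDD.checkpoint()`, and the data is only written when an action runs on the RDD.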

Re: Possible long lineage issue when using DStream to update a normal RDD

2015-05-08 Thread Chunnan Yao
Thank you for this suggestion! But may I ask what the advantage of using checkpoint instead of cache is here, since they both cut lineage? I only know that checkpoint saves the RDD to disk, while cache keeps it in memory. So maybe it's for reliability? Also on
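
A note on the premise of this question: in Spark, `cache()` keeps the RDD's lineage so that lost partitions can be recomputed, whereas `checkpoint()` saves the data to the checkpoint directory and removes the references to parent RDDs. The sketch below is a toy model in plain Python, not Spark code. `ToyRDD`, `derive`, and `build` are hypothetical names used only to contrast the two behaviours.

```python
# Toy contrast between cache and checkpoint. Not Spark's API.
class ToyRDD:
    def __init__(self, parent=None):
        self.parent = parent
        self.cached = False

    def derive(self):
        # Stand-in for any transformation that produces a dependent dataset.
        return ToyRDD(parent=self)

    def cache(self):
        # Data is kept in memory, but the parent link stays so that
        # lost data can still be recomputed from its ancestors.
        self.cached = True
        return self

    def checkpoint(self):
        # Data is saved to reliable storage, so the parent link is dropped.
        self.parent = None
        return self

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth


def build(steps, op):
    """Derive `steps` datasets in a chain, applying `op` ('cache' or 'checkpoint') to each."""
    rdd = ToyRDD()
    for _ in range(steps):
        rdd = rdd.derive()
        getattr(rdd, op)()
    return rdd


if __name__ == "__main__":
    print("depth with cache:", build(50, "cache").lineage_depth())
    print("depth with checkpoint:", build(50, "checkpoint").lineage_depth())
```

In this model, caching every step still leaves a chain as long as the number of steps, while checkpointing leaves none. That matches the reason checkpoint is suggested for a lineage that grows with every batch.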

RE: Possible long lineage issue when using DStream to update a normal RDD

2015-05-08 Thread Shao, Saisai
...@gmail.com] Sent: Friday, May 8, 2015 2:51 PM To: Shao, Saisai Cc: user@spark.apache.org Subject: Re: Possible long lineage issue when using DStream to update a normal RDD Thank you for this suggestion! But may I ask what the advantage of using checkpoint instead of cache is here, since they both cut lineage