Hi,

> impact of an executor dying after a localCheckpoint is taken.

My memory is a bit vague on this, but I wouldn't be surprised if the
localCheckpoint-ed RDD were "broken" and any action simply threw an
exception about missing partitions or similar. There's no way back.

I wish someone with more expertise in this area would chime in...

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

On Wed, Jan 6, 2021 at 8:30 PM Brett Larson <brettpatricklar...@gmail.com> wrote:

> Jacek,
> Thanks for your response. I am still trying to understand the impact of an
> executor dying after a localCheckpoint is taken.
>
> Would the entire Spark application fail in this case due to the broken
> lineage? Or would the jobs associated with that executor need to be
> recomputed from scratch?
>
> Thank you!
>
> On Wed, Jan 6, 2021 at 1:09 PM Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> > My understanding is that .localCheckpoint() breaks the lineage of the
>> > RDD
>>
>> True.
>>
>> > and this requires the entire RDD to be rebuilt instead of being
>> > able to recompute lost partitions.
>>
>> In a sense, it's as if you saved the partitions to executors and read
>> them back as source data (for this checkpointed RDD).
>>
>> > Does each executor store a copy of the entire RDD?
>>
>> No. An executor holds only the data of the partitions for the tasks
>> it has executed.
>>
>> > Checkpoint over .localCheckpoint.
>>
>> checkpoint is similar to localCheckpoint, but slower and reliable (as
>> it's on a stable HDFS file system, not on an ephemeral executor). In either
>> case, the lineage should be the same: cut.
>>
>> Regards,
>> Jacek Laskowski
>>
>> On Wed, Jan 6, 2021 at 6:15 PM brettplarson <brettpatricklar...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> I am wondering what the impact of using .localCheckpoint() and having the
>>> executor die would be?
>>>
>>> My understanding is that .localCheckpoint() breaks the lineage of the RDD
>>> and this requires the entire RDD to be rebuilt instead of being able to
>>> recompute lost partitions.
>>>
>>> Does each executor store a copy of the entire RDD?
>>>
>>> The benefit of using Checkpoint over .localCheckpoint is unclear to me.
>>> (I am aware that Checkpoint is HDFS-backed, but the implications of this
>>> are unclear to me.)
>>>
>>> Please let me know,
>>> Thank you!
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> --
> *Brett Larson*
> brettpatricklar...@gmail.com / 847321200
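
To make the behaviour discussed in this thread concrete, here is a minimal toy model in plain Python. It is not Spark code and does not use any Spark API: `ToyRDD`, `PartitionLost`, `local_checkpoint` and `reliable_checkpoint` are hypothetical names invented for illustration, and the round-robin partition placement is an assumption. It only models the three claims above: each executor keeps just its own partitions, both kinds of checkpoint cut the lineage, and a locally checkpointed partition is gone for good once its executor dies.

```python
class PartitionLost(Exception):
    """Raised when a checkpointed partition cannot be read back."""


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, compute, num_partitions):
        self.compute = compute              # the "lineage": how to rebuild a partition
        self.num_partitions = num_partitions
        self.store = None                   # places holding checkpointed partitions

    def local_checkpoint(self, executors):
        # Each executor keeps only the partitions of the tasks it ran
        # (round-robin placement is an assumption of this toy model).
        for p in range(self.num_partitions):
            executors[p % len(executors)][p] = self.compute(p)
        self.store = executors
        self.compute = None                 # lineage is cut
        return self

    def reliable_checkpoint(self, stable_storage):
        # All partitions go to storage that outlives any executor.
        for p in range(self.num_partitions):
            stable_storage[p] = self.compute(p)
        self.store = [stable_storage]
        self.compute = None                 # lineage is cut here as well
        return self

    def collect(self):
        rows = []
        for p in range(self.num_partitions):
            if self.compute is not None:    # not checkpointed: recompute from lineage
                rows.extend(self.compute(p))
                continue
            holder = next((s for s in self.store if p in s), None)
            if holder is None:              # no copy left and no lineage to fall back on
                raise PartitionLost(f"partition {p} not found and lineage is cut")
            rows.extend(holder[p])
        return rows


def make_rdd():
    return ToyRDD(lambda p: [p * 10, p * 10 + 1], num_partitions=4)


# Local checkpoint: partitions live on two "executors" (plain dicts).
executors = [{}, {}]
local = make_rdd().local_checkpoint(executors)
before_failure = local.collect()

executors[1].clear()                        # executor 1 dies, its partitions vanish
try:
    local.collect()
    local_outcome = "ok"
except PartitionLost as err:
    local_outcome = str(err)

# Reliable checkpoint: partitions live in "stable storage", executors do not matter.
stable_storage = {}
reliable = make_rdd().reliable_checkpoint(stable_storage)
executors[0].clear()                        # losing another executor changes nothing
reliable_outcome = reliable.collect()
```

In this model the failure is per action, not per application: every later action that needs a lost partition fails in the same way, which matches the "no way back" expectation above. Whether real Spark behaves identically in every deployment is not something this sketch can show.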