GitHub user ferdonline commented on the issue:

    https://github.com/apache/spark/pull/9428
  
    Hello. I find this feature really important and I would be happy to 
contribute here. Even if we cannot support every use case, it would already 
be great to avoid the double computation in the majority of cases and, in the 
remaining cases, raise a warning saying that the computation is going to 
happen twice.
    
    This is especially important for a use case I have where a transformation 
creates random numbers, so I simply can't recompute things, as the results 
would be different. In my case, the only option to break lineage seems to be a 
full write() followed by a read().
    Are there any plans to have this, at least for eager checkpoints?
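
    To make the problem concrete, here is a toy sketch in plain Python. It is 
not Spark code: `LazyDataset`, `add_noise` and `materialize` are hypothetical 
names used only for illustration, and `materialize` merely stands in for the 
write-then-read workaround. It shows that every action re-runs a random 
transformation, while computing once and keeping the rows does not.

```python
import random


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, compute):
        # Zero-argument function that produces the rows when called.
        self._compute = compute

    def map(self, fn):
        # Extend the lineage; nothing is computed until collect() is called.
        return LazyDataset(lambda: [fn(x) for x in self._compute()])

    def collect(self):
        # Every action re-runs the whole lineage from the source.
        return self._compute()

    def materialize(self):
        # Compute once, then return a dataset whose lineage is only the stored rows.
        rows = self._compute()
        return LazyDataset(lambda: list(rows))


calls = {"n": 0}  # counts how many times the random transformation runs


def add_noise(x):
    calls["n"] += 1
    return x + random.random()


noisy = LazyDataset(lambda: list(range(5))).map(add_noise)

# Two actions on the same lineage: the transformation runs twice per row,
# so the two results are drawn from different random numbers.
first = noisy.collect()
second = noisy.collect()

# Break the lineage: one more pass over the rows, then no recomputation.
frozen = noisy.materialize()
stable = frozen.collect() == frozen.collect()

print(calls["n"], stable)
```

    In this sketch the counter ends at 15: ten calls for the two `collect()` 
actions on `noisy`, five for `materialize()`, and none for the later actions 
on `frozen`.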

