Github user ferdonline commented on the issue:
https://github.com/apache/spark/pull/9428
Hello. I find this feature really important and I would be happy to
contribute here. Even if we cannot support every use case, it would already
be great to avoid the double computation in the majority of cases, and in
the remaining cases raise a warning saying that the computation is going to
happen twice.
This is especially important for a use case I have where a transformation
generates random numbers, so I simply can't recompute things, as the results
would be different. So in my case the only option to break lineage seems to
be a full write() followed by read().
Any plans to have it in eager checkpoints at least?
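
For reference, here is a minimal sketch of the write-then-read workaround I mean. It assumes the DataFrame API and Parquet as the intermediate format; the helper name `break_lineage` and the `path` argument are hypothetical, just for illustration, and this is not what the PR implements.

```python
def break_lineage(spark, df, path):
    """Materialize `df` at `path` and read it back.

    Assumptions for this sketch: `spark` is a SparkSession, `df` is a
    DataFrame, and `path` is a writable location. `break_lineage` is a
    hypothetical helper name, not a Spark API.
    """
    # The write runs the job once, so any random values produced by the
    # upstream transformation are fixed on disk at this point.
    df.write.mode("overwrite").parquet(path)
    # The DataFrame read back is planned from the files, so later actions
    # scan the stored values instead of re-running the random transformation.
    return spark.read.parquet(path)
```

It would be called as something like `stable = break_lineage(spark, df_with_random, "/some/tmp/dir")`, after which every action on `stable` sees the same numbers. The cost is a full extra write and read, which is what I would like to avoid.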