Hi all,

As we know, Spark uses RDD lineage to recompute lost RDD partitions when a failure occurs. I want to study the performance of this fault-tolerance approach and have a couple of questions about it.
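For context, here is a toy Python model of what "recompute from lineage" means. It is only a sketch under simplifying assumptions and is not Spark's implementation or API. `ToyRDD`, `from_partitions`, and `lose_partition` are hypothetical names. Only narrow, per-partition transformations (`map`, `filter`) are modeled, and every partition is cached automatically, whereas in Spark caching is opt-in. Each RDD keeps a pointer to its parent and the function that derives it. When a cached partition is dropped to simulate a failure, only that partition is rebuilt by replaying its lineage chain.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical, simplified stand-in for an RDD (not Spark's API)."""

    def __init__(self, parent: Optional["ToyRDD"],
                 fn: Callable[[List[int]], List[int]],
                 source: Optional[List[List[int]]] = None):
        self.parent = parent            # lineage: the RDD this one derives from
        self.fn = fn                    # per-partition transformation
        self.source = source            # raw partitions for a base RDD
        self.cache: Dict[int, List[int]] = {}
        self.recomputations = 0         # counts cache misses (computations)

    @staticmethod
    def from_partitions(parts: List[List[int]]) -> "ToyRDD":
        return ToyRDD(None, lambda p: p, source=parts)

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        return ToyRDD(self, lambda part: [f(x) for x in part])

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(self, lambda part: [x for x in part if pred(x)])

    def num_partitions(self) -> int:
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute(self, i: int) -> List[int]:
        # Serve from cache if present; otherwise rebuild via lineage.
        if i in self.cache:
            return self.cache[i]
        self.recomputations += 1
        if self.source is not None:
            data = list(self.source[i])
        else:
            data = self.fn(self.parent.compute(i))
        self.cache[i] = data
        return data

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a failure that wipes one cached partition.
        self.cache.pop(i, None)


base = ToyRDD.from_partitions([[1, 2, 3], [4, 5, 6]])
squares = base.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

before = evens.collect()          # [4, 16, 36]

# "Fail" partition 1 of the two derived RDDs; base stays cached.
evens.lose_partition(1)
squares.lose_partition(1)
after = evens.collect()           # same result, rebuilt from lineage
```

Only partition 1 of `squares` and `evens` is recomputed after the simulated loss; partition 0 and the base data are served from cache. The cost of recovery in this model therefore grows with the length of the lineage chain that must be replayed, which is the kind of cost I would like to measure in the real system.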
1) Is there a benchmark (or a standard failure model) for testing the fault tolerance of in-memory data processing systems like this?
2) How do you emulate failures when testing Spark? For example, do you kill a computation task, or kill entire computation nodes?

Thanks!

--
Regards,
Zhaojie