Hi, I want to cache the temporary map/reduce output files so that I can compare two map results produced on two different nodes as an integrity check.
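To make the comparison concrete, here is a rough sketch of what I mean. It assumes the output files of both attempts have already been copied somewhere a single process can read them. The names AttemptOutputCompare, digestOf and sameOutput are hypothetical, and the code uses plain java.nio and java.security rather than any Hadoop API, so it only illustrates the check itself, not how the files get there.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares the output files of two task attempts by digest.
class AttemptOutputCompare {

    // Streams the file through SHA-256 so a large output need not fit in memory.
    static byte[] digestOf(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading drives the digest update; nothing else to do here.
            }
        }
        return md.digest();
    }

    // True when both files have byte-identical content (same SHA-256 digest).
    static boolean sameOutput(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digestOf(a), digestOf(b));
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the prior attempt's and the speculative attempt's output.
        Path prior = Files.createTempFile("attempt_0", ".out");
        Path speculative = Files.createTempFile("attempt_1", ".out");
        Files.write(prior, "key1\tvalue1\n".getBytes(StandardCharsets.UTF_8));
        Files.write(speculative, "key1\tvalue1\n".getBytes(StandardCharsets.UTF_8));

        System.out.println("outputs match: " + sameOutput(prior, speculative));

        Files.delete(prior);
        Files.delete(speculative);
    }
}
```

A digest comparison only tells me whether the two attempts agree, not which node is wrong, so the credit score would still need some rule for disagreements. It also assumes the map output is deterministic byte for byte across attempts.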
I am simulating this use case with speculative execution, by rescheduling the first task as soon as it has started running. Now I want to compare the output files from the speculative attempt and the prior attempt so that I can calculate a credit score for each node.

I wanted to use DistributedCache to cache the local file system files at the CommitPending stage in TaskImpl, but DistributedCache is deprecated. Is there any other way I can do this? I think I could use HDFS to save the temporary output files so that other nodes can see them, but is there an in-memory solution I could use instead?

Any pointers are greatly appreciated.

Thanks and regards,
Srinivas Chamarthi
