On Tue, Jun 8, 2010 at 9:19 PM, deneche abdelhakim <[email protected]> wrote:
> mapred.max.split.size controls how many partitions will be generated from
> the data.
> the current implementation of random forest is pretty memory intensive, and
> because all the work is done in the mappers' close method, when the data is
> Big, Hadoop just thinks that the mappers have failed (I will solve this
> problem some day).

Periodically calling Reporter.progress() from the long-lived mapper typically fixes this.

-jake
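A minimal sketch of the heartbeat pattern being suggested: save the Reporter passed to map(), then call progress() between expensive units of work in close() so the TaskTracker doesn't kill the task as unresponsive. To keep this self-contained, the Reporter interface below is a stand-in for Hadoop's org.apache.hadoop.mapred.Reporter, and the loop body is illustrative; in the real forest code the "unit of work" would be something like growing one tree.

```java
public class HeartbeatSketch {

  // Stand-in for org.apache.hadoop.mapred.Reporter (assumption: only
  // progress() is needed for the heartbeat).
  interface Reporter {
    void progress();
  }

  // Test double that counts heartbeats so the pattern is observable.
  static class CountingReporter implements Reporter {
    int calls = 0;
    public void progress() { calls++; }
  }

  private Reporter reporter;

  // In a real old-API Mapper, map() receives the Reporter; we just
  // remember it so close() can report progress later.
  void map(String record, Reporter reporter) {
    this.reporter = reporter;
    // ... buffer the record for the heavy work done in close() ...
  }

  // All the expensive work happens here; without the progress() calls,
  // Hadoop would see a silent task and eventually fail it.
  void close() {
    for (int i = 0; i < 5; i++) {
      // ... one expensive unit of work (e.g. growing one tree) ...
      reporter.progress();  // heartbeat: tell the framework we're alive
    }
  }

  public static void main(String[] args) {
    HeartbeatSketch m = new HeartbeatSketch();
    CountingReporter r = new CountingReporter();
    m.map("row", r);
    m.close();
    System.out.println("progress calls: " + r.calls);
  }
}
```

The key detail is that close() has no Reporter parameter in the old mapred API, so stashing the one handed to map() is the usual way to keep reporting during the tail-end computation.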
