A quick thought I had on this was to emit every record twice: once with a sensible key and once with the hash of that key.
Then in the reducer, buffer up randomized or straight records as needed. This doesn't resample each variable independently, but it should work just about as well.

On Wed, Jun 12, 2013 at 6:27 PM, qiaoresearcher <[email protected]> wrote:

> Current mahout does not have variable importance in random forest.
>
> Variable importance, especially the permutation one, it is trivial to
> implement locally.
>
> but how to do it with mapreduce? mapper will only have one record each
> time, but the permutation needs to be done on the whole samples of one
> attribute, how can we do it in the way of mapreduce?
>
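
For what it's worth, the emit-twice idea at the top of this reply can be sketched in a few lines of Python that simulate the map, sort, and reduce steps in memory. Everything named here is an illustrative assumption rather than Mahout or Hadoop API: the `mapper`, `shuffle`, and `reducer` functions, the "straight"/"random" key tags, the use of MD5 as the hash, and the simplification that both copies of the data reach a single reducer.

```python
import hashlib


def mapper(record_id, features):
    """Emit each record twice: once under its own id, once under a hash of it."""
    # "Sensible" key: records sort in their natural order.
    yield ("straight", str(record_id)), (record_id, features)
    # Hashed key: sorting on the digest scrambles the order of the records.
    digest = hashlib.md5(str(record_id).encode("utf-8")).hexdigest()
    yield ("random", digest), (record_id, features)


def shuffle(pairs):
    """Stand-in for the framework's sort phase: order all pairs by key."""
    return sorted(pairs, key=lambda kv: kv[0])


def reducer(sorted_pairs, column):
    """Pair each straight record with a donor from the hash-ordered stream.

    The donor supplies the value for `column`, so that column ends up
    permuted across records while the other columns stay in place.
    (Assumption: both streams fit in one reducer's buffer.)
    """
    straight = [v for (tag, _), v in sorted_pairs if tag == "straight"]
    randomized = [v for (tag, _), v in sorted_pairs if tag == "random"]
    permuted_records = []
    for (record_id, features), (_, donor) in zip(straight, randomized):
        permuted = list(features)
        permuted[column] = donor[column]
        permuted_records.append((record_id, permuted))
    return permuted_records


def permute_column(records, column):
    """Run the simulated job over a dict of {record_id: feature_list}."""
    pairs = [p for rid, feats in records.items() for p in mapper(rid, feats)]
    return reducer(shuffle(pairs), column)
```

Calling `permute_column({i: [i, i * 10] for i in range(8)}, 1)` returns the same eight records with column 1 reordered and column 0 left alone. Because the donor order comes from a single hash of the record key, every variable would see the same permutation, which is the "doesn't resample each variable independently" caveat above.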
