Re: Random Sample in Map/Reduce

2012-05-14 Thread Shi Yu
To answer my own question: I used a non-repeating random number generator in the mapper. In the mapper's setup stage I generate a predefined number of random numbers, and then I keep a counter as the mapper runs. When the counter value is in the set of random numbers, the mapper executes and outputs

Random Sample in Map/Reduce

2012-05-14 Thread Shi Yu
Hi, before raising this question I searched for related topics. There are suggestions online: "Mappers: Output all qualifying values, each with a random integer key. Single reducer: Output the first N values, throwing away the keys." However, this scheme does not seem very efficient when the data
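
For reference, the quoted suggestion can be sketched in plain Python as below. This is only an illustration of the random-key approach, not Hadoop code; `map_phase` and `reduce_phase` are hypothetical names, and taking the N smallest keys stands in for the sort that the shuffle would perform before a single reducer.

```python
import heapq
import random


def map_phase(values, rng):
    """Mapper: tag every qualifying value with a random integer key."""
    for value in values:
        yield rng.getrandbits(32), value


def reduce_phase(keyed_values, n):
    """Single reducer: keep the N values with the smallest keys, drop the keys."""
    smallest = heapq.nsmallest(n, keyed_values, key=lambda kv: kv[0])
    return [value for _, value in smallest]


if __name__ == "__main__":
    rng = random.Random(7)
    data = ["value-%d" % i for i in range(1000)]
    print(reduce_phase(map_phase(data, rng), n=5))
```

Note that every input value passes through the map output here, even though only N values survive the reducer.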