To answer my own question: I used a non-repeating random number
generator in the mapper. During the mapper's setup stage I generate
a predefined number of distinct random numbers, then I keep a
running counter in the mapper. When the counter's value is in the
set of random numbers, the mapper executes and outputs
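
A minimal sketch of the approach described above, written in plain Python rather than against the Hadoop API. The names `SamplingMapper`, `run_mapper`, `sample_size` and `records_per_mapper` are made up for illustration, and the sketch assumes the number of records each mapper will see is known at setup time, which the post does not state.

```python
import random


class SamplingMapper:
    """Toy stand-in for a mapper; this is not the Hadoop Mapper class."""

    def __init__(self, sample_size, records_per_mapper, seed=None):
        # "Setup" stage: draw sample_size distinct positions up front.
        # Assumption: records_per_mapper is known before mapping starts.
        rng = random.Random(seed)
        self.selected = set(rng.sample(range(records_per_mapper), sample_size))
        self.counter = 0

    def map(self, record):
        # Emit the record only if its position was pre-selected.
        emit = self.counter in self.selected
        self.counter += 1
        return [record] if emit else []


def run_mapper(records, sample_size, seed=None):
    """Feed every record through one mapper and collect what it emits."""
    mapper = SamplingMapper(sample_size, len(records), seed)
    output = []
    for record in records:
        output.extend(mapper.map(record))
    return output
```

For example, `run_mapper(lines, 10)` would return 10 of the records in `lines`, chosen without repetition. Only the selected records leave the mapper, so nothing has to be sorted or discarded in a reducer.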

Hi,
Before raising this question I searched for related topics. I
found this suggestion online:
"Mappers: Output all qualifying values, each with a random
integer key.
Single reducer: Output the first N values, throwing away the
keys."
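
To make the quoted suggestion concrete, here is a rough sketch in plain Python that simulates the two phases in memory. It is not Hadoop code; `map_phase`, `reduce_phase`, `sample_first_n` and the `qualifies` predicate are hypothetical names, and the sort stands in for the shuffle that would deliver keys to the single reducer in order.

```python
import random


def map_phase(records, qualifies, rng):
    # Mappers: pair every qualifying value with a random integer key.
    return [(rng.randrange(2**31), r) for r in records if qualifies(r)]


def reduce_phase(keyed, n):
    # Single reducer: pairs arrive sorted by key (simulated here with
    # sorted); keep the first n values and throw the keys away.
    ordered = sorted(keyed, key=lambda kv: kv[0])
    return [value for _, value in ordered[:n]]


def sample_first_n(records, n, qualifies=lambda r: True, seed=None):
    rng = random.Random(seed)
    return reduce_phase(map_phase(records, qualifies, rng), n)
```

Note that every qualifying value is emitted and sorted even though only `n` of them are kept.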
However, this scheme does not seem very efficient when the data