Hi, I am trying to generate random data using Hadoop Streaming and Python. It's a map-only job, and I need to run a given number of maps. The maps take no input, since each one is just going to generate random data.
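For context, a minimal sketch of the kind of mapper described above. The record format (an index, a tab, a random integer), the `RECORDS_PER_MAP` constant, and the function names are assumptions made for illustration only, not the actual job.

```python
#!/usr/bin/env python
import random
import sys

# Hypothetical setting: how many random records each map task should emit.
RECORDS_PER_MAP = 1000


def generate_records(num_records, seed=None):
    """Return num_records lines of the form 'index<TAB>random_integer'."""
    rng = random.Random(seed)
    records = []
    for i in range(num_records):
        # Both the key and the value are placeholders for real random data.
        records.append("%d\t%d" % (i, rng.randint(0, 999999)))
    return records


def run_mapper(stream_in, stream_out, num_records=RECORDS_PER_MAP, seed=None):
    """Ignore the mapper's input and write random records to the output."""
    # Consume whatever the framework pipes in; its content is not used.
    for _ in stream_in:
        pass
    for line in generate_records(num_records, seed):
        stream_out.write(line + "\n")


# When used as a streaming mapper, the script would end with:
#     run_mapper(sys.stdin, sys.stdout)
```

The generation logic is kept in its own function so it can be checked locally without a cluster. How many copies of this mapper get launched is exactly the open question.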
How do I specify the number of maps to run? (I am confused here because, if I am not wrong, the number of maps spawned depends on the input data size.) Any ideas on how this can be done?

Warm regards,
Austin
