I want to create a large dataset for HBase, with varying numbers of rows and versions, to run some benchmarks. The target is about 10M rows with 100 versions each.
What's the fastest way to create it? Right now I'm generating the dataset with a MapReduce job; a run with 100,000 rows and 10 versions takes 17 minutes and produces around 7 GB. I don't know whether it can be done faster. The bottlenecks are writing the map output and transferring that output to the reducers.
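
To put the numbers in perspective, here is a rough extrapolation from the test run to the target size. It is only a sketch: it assumes time and size grow linearly with the number of cells (rows × versions), which I have not measured, and the function name `extrapolate` is just illustrative.

```python
def extrapolate(rows, versions, minutes, size_gb, target_rows, target_versions):
    """Scale a measured run to a target size.

    Assumption (not measured): cost grows linearly with rows * versions.
    """
    factor = (target_rows * target_versions) / (rows * versions)
    return factor, minutes * factor, size_gb * factor


# Measured run: 100,000 rows x 10 versions, 17 minutes, about 7 GB.
# Target: 10M rows x 100 versions.
factor, est_minutes, est_gb = extrapolate(100_000, 10, 17, 7, 10_000_000, 100)

print(f"scale factor: {factor:.0f}x")
print(f"estimated time: {est_minutes / 60 / 24:.1f} days")
print(f"estimated size: {est_gb / 1000:.1f} TB")
```

Under that assumption the target has 1000 times as many cells as the test run, so the current approach would need on the order of days and several terabytes.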
