Hello, I am importing a table of 40+ billion rows that I exported several months ago. The data size is close to 18 TB on HDFS (3x replication).
My problem is that when I try to import it with MapReduce it takes a few days, which is OK. However, when the job fails for whatever reason, I have to restart everything. Is it possible to import the table in chunks, i.e. import the first third, then the second third, and finally the last third of the table? Btw, the job creates close to 150k mapper tasks, which is a problem waiting to happen :-)
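To make the chunking idea concrete, here is a minimal sketch of what "import 1/3, 2/3, 3/3" could look like. It assumes the export is a directory of many files and that the import job can be pointed at a subset of them; both are assumptions, not something confirmed here. The file names and the `done` set are hypothetical, and the actual import command is deliberately left out.

```python
from typing import List, Set, Tuple


def split_into_chunks(paths: List[str], n_chunks: int) -> List[List[str]]:
    """Split export files into n contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    ordered = sorted(paths)  # stable order so reruns produce the same chunks
    base, extra = divmod(len(ordered), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def pending_chunks(chunks: List[List[str]], done: Set[int]) -> List[Tuple[int, List[str]]]:
    """Return (index, files) for every chunk not yet recorded as imported."""
    return [(i, files) for i, files in enumerate(chunks) if i not in done]


# Hypothetical export file names, for illustration only.
files = [f"/export/mytable/part-{i:05d}" for i in range(10)]
chunks = split_into_chunks(files, 3)

# Pretend chunk 0 already finished; only chunks 1 and 2 would need a (re)run.
for index, chunk_files in pending_chunks(chunks, done={0}):
    print(f"chunk {index}: {len(chunk_files)} files to import")
```

With something like this, a failed run would only cost the chunk that was in flight rather than the whole multi-day job, provided completed chunk indices are recorded somewhere durable.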
