Hello,

I am bulk loading a set of files (about 400MB each), with "|" as the
delimiter, using ImportTsv. The 'map' job takes a long time to complete on
both a 4 node and a 16 node cluster. I also tried generating the output files
(by providing -Dimporttsv.bulk.output), and that took a long time as well,
which suggests that the generation of the output files is what needs
improvement.

I am seeing about 8000 rows/sec for this dataset; ingesting one 400MB file
takes about 5-6 mins. How can I improve this? Is there an alternate tool I
can use?
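
For reference, here is a rough sketch of what those figures imply per file. It uses only the numbers quoted above (400MB, 8000 rows/sec, 5-6 minutes); the function name is just for illustration, and it assumes 1 MB = 10^6 bytes, so the per-row size is approximate.

```python
# Back-of-the-envelope figures implied by the numbers quoted in this message.
FILE_SIZE_MB = 400                 # size of one input file
ROWS_PER_SEC = 8000                # observed ingest rate
DURATIONS_SEC = (5 * 60, 6 * 60)   # observed wall time: 5 to 6 minutes


def implied_figures(duration_sec):
    """Return (row count, MB/s, bytes per row) for one file at the given duration."""
    rows = ROWS_PER_SEC * duration_sec
    mb_per_sec = FILE_SIZE_MB / duration_sec
    # Assumption: 1 MB = 10^6 bytes, so this is only an approximate row size.
    bytes_per_row = FILE_SIZE_MB * 1_000_000 / rows
    return rows, mb_per_sec, bytes_per_row


for d in DURATIONS_SEC:
    rows, mbps, bpr = implied_figures(d)
    print(f"{d // 60} min: ~{rows:,} rows, ~{mbps:.2f} MB/s, ~{bpr:.0f} bytes/row")
```

So each file works out to roughly 2.4 to 2.9 million rows, ingested at a little over 1 MB/s.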
Thanks,
Rama
