The number of part files is determined by the number of reduce tasks, which you can configure. If the problem is small, run the job with a single reduce task so that it writes a single part file. If it is big, you can run a second job whose map and reduce operators only emit their input key/value pairs unchanged, and set that job's number of reduce tasks to the number of files you'd like to have.
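To make the idea concrete, here is a small local sketch in plain Python. It is a toy model, not the Hadoop API: the function names (`identity_map`, `identity_reduce`, `run_identity_job`) and the CRC-based partitioning are assumptions made for illustration, and the output names only mimic Hadoop's `part-r-NNNNN` convention. It shows that an identity map and reduce leave the records untouched while the number of reduce tasks fixes the number of output files.

```python
from collections import defaultdict
from zlib import crc32


def identity_map(key, value):
    # Emit the input pair unchanged.
    yield key, value


def identity_reduce(key, values):
    # Emit every value for the key unchanged.
    for value in values:
        yield key, value


def run_identity_job(part_files, num_reduce_tasks):
    """Simulate a second pass over existing part files.

    part_files: list of lists of (key, value) string pairs.
    Returns a dict mapping an output file name to its list of pairs,
    with exactly one entry per reduce task.
    """
    # One bucket per reduce task; each bucket groups values by key.
    partitions = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for part in part_files:
        for key, value in part:
            for k, v in identity_map(key, value):
                # Hypothetical partitioner: stable hash of the key.
                index = crc32(k.encode("utf-8")) % num_reduce_tasks
                partitions[index][k].append(v)

    outputs = {}
    for index, grouped in enumerate(partitions):
        name = "part-r-%05d" % index
        # Each reduce task sees its keys in sorted order.
        outputs[name] = [
            pair
            for k in sorted(grouped)
            for pair in identity_reduce(k, grouped[k])
        ]
    return outputs


if __name__ == "__main__":
    parts = [[("a", "1"), ("c", "3")], [("b", "2")], [("a", "4")]]
    merged = run_identity_job(parts, 1)
    print(sorted(merged))            # names of the output files
    print(merged["part-r-00000"])    # all records, in one file
```

With `num_reduce_tasks` set to 1, everything ends up in one file; with a larger value the same records are spread over that many files.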
Regards,
D

2014-12-28 13:50 GMT+01:00 tai khuu <[email protected]>:
> Hi, I would like to combine mapreduce part files into a single file, is
> there any good solution for this? currently I'm going through the file list
> and combine them 1 by 1 in 1 thread but I have some concerns about the
> performance. I think if data volume is big enough my current solution will
> yield very bad performance.
