Hi, I've implemented an algorithm with Hadoop as a series of 4 jobs. My question is about one of these jobs: its map and reduce tasks show 100% finished after about 1m 30s, but I then have to wait another 5m for the job to finish. This job writes about 720 MB of compressed data to HDFS with replication factor 1, in sequence file format. I've tried copying the same data to HDFS, and that takes less than 20 seconds. What is happening during those extra 5 minutes?
Any idea on how to optimize this part? Thanks.

--
JU Han
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel
+33 0619608888
