My Q-job MR job shows as 100% mapper complete (it's a map-only job) very quickly, but the job itself does not finish until about 10 minutes later, which is rather surprising. My input is a sparse matrix of 37,000 rows and 8,000 columns, with each row usually having fewer than 10 non-zero elements, so the input size is fairly small.
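For scale, here is a purely illustrative snippet (not code from my job; the class name, output path and random values are placeholders) that writes a matrix of the same shape and sparsity, assuming the usual Mahout DRM layout of a SequenceFile with IntWritable row keys and sparse VectorWritable rows:

import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Illustrative only: writes a matrix with the same shape/sparsity as my input
// (37,000 rows, 8,000 columns, <10 non-zeros per row) as a SequenceFile of
// (IntWritable row id, VectorWritable sparse row).
public class WriteSparseInput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path out = new Path("ssvd-input/part-00000");  // placeholder path
    Random rnd = new Random(42);
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(out),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(VectorWritable.class))) {
      for (int row = 0; row < 37000; row++) {
        Vector v = new RandomAccessSparseVector(8000);      // 8,000 columns
        for (int k = 0; k < 10; k++) {                       // up to 10 non-zeros per row
          v.setQuick(rnd.nextInt(8000), rnd.nextDouble());   // placeholder values
        }
        writer.append(new IntWritable(row), new VectorWritable(v));
      }
    }
  }
}

That is roughly 370,000 non-zero entries in total, only a few MB as a SequenceFile, so the 10-minute gap doesn't seem explained by the input size.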
I looked at the Q-job code, and it seems rather normal, i.e. it's not doing anything special after the map() function is completed. So I wonder why it's lagging so long after 100%? Here is the syslog from Hadoop:

2014-10-09 10:37:40,504 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2014-10-09 10:37:40,538 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-10-09 10:46:31,219 INFO [LeaseRenewer:[email protected]:8020] org.apache.hadoop.ipc.Client: Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 0 time(s); maxRetries=45
2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of committing
2014-10-09 10:47:46,739 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1412781120464_7857_m_000000_0 is allowed to commit now
2014-10-09 10:47:47,389 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1412781120464_7857_m_000000_0' to hdfs://apollo-phx-nn.vip.ebay.com:8020/user/yyang15/CIReco/shoes/ssvd/tmp/ssvd/Q-job/_temporary/1/task_1412781120464_7857_m_000000
2014-10-09 10:47:47,574 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1412781120464_7857_m_000000_0' done.
2014-10-09 10:47:47,575 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2014-10-09 10:47:47,576 INFO [ganglia] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread interrupted.
2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
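For reference, here is a minimal, hypothetical sketch (not Mahout's actual QJob code; the class is made up) of the general pattern that can produce this symptom: anything a mapper does in cleanup() runs after the last map() call, so the task keeps working, and only commits afterwards, while the UI already shows the map phase at 100%.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative pattern only: buffer rows cheaply in map(), then do the
// expensive work in cleanup(), which executes after map progress hits 100%.
public class BufferThenComputeMapper
    extends Mapper<IntWritable, Text, IntWritable, Text> {

  private final List<String> buffered = new ArrayList<>();

  @Override
  protected void map(IntWritable key, Text value, Context context) {
    // Cheap per-record work: just buffer the input row.
    buffered.add(value.toString());
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Expensive post-map work happens here, after the map phase already
    // reads 100%, and the task only commits once this method returns.
    for (int i = 0; i < buffered.size(); i++) {
      context.progress();  // keep the task alive while grinding through the buffer
      context.write(new IntWritable(i), new Text(buffered.get(i)));
    }
  }
}

I may have missed something like this in the Q-job source, so pointers to where its post-map work (if any) happens would be appreciated.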
