I commented out the code about compression, but the actual job console still shows mapreduce.output.fileoutputformat.compress as true.

On Thu, Oct 9, 2014 at 11:40 AM, Yang <[email protected]> wrote:

> It's possible that they are compressing the output. I'm now rebuilding
> the code after commenting out the setOutputCompress(true) in the code.
>
> I will also run with the compression param set to false.
>
> But it's still quite surprising that compression should take so long
> (8-10 minutes).
>
> On Thu, Oct 9, 2014 at 11:06 AM, Yang <[email protected]> wrote:
>
>> My Q-Job MR job shows 100% mapper completion (it's a map-only job) very
>> quickly, but the job itself does not finish until about 10 minutes later,
>> which is rather surprising. My input is 37000 sparse row vectors with
>> 8000 columns, and each row usually has fewer than 10 non-zero elements,
>> so the input size is fairly small.
>>
>> I looked at the Q-job code and it seems rather normal, i.e. it's not doing
>> anything special after the map() function completes. So I wonder why it
>> lags so long after reaching 100%.
>>
>> Here is the syslog from Hadoop:
>>
>> 2014-10-09 10:37:40,504 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>> 2014-10-09 10:37:40,538 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
>> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
>> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
>> 2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
>> 2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream
>> java.io.IOException: No such process
>> 2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
>> 2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:31,219 INFO [LeaseRenewer:[email protected]:8020] org.apache.hadoop.ipc.Client: Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 0 time(s); maxRetries=45
>> 2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
>> 2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of committing
>> 2014-10-09 10:47:46,739 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1412781120464_7857_m_000000_0 is allowed to commit now
>> 2014-10-09 10:47:47,389 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1412781120464_7857_m_000000_0' to hdfs://apollo-phx-nn.vip.ebay.com:8020/user/yyang15/CIReco/shoes/ssvd/tmp/ssvd/Q-job/_temporary/1/task_1412781120464_7857_m_000000
>> 2014-10-09 10:47:47,574 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1412781120464_7857_m_000000_0' done.
>> 2014-10-09 10:47:47,575 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
>> 2014-10-09 10:47:47,576 INFO [ganglia] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread interrupted.
>> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
>> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
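To see where the roughly 10 minutes actually go, it can help to compute the gaps between consecutive timestamps in the syslog. The following is a minimal sketch, not part of Hadoop or Mahout: the class name `LogGaps` and its `gapsMillis` helper are hypothetical, and the timestamps are a hand-picked subset copied from the log above. Note that a "Got brand-new compressor" line only records when a codec was taken from the pool, so the gaps show when the task was busy, not what it was doing.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class LogGaps {
    // Time-of-day format used in the Hadoop syslog lines, e.g. "10:40:09,117".
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Returns the gap in milliseconds between each pair of consecutive timestamps.
    public static List<Long> gapsMillis(String[] stamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < stamps.length; i++) {
            LocalTime prev = LocalTime.parse(stamps[i - 1], TS);
            LocalTime cur = LocalTime.parse(stamps[i], TS);
            gaps.add(Duration.between(prev, cur).toMillis());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Selected events copied from the syslog in the thread (all on 2014-10-09).
        String[] stamps = {
            "10:37:40,549", "10:39:39,143", "10:40:09,117", "10:46:23,991",
            "10:46:31,219", "10:47:45,241", "10:47:46,571", "10:47:47,576"
        };
        String[] events = {
            "last .gz decompressor", "ProcfsBasedProcessTree warning",
            "first .deflate compressor", ".deflate decompressors",
            "LeaseRenewer retry", "second .deflate compressor",
            "task done, committing", "metrics shutdown complete"
        };
        List<Long> gaps = gapsMillis(stamps);
        // Print each interval so the longest stretch stands out.
        for (int i = 0; i < gaps.size(); i++) {
            System.out.printf("%-30s -> %-30s %8d ms%n", events[i], events[i + 1], gaps.get(i));
        }
    }
}
```

With these timestamps, the longest interval (374874 ms, about 6.2 minutes) lies between the first `.deflate` compressor and the `.deflate` decompressors, while the commit itself (10:47:46,571 to 10:47:47,389) takes under a second. So, as far as this log shows, the delay happens inside the task before it reports done, not in the output commit.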
