I commented out the compression code, but the job console still shows
mapreduce.output.fileoutputformat.compress as true.

On Thu, Oct 9, 2014 at 11:40 AM, Yang <[email protected]> wrote:

> It's possible that they are compressing the output. I'm now rebuilding
> the code after commenting out the setOutputCompress(true) call, and I
> will also run with the compression parameter set to false.
>
> Still, it's quite surprising that compression would take so long
> (8-10 minutes).
>
> On Thu, Oct 9, 2014 at 11:06 AM, Yang <[email protected]> wrote:
>
>> My Q-Job MR job (a map-only job) very quickly shows the mappers as 100%
>> complete, but the job itself does not finish until about 10 minutes later.
>> This is rather surprising. My input is 37,000 rows of sparse vectors with
>> 8,000 columns, and each row usually has fewer than 10 non-zero elements,
>> so the input size is fairly small.
>>
>>
>> I looked at the Q-job code and it seems rather normal, i.e. it doesn't do
>> anything special after the map() function completes. So why does it lag
>> so long after reaching 100%?
>>
>>
>> Here is the syslog from Hadoop:
>>
>>
>>
>> 2014-10-09 10:37:40,504 INFO [main] 
>> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & 
>> initialized native-zlib library
>> 2014-10-09 10:37:40,538 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.gz]
>> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.gz]
>> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.gz]
>> 2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.gz]
>> 2014-10-09 10:39:39,143 WARN [communication thread] 
>> org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream 
>> java.io.IOException: No such process
>> 2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new compressor [.deflate]
>> 2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new decompressor [.deflate]
>> 2014-10-09 10:46:31,219 INFO 
>> [LeaseRenewer:[email protected]:8020] 
>> org.apache.hadoop.ipc.Client: Retrying connect to server: 
>> apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 0 time(s); 
>> maxRetries=45
>> 2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
>> Got brand-new compressor [.deflate]
>> 2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: 
>> Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of 
>> committing
>> 2014-10-09 10:47:46,739 INFO [main] org.apache.hadoop.mapred.Task: Task 
>> attempt_1412781120464_7857_m_000000_0 is allowed to commit now
>> 2014-10-09 10:47:47,389 INFO [main] 
>> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of 
>> task 'attempt_1412781120464_7857_m_000000_0' to 
>> hdfs://apollo-phx-nn.vip.ebay.com:8020/user/yyang15/CIReco/shoes/ssvd/tmp/ssvd/Q-job/_temporary/1/task_1412781120464_7857_m_000000
>> 2014-10-09 10:47:47,574 INFO [main] org.apache.hadoop.mapred.Task: Task
>> 'attempt_1412781120464_7857_m_000000_0' done.
>> 2014-10-09 10:47:47,575 INFO [main] 
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics 
>> system...
>> 2014-10-09 10:47:47,576 INFO [ganglia] 
>> org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread 
>> interrupted.
>> 2014-10-09 10:47:47,576 INFO [main] 
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system 
>> stopped.
>> 2014-10-09 10:47:47,576 INFO [main] 
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system 
>> shutdown complete.
>>
>>
>
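
To see where the roughly 10 minutes after 100% actually go, one option is to compute the gaps between consecutive timestamps in the syslog above. This is a minimal Java sketch, assuming every entry starts with the log4j-style prefix yyyy-MM-dd HH:mm:ss,SSS shown in the log. The class name LogGaps and its helper methods are hypothetical, and the sample entries are a hand-picked, abbreviated subset of the quoted log, not the full file.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports the time between consecutive syslog entries.
public class LogGaps {
    // log4j-style prefix as in the Hadoop syslog, e.g. "2014-10-09 10:37:40,504".
    // Assumption: every entry starts with exactly this 23-character timestamp.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Selected entries from the quoted syslog; messages are abbreviated.
    static final List<String> SAMPLE = Arrays.asList(
            "2014-10-09 10:37:40,504 ZlibFactory: loaded native-zlib",
            "2014-10-09 10:37:40,549 CodecPool: Got brand-new decompressor [.gz]",
            "2014-10-09 10:39:39,143 ProcfsBasedProcessTree: No such process",
            "2014-10-09 10:40:09,117 CodecPool: Got brand-new compressor [.deflate]",
            "2014-10-09 10:46:23,991 CodecPool: Got brand-new decompressor [.deflate]",
            "2014-10-09 10:46:31,219 Client: Retrying connect to server",
            "2014-10-09 10:47:45,241 CodecPool: Got brand-new compressor [.deflate]",
            "2014-10-09 10:47:47,576 MetricsSystemImpl: shutdown complete");

    // Parses the timestamp prefix of each entry.
    static List<LocalDateTime> parse(List<String> entries) {
        List<LocalDateTime> out = new ArrayList<>();
        for (String e : entries) {
            out.add(LocalDateTime.parse(e.substring(0, 23), TS));
        }
        return out;
    }

    // Milliseconds between entry i and entry i + 1.
    static long gapMillis(List<LocalDateTime> ts, int i) {
        return Duration.between(ts.get(i), ts.get(i + 1)).toMillis();
    }

    // Index i of the largest gap (between entry i and i + 1); needs >= 2 entries.
    static int largestGapIndex(List<LocalDateTime> ts) {
        int best = 0;
        for (int i = 1; i + 1 < ts.size(); i++) {
            if (gapMillis(ts, i) > gapMillis(ts, best)) {
                best = i;
            }
        }
        return best;
    }

    // Milliseconds from the first entry to the last.
    static long totalMillis(List<LocalDateTime> ts) {
        return Duration.between(ts.get(0), ts.get(ts.size() - 1)).toMillis();
    }

    public static void main(String[] args) {
        List<LocalDateTime> ts = parse(SAMPLE);
        for (int i = 0; i + 1 < ts.size(); i++) {
            System.out.printf("%8.3f s  then: %s%n",
                    gapMillis(ts, i) / 1000.0, SAMPLE.get(i + 1).substring(24));
        }
        int g = largestGapIndex(ts);
        System.out.printf("largest gap: %.3f s, right after: %s%n",
                gapMillis(ts, g) / 1000.0, SAMPLE.get(g).substring(24));
        System.out.printf("total: %.3f s%n", totalMillis(ts) / 1000.0);
    }
}
```

On these entries the largest gap is 374.874 s (about 6 minutes 15 seconds), starting right after the first [.deflate] compressor is created at 10:40:09, and the whole span is about 607 s. That is consistent with the time being spent while writing compressed output, though timestamps alone cannot prove compression is the cause.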
