Oh... I forgot that Crunch is only an abstraction over MapReduce pipelines. Has anyone tried using it with job output on S3? Strangely, the job seems to freeze after writing the _SUCCESS marker to S3. The last lines in my job log file are below:
2016-09-22 10:05:37,194 INFO org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob (Thread-5): Job status available at: http://ip-172-31-103-28.cn-north-1.compute.internal:20888/proxy/application_1472715051930_0002/
2016-09-22 10:12:13,692 INFO com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream (Thread-5): close closed:false s3://mgtv-ott-data-archive/vodstat-output/ov/year=2016/month=09/day=21/_SUCCESS

2016-09-22 1:09 GMT+08:00 Josh Wills <[email protected]>:

> I don't follow - Hadoop handles compression transparently for most of the
> commonly used input formats and compression schemes; you shouldn't have to
> do anything.
>
> On Wed, Sep 21, 2016 at 12:53 AM wu lihu <[email protected]> wrote:
>>
>> Hi Everyone,
>> I want to ask a question about processing log files that end up as
>> compressed files. Is there any example of that?
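For context on the compression point above: Hadoop chooses a decompression codec from the input file's extension, so `.gz` log files are decoded before records ever reach the pipeline. The sketch below illustrates that extension-based dispatch using only the JDK; the class and method names here are illustrative and are not Hadoop's actual CompressionCodecFactory API.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.*;

public class CodecByExtension {

    // Illustrative stand-in for Hadoop's extension-based codec lookup:
    // a ".gz" suffix gets a GZIPInputStream, anything else is read as-is.
    static InputStream open(Path p) throws IOException {
        InputStream raw = Files.newInputStream(p);
        return p.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    // Write one gzip-compressed "log" record, then read it back through
    // open() without the reader having to know the file was compressed.
    static String roundTrip() throws IOException {
        Path log = Files.createTempFile("access-log", ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(log))) {
            out.write("GET /index.html 200\n".getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(open(log), StandardCharsets.UTF_8))) {
            return r.readLine();
        } finally {
            Files.delete(log);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

This is the same reason Josh's reply says "you shouldn't have to do anything": with the standard text input formats, the codec wrapping happens inside the framework, not in user code.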
