Oh... I forgot that Crunch is only an abstraction over MapReduce pipelines. Has anyone tried using it with job output on S3? Strangely, the job seems to freeze after writing the _SUCCESS marker to S3. The last lines in my job log file are below:
2016-09-22 10:05:37,194 INFO org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob (Thread-5): Job status available at: http://ip-172-31-103-28.cn-north-1.compute.internal:20888/proxy/application_1472715051930_0002/
2016-09-22 10:12:13,692 INFO com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream (Thread-5): close closed:false s3://mgtv-ott-data-archive/vodstat-output/ov/year=2016/month=09/day=21/_SUCCESS

2016-09-22 1:09 GMT+08:00 Josh Wills <[email protected]>:

> I don't follow - Hadoop handles compression transparently for most of the
> commonly used input formats and compression schemes; you shouldn't have to
> do anything.
>
> On Wed, Sep 21, 2016 at 12:53 AM wu lihu <[email protected]> wrote:
>>
>> Hi Everyone,
>> I want to ask a question about processing log files that end up as
>> compressed files. Is there any example of that?
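For context on the compression point above: Hadoop chooses a decompression codec from the input file's extension, so `.gz` log files are decoded before records ever reach the pipeline. The sketch below illustrates that extension-based dispatch using only the JDK; the class and method names here are illustrative and are not Hadoop's actual CompressionCodecFactory API.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.*;

public class CodecByExtension {

    // Illustrative stand-in for Hadoop's extension-based codec lookup:
    // a ".gz" suffix gets a GZIPInputStream, anything else is read as-is.
    static InputStream open(Path p) throws IOException {
        InputStream raw = Files.newInputStream(p);
        return p.toString().endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }

    // Write one gzip-compressed "log" record, then read it back through
    // open() without the reader having to know the file was compressed.
    static String roundTrip() throws IOException {
        Path log = Files.createTempFile("access-log", ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(log))) {
            out.write("GET /index.html 200\n".getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(open(log), StandardCharsets.UTF_8))) {
            return r.readLine();
        } finally {
            Files.delete(log);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

This is the same reason Josh's reply says "you shouldn't have to do anything": with the standard text input formats, the codec wrapping happens inside the framework, not in user code.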
