Nishant, CDH4 Flume is built against Hadoop 2 and may not work correctly against Hadoop 1.x, since Hadoop's interfaces changed in the meantime. You could instead use Apache Flume 1.2.0, or the upcoming Apache Flume 1.3.0, directly against Hadoop 1.x without issues, as those releases are built against Hadoop 1.x.
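As an aside: with hdfs.rollInterval, hdfs.rollSize, and hdfs.rollCount all set to 0, the sink never closes the file, so the '.tmp' suffix persisting until shutdown is expected. A sketch of settings that force periodic closes is below (parameter names are from the Flume 1.x HDFS sink documentation; note that hdfs.idleTimeout is only available starting in Flume 1.3.0, so treat that line as an assumption on your 1.2.0 build):

```
# Roll (close and rename away the .tmp suffix) every 10 minutes
agent1.sinks.fileSink1.hdfs.rollInterval = 600
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollCount = 0
# Flume 1.3.0+ only: close a bucket file after 60s with no new events
agent1.sinks.fileSink1.hdfs.idleTimeout = 60
```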
Thanks,
Hari

--
Hari Shreedharan

On Thursday, October 18, 2012 at 1:18 PM, Nishant Neeraj wrote:

> I am working on a POC using
>
> flume-ng version Flume 1.2.0-cdh4.1.1
> Hadoop 1.0.4
>
> The config looks like this:
>
> #Flume agent configuration
> agent1.sources = avroSource1
> agent1.sinks = fileSink1
> agent1.channels = memChannel1
>
> agent1.sources.avroSource1.type = avro
> agent1.sources.avroSource1.channels = memChannel1
> agent1.sources.avroSource1.bind = 0.0.0.0
> agent1.sources.avroSource1.port = 4545
>
> agent1.sources.avroSource1.interceptors = b
> agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
>
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
>
> agent1.channels.memChannel1.type = memory
> agent1.channels.memChannel1.capacity = 1000
> agent1.channels.memChannel1.transactionCapacity = 1000
>
> Basically, I do not want to roll the file at all. I just want to tail it and watch the show from the Hadoop UI. The problem is it does not work. The console keeps saying,
>
> agg.1350590350462.tmp 0 KB 2012-10-18 19:59
>
> The Flume console shows events getting pushed. When I stop Flume, I see the file gets populated, but the '.tmp' suffix is still in the file name. And I see this exception on close.
>
> 2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)] Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
> 2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)] failed to close() HDFSWriter for file (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
> java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
>     at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
>     at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
>     at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
>     at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
>     at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
>     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
>     at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
>
> Thanks
> Nishant
