Nishant, CDH4 Flume is built against Hadoop 2 and may not work correctly against Hadoop 1.x, since Hadoop's interfaces changed in the meantime. You could instead use Apache Flume 1.2.0, or the upcoming Apache Flume 1.3.0, directly against Hadoop 1.x without issues, as those releases are built against Hadoop 1.x.
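As an aside: with hdfs.rollInterval, hdfs.rollSize, and hdfs.rollCount all set to 0, the sink never closes the file, so the '.tmp' suffix persisting until shutdown is expected. A sketch of settings that force periodic closes is below (parameter names are from the Flume 1.x HDFS sink documentation; note that hdfs.idleTimeout is only available starting in Flume 1.3.0, so treat that line as an assumption on your 1.2.0 build):

```
# Roll (close and rename away the .tmp suffix) every 10 minutes
agent1.sinks.fileSink1.hdfs.rollInterval = 600
agent1.sinks.fileSink1.hdfs.rollSize = 0
agent1.sinks.fileSink1.hdfs.rollCount = 0
# Flume 1.3.0+ only: close a bucket file after 60s with no new events
agent1.sinks.fileSink1.hdfs.idleTimeout = 60
```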
Thanks,
Hari

--
Hari Shreedharan

On Thursday, October 18, 2012 at 1:18 PM, Nishant Neeraj wrote:

> I am working on a POC using
>
> flume-ng version Flume 1.2.0-cdh4.1.1
> Hadoop 1.0.4
>
> The config looks like this:
>
> #Flume agent configuration
> agent1.sources = avroSource1
> agent1.sinks = fileSink1
> agent1.channels = memChannel1
>
> agent1.sources.avroSource1.type = avro
> agent1.sources.avroSource1.channels = memChannel1
> agent1.sources.avroSource1.bind = 0.0.0.0
> agent1.sources.avroSource1.port = 4545
>
> agent1.sources.avroSource1.interceptors = b
> agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
>
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
>
> agent1.channels.memChannel1.type = memory
> agent1.channels.memChannel1.capacity = 1000
> agent1.channels.memChannel1.transactionCapacity = 1000
>
> Basically, I do not want to roll the file at all. I just want to tail it and watch the show from the Hadoop UI. The problem is it does not work. The console keeps saying,
>
> agg.1350590350462.tmp 0 KB 2012-10-18 19:59
>
> The Flume console shows events getting pushed. When I stop Flume, I see the file gets populated, but the '.tmp' suffix is still in the file name. And I see this exception on close.
>
> 2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)] Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
> 2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)] failed to close() HDFSWriter for file (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
> java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
>     at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
>     at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>     at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
>     at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
>     at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
>     at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
>     at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
>     at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
>     at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
>     at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
>
> Thanks
> Nishant
