Hi,

That's just the file channel. The HDFSEventSink will need a heck of a lot more than just those two jars. To override the version of Hadoop it picks up from the hadoop command, you probably want to set HADOOP_HOME in flume-env.sh to point at your custom install.
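A minimal sketch of what that could look like in conf/flume-env.sh - the install path below is hypothetical, substitute your own:

  # Point flume at the custom hadoop install; the flume-ng launcher should
  # then resolve the hadoop command and its jars from here instead of the
  # system-wide one. (Path is illustrative.)
  export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u5
  export PATH=$HADOOP_HOME/bin:$PATH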
Also, the client and server should be the same version.

Brock

On Mon, Jan 14, 2013 at 4:43 PM, Sagar Mehta <[email protected]> wrote:
> OK, so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
> errors about the guava dependencies, so I put in that jar too].
>
> smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e "guava"
> -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
> -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50 hadoop-core-0.20.2-cdh3u5.jar
>
> Now I don't even see the file being created in hdfs, and the flume log is
> happily talking about housekeeping for some file channel checkpoints,
> updating pointers, et al.
>
> Below is a tail of the flume log:
>
> hadoop@collector102:/data/flume_log$ tail -10 flume.log
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID: 1358209947324
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-34
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID: 1358209947323
> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-34
> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947324
> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947323
> 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-35
> 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-35
> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947323
> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947324
>
> Sagar
>
> On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[email protected]> wrote:
>> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
>> I would upgrade to CDH3u5 or CDH 4.1.2.
>>
>> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[email protected]> wrote:
>> > About the bz2 suggestion: we have a ton of downstream jobs that assume
>> > gzip-compressed files, so it is better to stick with gzip.
>> >
>> > Plan B for us is to have an Oozie step gzip-compress the logs before
>> > proceeding with the downstream Hadoop jobs - but that looks like a hack to me!!
>> >
>> > Sagar
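On the client/server version point above, a quick sanity check - assuming the hadoop command is on the PATH of both boxes; the host name below is hypothetical:

  hadoop version                        # on the collector running flume
  ssh namenode301 'hadoop version'      # on the cluster side; the two should match
  ls /opt/flume/lib | grep hadoop-core  # the client jar flume loads should match too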
>> > On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <[email protected]> wrote:
>> >> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l
>> >>
>> >> gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored
>> >> 100
>> >>
>> >> This should be about 50,000 events for the 5-min window!!
>> >>
>> >> Sagar
>> >>
>> >> On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <[email protected]> wrote:
>> >>> Hi,
>> >>>
>> >>> Can you try: zcat file > output
>> >>>
>> >>> I think what is occurring is that, because of the flush, the output
>> >>> file is actually several concatenated gz files.
>> >>>
>> >>> Brock
>> >>>
>> >>> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <[email protected]> wrote:
>> >>> > Yeah, I have tried the text write format in vain before, but
>> >>> > nevertheless gave it a try again!! Below is the latest file - still
>> >>> > the same thing.
>> >>> >
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
>> >>> > Mon Jan 14 23:02:07 UTC 2013
>> >>> >
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >>> > Found 1 items
>> >>> > -rw-r--r-- 3 hadoop supergroup 4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >>> >
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>> >>> >
>> >>> > gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz: decompression OK, trailing garbage ignored
>> >>> >
>> >>> > Interestingly enough, the gzip page says it is a harmless warning -
>> >>> > http://www.gzip.org/#faq8
>> >>> >
>> >>> > However, I'm losing events on decompression, so I cannot afford to
>> >>> > ignore this warning. The gzip page gives an example about magnetic
>> >>> > tape - there is an analogy to the hdfs block here, since the file is
>> >>> > initially stored in hdfs before I pull it out onto the local
>> >>> > filesystem.
>> >>> >
>> >>> > Sagar
>> >>> >
>> >>> > On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson <[email protected]> wrote:
>> >>> >> collector102.sinks.sink1.hdfs.writeFormat = TEXT
>> >>> >> collector102.sinks.sink2.hdfs.writeFormat = TEXT

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
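For reference, the multi-member behavior Brock describes is easy to reproduce locally: gzip files are concatenable by design, and zcat decompresses every member, which is why zcat file > output can recover events that a single-member read would drop. A minimal sketch (file names are illustrative):

  printf 'event1\n' | gzip > part1.gz
  printf 'event2\n' | gzip > part2.gz
  cat part1.gz part2.gz > combined.gz   # same shape as a sink flushing several gzip members into one file
  zcat combined.gz | wc -l              # prints 2 - both members are decompressed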
