Is there any incompatibility in trying to write to a different version of Hadoop then?
- Connor

On Mon, Jan 14, 2013 at 5:25 PM, Bhaskar V. Karambelkar <[email protected]> wrote:

> Sagar,
> You're better off downloading and unzipping CDH3u5 or CDH4 somewhere, and
> pointing the HADOOP_HOME env. variable to the base directory. That way you
> won't have to worry about which jar files are needed and which are not.
> Flume will auto-add all JARs from the Hadoop installation that it needs.
>
> regards
> Bhaskar
>
> On Mon, Jan 14, 2013 at 7:43 PM, Sagar Mehta <[email protected]> wrote:
>
>> OK, so I dropped the new hadoop-core jar into /opt/flume/lib [I got some
>> errors about the guava dependencies, so I put in that jar too].
>>
>> smehta@collector102:/opt/flume/lib$ ls -ltrh | grep -e "hadoop-core" -e "guava"
>> -rw-r--r-- 1 hadoop hadoop 1.5M 2012-11-14 21:49 guava-10.0.1.jar
>> -rw-r--r-- 1 hadoop hadoop 3.7M 2013-01-14 23:50 hadoop-core-0.20.2-cdh3u5.jar
>>
>> Now I don't even see the file being created in HDFS, and the flume log is
>> happily talking about housekeeping for some file channel checkpoints,
>> updating pointers, et al.
>>
>> Below is a tail of the flume log:
>>
>> hadoop@collector102:/data/flume_log$ tail -10 flume.log
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel2/data/log-36 position: 129415524 logWriteOrderID: 1358209947324
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-34
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.Log - Updated checkpoint for file: /data/flume_data/channel1/data/log-36 position: 129415524 logWriteOrderID: 1358209947323
>> 2013-01-15 00:42:10,814 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-34
>> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947324
>> 2013-01-15 00:42:10,819 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-34.meta currentPosition = 18577138, logWriteOrderID = 1358209947323
>> 2013-01-15 00:42:10,820 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel1/data/log-35
>> 2013-01-15 00:42:10,821 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFile - Closing RandomReader /data/flume_data/channel2/data/log-35
>> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel1] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947323
>> 2013-01-15 00:42:10,826 [Log-BackgroundWorker-channel2] INFO org.apache.flume.channel.file.LogFileV3 - Updating log-35.meta currentPosition = 217919486, logWriteOrderID = 1358209947324
>>
>> Sagar
>>
>> On Mon, Jan 14, 2013 at 3:38 PM, Brock Noland <[email protected]> wrote:
>>
>>> Hmm, could you try an updated version of Hadoop? CDH3u2 is quite old;
>>> I would upgrade to CDH3u5 or CDH 4.1.2.
>>>
>>> On Mon, Jan 14, 2013 at 3:27 PM, Sagar Mehta <[email protected]> wrote:
>>>
>>> > About the bz2 suggestion, we have a ton of downstream jobs that assume
>>> > gzip-compressed files - so it is better to stick to gzip.
>>> >
>>> > The plan B for us is to have an Oozie step to gzip-compress the logs
>>> > before proceeding with the downstream Hadoop jobs - but that looks like
>>> > a hack to me!!
>>> >
>>> > Sagar
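A minimal sketch of that plan-B recompression step (assuming Python 3.3+ is available where the recompression runs; the filenames are hypothetical). Python's gzip module reads all concatenated gzip members transparently, so a plain read/re-write pass would collapse a multi-member file into a single clean stream for the downstream jobs:

    import gzip
    import shutil

    # Hypothetical filenames: recompress a possibly multi-member .gz into
    # a single-member .gz that gzip-assuming downstream jobs can trust.
    with gzip.open("collector.raw.gz", "rb") as src, \
         gzip.open("collector.clean.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)  # decompresses every member, re-gzips once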
>>> > On Mon, Jan 14, 2013 at 3:24 PM, Sagar Mehta <[email protected]> wrote:
>>> >
>>> >> hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l
>>> >>
>>> >> gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored
>>> >> 100
>>> >>
>>> >> This should be about 50,000 events for the 5-min window!!
>>> >>
>>> >> Sagar
>>> >>
>>> >> On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <[email protected]> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> Can you try: zcat file > output
>>> >>>
>>> >>> I think what is occurring is that, because of the flush, the output
>>> >>> file is actually several concatenated gz files.
>>> >>>
>>> >>> Brock
>>> >>>
>>> >>> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <[email protected]> wrote:
>>> >>>
>>> >>> > Yeah, I have tried the text write format in vain before, but
>>> >>> > nevertheless gave it a try again!! Below is the latest file - still
>>> >>> > the same thing.
>>> >>> >
>>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
>>> >>> > Mon Jan 14 23:02:07 UTC 2013
>>> >>> >
>>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>>> >>> > Found 1 items
>>> >>> > -rw-r--r-- 3 hadoop supergroup 4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>>> >>> >
>>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
>>> >>> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
>>> >>> >
>>> >>> > gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz: decompression OK, trailing garbage ignored
>>> >>> >
>>> >>> > Interestingly enough, the gzip page says it is a harmless warning -
>>> >>> > http://www.gzip.org/#faq8
>>> >>> >
>>> >>> > However, I'm losing events on decompression, so I cannot afford to
>>> >>> > ignore this warning. The gzip page gives an example about magnetic
>>> >>> > tape - there is an analogy to the HDFS block here, since the file is
>>> >>> > initially stored in HDFS before I pull it out to the local filesystem.
>>> >>> >
>>> >>> > Sagar
>>> >>> >
>>> >>> > On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson <[email protected]> wrote:
>>> >>> >
>>> >>> >> collector102.sinks.sink1.hdfs.writeFormat = TEXT
>>> >>> >> collector102.sinks.sink2.hdfs.writeFormat = TEXT
>>> >>>
>>> >>> --
>>> >>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
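If Brock's theory is right - each flush starting a new gzip member, with the "trailing garbage ignored" warning meaning everything after the first member is dropped - then the missing events should still be in the file. One way to check (a sketch, assuming Python 3.3+; count_members is a hypothetical helper, not anything from Flume or Hadoop) is to walk the gzip members one by one with zlib:

    import zlib

    # Walk a .gz file member by member. If every sink flush started a new
    # member, the summed line count should approach the expected ~50,000
    # events rather than the 100 that gunzip recovered.
    def count_members(path):
        with open(path, "rb") as f:
            data = f.read()
        members, lines = 0, 0
        while data:
            d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # expect gzip framing
            try:
                chunk = d.decompress(data)
            except zlib.error:  # remaining bytes are not a gzip member
                break
            members += 1
            lines += chunk.count(b"\n")
            if not d.eof:  # truncated final member
                break
            data = d.unused_data  # bytes following this member, if any
        return members, lines

    # Hypothetical usage, on the file pulled out of HDFS as in the thread:
    # print(count_members("collector102.ngpipes.sac.ngmoco.com.1358204406896.gz"))

A member count greater than 1 would back the concatenation theory and Brock's zcat suggestion; a count of 1 would mean the trailing bytes are not gzip members at all, pointing the investigation elsewhere.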
