hadoop@jobtracker301:/home/hadoop/sagar/debug$ zcat collector102.ngpipes.sac.ngmoco.com.1358204406896.gz | wc -l

gzip: collector102.ngpipes.sac.ngmoco.com.1358204406896.gz: decompression OK, trailing garbage ignored
100

This should be about 50,000 events for the 5 min window!!

Sagar

On Mon, Jan 14, 2013 at 3:16 PM, Brock Noland <[email protected]> wrote:
> Hi,
>
> Can you try: zcat file > output
>
> I think what is occurring is because of the flush the output file is
> actually several concatenated gz files.
>
> Brock
>
> On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <[email protected]> wrote:
> > Yeah I have tried the text write format in vain before, but nevertheless
> > gave it a try again!! Below is the latest file - still the same thing.
> >
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
> > Mon Jan 14 23:02:07 UTC 2013
> >
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
> > Found 1 items
> > -rw-r--r-- 3 hadoop supergroup 4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
> >
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
> > hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
> >
> > gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz: decompression OK, trailing garbage ignored
> >
> > Interestingly enough, the gzip page says it is a harmless warning -
> > http://www.gzip.org/#faq8
> >
> > However, I'm losing events on decompression so I cannot afford to ignore
> > this warning. The gzip page gives an example about magnetic tape - there is
> > an analogy of hdfs block here since the file is initially stored in hdfs
> > before I pull it out on the local filesystem.
> >
> > Sagar
> >
> > On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson <[email protected]> wrote:
> >>
> >> collector102.sinks.sink1.hdfs.writeFormat = TEXT
> >> collector102.sinks.sink2.hdfs.writeFormat = TEXT
>
> --
> Apache MRUnit - Unit testing MapReduce -
> http://incubator.apache.org/mrunit/
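
To make the "several concatenated gz files" idea from Brock's reply concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the two batches are made-up stand-ins for events written before and after a flush, and count_lines_per_member is a hypothetical helper, not part of Flume or Hadoop. Whether the file above really has this layout is not confirmed by the thread. The sketch shows that a reader which stops after the first gzip member sees only the first batch of lines, while a reader that walks every member sees all of them.

```python
import gzip
import zlib


def count_lines_per_member(data):
    """Return the newline count of each gzip member found in data.

    Hypothetical helper for illustration only.
    """
    counts = []
    while data:
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            text = d.decompress(data) + d.flush()
        except zlib.error:
            break  # the remaining bytes are not a gzip member
        counts.append(text.count(b"\n"))
        # Bytes after the end of this member: possibly the next member.
        data = d.unused_data
    return counts


# Made-up stand-ins for two batches of events, one written before a
# flush and one after it (an assumption, not data from the thread).
batch1 = b"".join(b"event %d\n" % i for i in range(100))
batch2 = b"".join(b"event %d\n" % i for i in range(100, 350))

# Each batch is compressed on its own and the results are concatenated.
blob = gzip.compress(batch1) + gzip.compress(batch2)

per_member = count_lines_per_member(blob)
total = gzip.decompress(blob).count(b"\n")  # reads every member

print("lines per member:", per_member)
print("lines across all members:", total)
```

Running the helper on the bytes of a real file pulled from HDFS would show how many members it contains and how many lines each one holds, which can be compared against the zcat | wc -l count.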
