Scribe can also write lzo-compressed output. The indexing step still needs to happen, though (Gerrit, does your bigstreams write out indexes automatically?).
So our workflow is more like:
1) Scribe to HDFS with lzo compression
2) index
3) run Pig queries over the data with EB loaders.

On Tue, Apr 19, 2011 at 12:48 PM, Gerrit Jansen van Vuuren <
[email protected]> wrote:

> Hi,
>
> Have a look at http://code.google.com/p/bigstreams/ and
> http://code.google.com/p/hadoop-gpl-packing/.
> If you configure bigstreams to use lzo, it will collect your log files from
> servers and write them out, plus load them to hadoop, in lzo format.
>
> Cheers,
> Gerrit
>
> On Tue, Apr 19, 2011 at 9:44 PM, Chaitanya Sharma <[email protected]>
> wrote:
>
> > Hi,
> >
> > I recently got Pig to work with lzo compression, with Pig loaders from
> > Elephant Bird.
> >
> > But, from my understanding, my workflow is turning out to be:
> > Step 1: lzo-compress the raw input file.
> > Step 2: put the compressed .lzo file to HDFS.
> > Step 3: execute Pig jobs with loaders from Elephant Bird.
> >
> > Now, this looks to be an all-manual workflow; it needs a lot of
> > babysitting.
> >
> > Please correct me if I'm wrong, but what I am wondering is whether EB or
> > Hadoop-Lzo could automate Step #1 and Step #2, so they would not need
> > manual intervention?
> >
> > Thanks,
> > Chaitanya
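For what it's worth, steps 2 and 3 of the workflow above can be sketched roughly like this with hadoop-lzo's indexer and an Elephant Bird loader. This is only a sketch: the jar location, HDFS paths, and the exact loader class depend on your install, so treat all of them as assumptions to check against your versions.

```shell
# Step 2 (assumed jar path): index the .lzo files so MapReduce can split
# them. DistributedLzoIndexer from hadoop-lzo runs the indexing as an MR
# job and writes .index files next to each .lzo file.
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /logs/scribe/2011-04-19

# Step 3: run a Pig query over the indexed data with an EB loader
# (loader class name as shipped in Elephant Bird's pig.load package).
pig <<'EOF'
logs = LOAD '/logs/scribe/2011-04-19'
       USING com.twitter.elephantbird.pig.load.LzoTextLoader();
DUMP logs;
EOF
```

Once the .index files exist, the Pig job gets one split per indexed block instead of one mapper per whole .lzo file, which is the main reason the indexing step matters.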
