Scribe can also write lzo-compressed output. The indexing step still needs to happen, though (Gerrit, does your bigstreams write out indexes automatically?).
So our workflow is more like:
1) Scribe to HDFS with lzo compression
2) index
3) run Pig queries over the data with EB loaders.

On Tue, Apr 19, 2011 at 12:48 PM, Gerrit Jansen van Vuuren <
[email protected]> wrote:

> Hi,
>
> Have a look at http://code.google.com/p/bigstreams/ and
> http://code.google.com/p/hadoop-gpl-packing/.
> If you configure bigstreams to use lzo, it will collect your log files from
> servers and write them out, plus load them to hadoop, in lzo format.
>
> Cheers,
> Gerrit
>
> On Tue, Apr 19, 2011 at 9:44 PM, Chaitanya Sharma <[email protected]>
> wrote:
>
> > Hi,
> >
> > I recently got Pig to work with lzo compression, with Pig loaders from
> > Elephant Bird.
> >
> > But, from my understanding, my workflow is turning out to be:
> > Step 1: lzo-compress the raw input file.
> > Step 2: put the compressed .lzo file to HDFS.
> > Step 3: execute Pig jobs with loaders from Elephant Bird.
> >
> > Now, this looks to be an all-manual workflow; it needs a lot of
> > babysitting.
> >
> > Please correct me if I'm wrong, but what I am wondering is whether EB or
> > Hadoop-Lzo could automate Step #1 and Step #2, so they would not need
> > manual intervention?
> >
> > Thanks,
> > Chaitanya
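For what it's worth, steps 2 and 3 of the workflow above can be sketched roughly like this with hadoop-lzo's indexer and an Elephant Bird loader. This is only a sketch: the jar location, HDFS paths, and the exact loader class depend on your install, so treat all of them as assumptions to check against your versions.

```shell
# Step 2 (assumed jar path): index the .lzo files so MapReduce can split
# them. DistributedLzoIndexer from hadoop-lzo runs the indexing as an MR
# job and writes .index files next to each .lzo file.
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /logs/scribe/2011-04-19

# Step 3: run a Pig query over the indexed data with an EB loader
# (loader class name as shipped in Elephant Bird's pig.load package).
pig <<'EOF'
logs = LOAD '/logs/scribe/2011-04-19'
       USING com.twitter.elephantbird.pig.load.LzoTextLoader();
DUMP logs;
EOF
```

Once the .index files exist, the Pig job gets one split per indexed block instead of one mapper per whole .lzo file, which is the main reason the indexing step matters.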
