Re: lzo with pig

vijaya bhaskar peddinti Mon, 12 Dec 2011 10:15:35 -0800

Hi,

Sorry, I did not fully understand your question. If you mean the number of
map and reduce tasks that were created, the details are below:


*For plain-text processing*:
6 maps and 2 reduces.

*For compressed processing*:
2 maps and 1 reduce.
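
As a rough sanity check on these counts, the sketch below estimates map tasks
from input size, assuming one map per input split. The function name, the
256 MB split size, and the single-file layout are illustrative assumptions,
not values checked on this cluster; Hadoop's small slop on the last split and
any split combination done by Pig are ignored.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(file_sizes_bytes, split_size_bytes, splittable=True):
    """Rough estimate: one map per split; an unsplittable file gets one map."""
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    total = 0
    for size in file_sizes_bytes:
        if size <= 0:
            continue  # empty files contribute no map tasks in this estimate
        if splittable:
            total += math.ceil(size / split_size_bytes)
        else:
            total += 1  # e.g. a compressed file that cannot be split
    return total


# Assumed layout: one 1.5 GB file and a 256 MB split size (both hypothetical).
print(estimate_map_tasks([1536 * MB], 256 * MB))                    # 6
print(estimate_map_tasks([1536 * MB], 256 * MB, splittable=False))  # 1
```

Under these assumed values a splittable 1.5 GB input gives 6 maps. A
compressed input gets fewer maps either because it is smaller on disk or
because it is not being split.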

I have not checked the exact sum of the task times, but the *average
execution times (over 2 sample runs)* are as follows:
For Plain Text: ~7 mins
For Lzo with Protobuf: ~11 mins (input and output are compressed)
For Lzo without Protobuf: ~10 mins (input and output are compressed)
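
The total cluster time asked about in the quoted message below is the sum of
the per-task run times, which can differ a lot from the wall-clock times above
when the jobs run different numbers of tasks in parallel. A minimal sketch,
assuming the per-task durations are copied by hand from the job's task list;
the function name and the durations in the usage lines are made-up
placeholders, not measurements from these runs.

```python
def total_cluster_seconds(task_seconds):
    """Sum the run time of every map and reduce task in one job."""
    if any(t < 0 for t in task_seconds):
        raise ValueError("task durations must be non-negative")
    return sum(task_seconds)


# Placeholder durations in seconds, NOT taken from the runs above.
plain_text_tasks = [60, 60, 60, 60, 60, 60, 30, 30]  # 6 maps + 2 reduces
lzo_tasks = [200, 200, 50]                           # 2 maps + 1 reduce

print(total_cluster_seconds(plain_text_tasks))
print(total_cluster_seconds(lzo_tasks))
```

Comparing these two sums separates the cost of the work itself from the
effect of running fewer tasks in parallel.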

In the Lzo ReadMe.md, I read that the code for indexer support has not been
committed back or included for Pig. Is Lzo indexing supported in Pig?

These are the steps I followed:
1. Created the lzo file using LzoCodec in Java code
2. Created the index files using LzoIndexer (in-process)
3. Loaded the data using Lzo*ProtobufLoader in the Pig script
4. Stored the data using the Lzo*ProtobufStorage methods

thanks and regards,
Vijaya Bhaskar Peddinti


On Mon, Dec 12, 2011 at 10:21 AM, Dmitriy Ryaboy <[email protected]> wrote:

> How many tasks did the uncompressed data require?
> How many tasks did the compressed data require?
>
> If you add up total cluster time for each task for the two jobs, how do
> these sums compare?
>
> D
>
>
> On Sat, Dec 10, 2011 at 11:36 PM, vijaya bhaskar peddinti <
> [email protected]> wrote:
>
> > Hi,
> >
> > The comparison is between simple text files and Lzo with Protobuf. I am
> > using LzoIndexer for calculating the splits. The intermediate data (the
> > map outputs) is not compressed.
> >
> > What I am trying to do is run simple select queries in Pig scripts
> > against both the simple text data and the Lzo data with Protobufs, and
> > then decide, based on the results, which to use in the project.
> >
> > I have tried the following combinations:
> > Plain Text files vs Lzo+Protobuf (with and without output compression of
> > the final result)
> > Plain Text files vs Lzo of Plain Text, here using LzoTokenisedLoader
> >
> > In all cases the performance of the Plain Text version is better than
> > the others.
> >
> > Am I missing a point here with respect to the usage of Lzo?
> >
> > thanks and regards,
> > Vijaya Bhaskar Peddinti
> >
> > On Sun, Dec 11, 2011 at 12:52 PM, Prashant Kommireddi
> > <[email protected]> wrote:
> >
> > > Vijay it really depends on what you are doing with LZO. Is it being
> > > used for creating splits, map output compression, intermediate files?
> > > Also what are you comparing this to? Simple text files, gzip/bzip
> > > compressed files?
> > >
> > > Sent from my iPhone
> > >
> > > On Dec 10, 2011, at 11:12 PM, vijaya bhaskar peddinti
> > > <[email protected]> wrote:
> > >
> > > > Dear All,
> > > >
> > > > I am doing a PoC on Lzo compression with Protobuf using elephant bird
> > > > and Pig 0.8.0. I am running this PoC on a cluster of 10 nodes. I have
> > > > also indexed the Lzo file. I have noticed that there is no performance
> > > > improvement compared with uncompressed data. Is Lzo supported in Pig?
> > > >
> > > > The data size is 1.5GB for the PoC. The Pig script is a select-style
> > > > query which reads and writes data using the Lzo*ProtoBuf Loader and
> > > > storage methods.
> > > >
> > > > Please provide any suggestions and pointers in this regard.
> > > >
> > > >
> > > > thanks and regards,
> > > > Vijaya Bhaskar Peddinti
> > >
> >
>
