Hi,

Sorry, I did not understand your question. If you mean the number of map and reduce tasks that were created, the details are below:
*For the plain text based processing*: 6 maps and 2 reduces.
*For the compressed processing*: 2 maps and 1 reduce.

I have not checked the exact sum of the times, but the *avg execution times (of 2 sample runs)* are as follows:

For Plain Text: ~7 mins
For Lzo with Protobuf: ~11 mins (i/p and o/p are compressed)
For Lzo without Protobuf: ~10 mins (i/p and o/p are compressed)

In the Lzo ReadMe.md, I have read that the indexer support code is not committed back or included for Pig. Is Lzo indexing supported in Pig?

The following are the steps that I have done:

1. Created the lzo file using LzoCodec in Java code
2. Created the index files using LzoIndexer (in-process)
3. Loaded the data using Lzo*ProtobufLoader in the Pig script
4. Stored the data using Lzo*ProtobufStorage methods

thanks and regards,
Vijaya Bhaskar Peddinti

On Mon, Dec 12, 2011 at 10:21 AM, Dmitriy Ryaboy <[email protected]> wrote:

> How many tasks did the uncompressed data require?
> How many tasks did the compressed data require?
>
> If you add up total cluster time for each task for the two jobs, how do
> these sums compare?
>
> D
>
>
> On Sat, Dec 10, 2011 at 11:36 PM, vijaya bhaskar peddinti
> <[email protected]> wrote:
>
> > Hi,
> >
> > the comparison is between simple text files and lzo with protobuf. I am
> > using LzoIndexer for calculating the splits. The intermediate data or the
> > map outputs are not compressed.
> >
> > What i am trying to do is executing a simple select queries using the
> > simple text data and lzo with protobufs in pig scripts and based on the
> > result planning to use them in the project.
> >
> > I have tried with the following options
> > Plain Text files vs Lzo+Protobuf(with and without output compression of
> > final result)
> > Plain Text files vs Lzo of Plain Text here using LzoTokenisedLoader
> >
> > In all the cases the performance of Plain Text files version is better than
> > others.
> >
> > Am I missing a point here wrt to usage of Lzo?
> >
> > thanks and regards,
> > Vijaya Bhaskar Peddinti
> >
> > On Sun, Dec 11, 2011 at 12:52 PM, Prashant Kommireddi
> > <[email protected]> wrote:
> >
> > > Vijay it really depends on what you are doing with LZO. Is it being
> > > used for creating splits, map output compression, intermediate files?
> > > Also what are you comparing this to? Simple text files, gzip/bzip
> > > compressed files?
> > >
> > > Sent from my iPhone
> > >
> > > On Dec 10, 2011, at 11:12 PM, vijaya bhaskar peddinti
> > > <[email protected]> wrote:
> > >
> > > > Dear All,
> > > >
> > > > I am doing a PoC on Lzo compression with Protobuf using elephant bird and
> > > > Pig 0.8.0. I am doing this PoC on cluster of 10 nodes. I have also done
> > > > indexing for the Lzo file. i have noticed that there is no performance
> > > > improvement when compared with uncompressed data. Does Lzo support is there
> > > > for Pig?
> > > >
> > > > The data size if 1.5GB for the PoC. Pig script is a select query kind of
> > > > which reads and writes data using Lzo*ProtoBuf Loader and storage methods.
> > > >
> > > > Please provide any suggestions and pointer in this regards.
> > > >
> > > >
> > > > thanks and regards,
> > > > Vijaya Bhaskar Peddinti
