You have 10 nodes, and in the case of uncompressed files your job is more parallel across the cluster, hence it's faster. When you compress the input, the number of splits gets reduced, and only 2 map tasks each handle more data.
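[Editorial aside: the split arithmetic behind this can be sketched with a toy model. This is not Hadoop's actual FileInputFormat logic; the 256 MB split size and the compressed file size below are assumptions chosen only so the numbers line up with the 6-map / 2-map counts reported later in the thread.]

```python
# Toy model of input-split planning (illustrative only; real split
# planning happens in Hadoop's FileInputFormat.getSplits).
import math

SPLIT_SIZE = 256 * 1024 * 1024  # assumed 256 MB split size

def num_splits(file_size_bytes, splittable):
    """One split per SPLIT_SIZE chunk if splittable, else one split total."""
    if not splittable:
        return 1
    return max(1, math.ceil(file_size_bytes / SPLIT_SIZE))

plain_text = int(1.5 * 1024**3)  # 1.5 GB uncompressed, as in the PoC
lzo = int(0.5 * 1024**3)         # hypothetical ~0.5 GB after LZO compression

print(num_splits(plain_text, splittable=True))   # → 6 map tasks
print(num_splits(lzo, splittable=False))         # → 1 map task without an index
print(num_splits(lzo, splittable=True))          # → 2 map tasks once indexed
```

Under these assumed sizes, indexing makes the LZO file splittable but still yields far fewer (and larger) map tasks than the uncompressed input, which is the imbalance described above.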
As I mentioned, 2.5 GB might be too little to benchmark this.

Sent from my iPhone

On Dec 12, 2011, at 10:15 AM, vijaya bhaskar peddinti <[email protected]> wrote:

> Hi,
>
> Sorry, I did not get your question. If you mean the numbers of map and reduce
> jobs that were created, then the details are below:
>
> *For the plain text based processing*:
> Maps were 6 and Reduces 2.
>
> *For the compressed based processing*:
> Maps were 2 and Reduce 1.
>
> I have not checked the exact sum of the times, but the *avg execution times
> (of 2 sample runs)* are as follows:
> For Plain Text: ~7 mins
> For Lzo with Protobuf: ~11 mins (i/p and o/p are compressed)
> For Lzo without Protobuf: ~10 mins (i/p and o/p are compressed).
>
> In the Lzo ReadMe.md, I have read that the indexer-support-related code is not
> committed back or included for Pig. Is Lzo indexing supported in Pig?
>
> The following are the steps that I have done:
> 1. Created the lzo file using LzoCodec in Java code
> 2. Created index files using LzoIndexer (in-process)
> 3. Loaded the data using Lzo*ProtobufLoader in the pig script
> 4. Stored the data using Lzo*ProtobufStorage methods
>
> thanks and regards,
> Vijaya Bhaskar Peddinti
>
>
> On Mon, Dec 12, 2011 at 10:21 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> How many tasks did the uncompressed data require?
>> How many tasks did the compressed data require?
>>
>> If you add up total cluster time for each task for the two jobs, how do
>> these sums compare?
>>
>> D
>>
>>
>> On Sat, Dec 10, 2011 at 11:36 PM, vijaya bhaskar peddinti <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> The comparison is between simple text files and lzo with protobuf. I am
>>> using LzoIndexer for calculating the splits. The intermediate data, i.e.
>>> the map outputs, are not compressed.
>>>
>>> What I am trying to do is execute simple select queries against the
>>> plain text data and the lzo-with-protobuf data in pig scripts, and based
>>> on the results I am planning to use one of them in the project.
>>>
>>> I have tried the following options:
>>> Plain text files vs Lzo+Protobuf (with and without output compression of
>>> the final result)
>>> Plain text files vs Lzo of plain text, here using LzoTokenisedLoader
>>>
>>> In all cases the performance of the plain text version is better than
>>> the others.
>>>
>>> Am I missing a point here wrt the usage of Lzo?
>>>
>>> thanks and regards,
>>> Vijaya Bhaskar Peddinti
>>>
>>> On Sun, Dec 11, 2011 at 12:52 PM, Prashant Kommireddi
>>> <[email protected]> wrote:
>>>
>>>> Vijay, it really depends on what you are doing with LZO. Is it being
>>>> used for creating splits, map output compression, intermediate files?
>>>> Also, what are you comparing this to? Simple text files, gzip/bzip
>>>> compressed files?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Dec 10, 2011, at 11:12 PM, vijaya bhaskar peddinti
>>>> <[email protected]> wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I am doing a PoC on Lzo compression with Protobuf using elephant-bird
>>>>> and Pig 0.8.0, on a cluster of 10 nodes. I have also done indexing for
>>>>> the Lzo file. I have noticed that there is no performance improvement
>>>>> when compared with uncompressed data. Is Lzo support available for Pig?
>>>>>
>>>>> The data size is 1.5GB for the PoC. The Pig script is a select-query
>>>>> kind, which reads and writes data using the Lzo*ProtoBuf loader and
>>>>> storage methods.
>>>>>
>>>>> Please provide any suggestions and pointers in this regard.
>>>>>
>>>>>
>>>>> thanks and regards,
>>>>> Vijaya Bhaskar Peddinti
>>>>
>>>
>>

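[Editorial aside: the thread's distinction between wall-clock time and total cluster time (the sums Dmitriy asks about) can be sketched with a toy "wave" model. All per-task durations below are hypothetical; the only inputs taken from the thread are the 10-node cluster and the 6-map vs 2-map task counts.]

```python
# Toy wave model: with enough free map slots, all tasks of a job run in
# one wave, so wall-clock time tracks the slowest task, while total
# cluster time is the sum over tasks.
import math

MAP_SLOTS = 10  # assume one map slot per node on the 10-node cluster

def job_times(num_tasks, secs_per_task):
    """Return (wall_clock_secs, total_cluster_secs) for uniform tasks."""
    waves = math.ceil(num_tasks / MAP_SLOTS)
    wall_clock = waves * secs_per_task
    cluster_time = num_tasks * secs_per_task
    return wall_clock, cluster_time

# Hypothetical numbers: 6 small tasks at 60 s each, vs 2 large tasks at
# 150 s each (each large task reads ~3x the data, partially offset by
# reduced I/O from compression).
print(job_times(6, 60))    # → (60, 360): 60 s wall clock, 360 task-seconds
print(job_times(2, 150))   # → (150, 300): slower wall clock, less total work
```

This is why comparing only wall-clock time can make the compressed job look strictly worse even when it consumes fewer total cluster resources: with only 2 splits, 8 of the 10 nodes sit idle during the map phase.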