Hi,

The comparison is between plain text files and LZO with Protobuf. I am using LzoIndexer to compute the splits. The intermediate data (the map outputs) is not compressed.
What I am trying to do is run simple select queries in Pig scripts against both the plain text data and the LZO + Protobuf data, and then decide which format to use in the project based on the results. I have tried the following options:

1. Plain text files vs. LZO + Protobuf (with and without output compression of the final result)
2. Plain text files vs. LZO of plain text, here using LzoTokenizedLoader

In all cases the plain text version performs better than the others. Am I missing a point here with respect to the usage of LZO?

thanks and regards,
Vijaya Bhaskar Peddinti

On Sun, Dec 11, 2011 at 12:52 PM, Prashant Kommireddi <[email protected]> wrote:

> Vijay it really depends on what you are doing with LZO. Is it being
> used for creating splits, map output compression, intermediate files?
> Also what are you comparing this to? Simple text files, gzip/bzip
> compressed files?
>
> On Dec 10, 2011, at 11:12 PM, vijaya bhaskar peddinti
> <[email protected]> wrote:
>
> > Dear All,
> >
> > I am doing a PoC on LZO compression with Protobuf using Elephant Bird
> > and Pig 0.8.0. I am running this PoC on a cluster of 10 nodes. I have
> > also indexed the LZO file. I have noticed that there is no performance
> > improvement compared with uncompressed data. Is LZO supported in Pig?
> >
> > The data size is 1.5GB for the PoC. The Pig script is a select-style
> > query that reads and writes data using the Lzo*ProtoBuf loader and
> > storage methods.
> >
> > Please provide any suggestions and pointers in this regard.
> >
> > thanks and regards,
> > Vijaya Bhaskar Peddinti
