Since the uncompressed text required 6 mappers at 7 minutes each, you used a 
total of 42 minutes of compute time. At 11 minutes for 2 mappers with LZO, you 
used only 22 minutes, so the whole thing consumed half the resources. If you 
decrease the block size when running with LZO by 3x (so as to get 6 mappers 
instead of 2), you should see wall-clock time drop as well.
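To make the comparison concrete, the totals above are just mappers times minutes per mapper, which a quick bit of shell arithmetic confirms:

```shell
# Total compute time = number of mappers * minutes per mapper
uncompressed=$((6 * 7))   # 6 mappers at ~7 min each
lzo=$((2 * 11))           # 2 mappers at ~11 min each
echo "uncompressed: ${uncompressed} min, lzo: ${lzo} min"
# prints: uncompressed: 42 min, lzo: 22 min
```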

Indexing can now be done inline while creating the LZO files if you use the 
latest hadoop-lzo and elephant-bird libraries from GitHub. There are a few 
parameters you need to set; I think we added them to the docs.
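If you are not yet on versions that support inline indexing, the standalone indexer that ships with hadoop-lzo can still be run over existing files afterwards to make them splittable. A sketch of the invocation — the jar location and HDFS path here are placeholders to adjust for your installation:

```shell
# Build a .lzo.index file alongside an existing .lzo file on HDFS
# so MapReduce can split it; paths are illustrative.
hadoop jar /path/to/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer \
    /user/data/big_file.lzo
```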

On Dec 12, 2011, at 10:15 AM, vijaya bhaskar peddinti 
<[email protected]> wrote:

> Hi,
> 
> Sorry, I did not get your question. If you mean the number of map and reduce
> tasks that were created, then the details are below:
> 
> *For the plain text based processing*:
> Maps were 6 and reduces were 2.
> 
> *For the compressed processing*:
> Maps were 2 and reduces were 1.
> 
> I have not checked the exact sum of the times, but the *average execution
> times (over 2 sample runs)* are as follows:
> For plain text: ~7 mins
> For LZO with Protobuf: ~11 mins (input and output are compressed)
> For LZO without Protobuf: ~10 mins (input and output are compressed)
> 
> In the LZO README.md, I have read that the indexer-support code is not
> committed back or included for Pig. Is LZO indexing supported in Pig?
> 
> The following are the steps that I have done:
> 1. Created the lzo file using LzoCodec in Java code
> 2. Created index files using LzoIndexer (in-process)
> 3. Loaded the data using Lzo*ProtobufLoader in the Pig script
> 4. Stored the data using the Lzo*ProtobufStorage methods
> 
> thanks and regards,
> Vijaya Bhaskar Peddinti
> 
> 
> On Mon, Dec 12, 2011 at 10:21 AM, Dmitriy Ryaboy <[email protected]> wrote:
> 
>> How many tasks did the uncompressed data require?
>> How many tasks did the compressed data require?
>> 
>> If you add up total cluster time for each task for the two jobs, how do
>> these sums compare?
>> 
>> D
>> 
>> 
>> On Sat, Dec 10, 2011 at 11:36 PM, vijaya bhaskar peddinti <
>> [email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> The comparison is between simple text files and LZO with Protobuf. I am
>>> using LzoIndexer for calculating the splits. The intermediate data, i.e.
>>> the map outputs, are not compressed.
>>> 
>>> What I am trying to do is execute simple select queries against both the
>>> simple text data and the LZO-with-Protobuf data in Pig scripts, and based
>>> on the results I plan to decide which to use in the project.
>>> 
>>> I have tried the following options:
>>> Plain text files vs LZO+Protobuf (with and without output compression of
>>> the final result)
>>> Plain text files vs LZO of plain text, here using LzoTokenisedLoader
>>> 
>>> In all the cases, the performance of the plain text version is better
>>> than the others.
>>> 
>>> Am I missing something here with respect to the usage of LZO?
>>> 
>>> thanks and regards,
>>> Vijaya Bhaskar Peddinti
>>> 
>>> On Sun, Dec 11, 2011 at 12:52 PM, Prashant Kommireddi
>>> <[email protected]>wrote:
>>> 
>>>> Vijay, it really depends on what you are doing with LZO. Is it being
>>>> used for creating splits, map output compression, intermediate files?
>>>> Also, what are you comparing this to? Simple text files, or gzip/bzip2
>>>> compressed files?
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Dec 10, 2011, at 11:12 PM, vijaya bhaskar peddinti
>>>> <[email protected]> wrote:
>>>> 
>>>>> Dear All,
>>>>> 
>>>>> I am doing a PoC on LZO compression with Protobuf using elephant-bird
>>>>> and Pig 0.8.0, on a cluster of 10 nodes. I have also done indexing for
>>>>> the LZO file. I have noticed that there is no performance improvement
>>>>> when compared with uncompressed data. Is LZO support there for Pig?
>>>>> 
>>>>> The data size is 1.5GB for the PoC. The Pig script is a select-query
>>>>> kind of script, which reads and writes data using the Lzo*ProtoBuf
>>>>> loader and storage methods.
>>>>> 
>>>>> Please provide any suggestions and pointers in this regard.
>>>>> 
>>>>> 
>>>>> thanks and regards,
>>>>> Vijaya Bhaskar Peddinti
>>>> 
>>> 
>> 
