You have 10 nodes, and in the case of uncompressed files your job is more parallel across the cluster, hence it's faster. When you compress the input, the number of splits gets reduced, and only 2 map tasks each handle more data.
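[Editorial aside: the split arithmetic behind this can be sketched with a toy model. This is not Hadoop's actual FileInputFormat logic; the 256 MB split size and the compressed file size below are assumptions chosen only so the numbers line up with the 6-map / 2-map counts reported later in the thread.]

```python
# Toy model of input-split planning (illustrative only; real split
# planning happens in Hadoop's FileInputFormat.getSplits).
import math

SPLIT_SIZE = 256 * 1024 * 1024  # assumed 256 MB split size

def num_splits(file_size_bytes, splittable):
    """One split per SPLIT_SIZE chunk if splittable, else one split total."""
    if not splittable:
        return 1
    return max(1, math.ceil(file_size_bytes / SPLIT_SIZE))

plain_text = int(1.5 * 1024**3)  # 1.5 GB uncompressed, as in the PoC
lzo = int(0.5 * 1024**3)         # hypothetical ~0.5 GB after LZO compression

print(num_splits(plain_text, splittable=True))   # → 6 map tasks
print(num_splits(lzo, splittable=False))         # → 1 map task without an index
print(num_splits(lzo, splittable=True))          # → 2 map tasks once indexed
```

Under these assumed sizes, indexing makes the LZO file splittable but still yields far fewer (and larger) map tasks than the uncompressed input, which is the imbalance described above.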
As I mentioned, 2.5 GB might be too little to benchmark this.

Sent from my iPhone

On Dec 12, 2011, at 10:15 AM, vijaya bhaskar peddinti <[email protected]> wrote:

> Hi,
>
> Sorry, I did not get your question. If you mean the numbers of map and reduce
> jobs that were created, then the details are below:
>
> *For the plain text based processing*:
> Maps were 6 and Reduces 2.
>
> *For the compressed based processing*:
> Maps were 2 and Reduce 1.
>
> I have not checked the exact sum of the times, but the *avg execution times
> (of 2 sample runs)* are as follows:
> For Plain Text: ~7 mins
> For Lzo with Protobuf: ~11 mins (i/p and o/p are compressed)
> For Lzo without Protobuf: ~10 mins (i/p and o/p are compressed).
>
> In the Lzo ReadMe.md, I have read that the indexer-support-related code is not
> committed back or included for Pig. Is Lzo indexing supported in Pig?
>
> The following are the steps that I have done:
> 1. Created the lzo file using LzoCodec in Java code
> 2. Created index files using LzoIndexer (in-process)
> 3. Loaded the data using Lzo*ProtobufLoader in the pig script
> 4. Stored the data using Lzo*ProtobufStorage methods
>
> thanks and regards,
> Vijaya Bhaskar Peddinti
>
>
> On Mon, Dec 12, 2011 at 10:21 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> How many tasks did the uncompressed data require?
>> How many tasks did the compressed data require?
>>
>> If you add up total cluster time for each task for the two jobs, how do
>> these sums compare?
>>
>> D
>>
>>
>> On Sat, Dec 10, 2011 at 11:36 PM, vijaya bhaskar peddinti <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> The comparison is between simple text files and lzo with protobuf. I am
>>> using LzoIndexer for calculating the splits. The intermediate data, i.e.
>>> the map outputs, are not compressed.
>>>
>>> What I am trying to do is execute simple select queries against the
>>> plain text data and the lzo-with-protobuf data in pig scripts, and based
>>> on the results I am planning to use one of them in the project.
>>>
>>> I have tried the following options:
>>> Plain text files vs Lzo+Protobuf (with and without output compression of
>>> the final result)
>>> Plain text files vs Lzo of plain text, here using LzoTokenisedLoader
>>>
>>> In all cases the performance of the plain text version is better than
>>> the others.
>>>
>>> Am I missing a point here wrt the usage of Lzo?
>>>
>>> thanks and regards,
>>> Vijaya Bhaskar Peddinti
>>>
>>> On Sun, Dec 11, 2011 at 12:52 PM, Prashant Kommireddi
>>> <[email protected]> wrote:
>>>
>>>> Vijay, it really depends on what you are doing with LZO. Is it being
>>>> used for creating splits, map output compression, intermediate files?
>>>> Also, what are you comparing this to? Simple text files, gzip/bzip
>>>> compressed files?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Dec 10, 2011, at 11:12 PM, vijaya bhaskar peddinti
>>>> <[email protected]> wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I am doing a PoC on Lzo compression with Protobuf using elephant-bird
>>>>> and Pig 0.8.0, on a cluster of 10 nodes. I have also done indexing for
>>>>> the Lzo file. I have noticed that there is no performance improvement
>>>>> when compared with uncompressed data. Is Lzo support available for Pig?
>>>>>
>>>>> The data size is 1.5GB for the PoC. The Pig script is a select-query
>>>>> kind, which reads and writes data using the Lzo*ProtoBuf loader and
>>>>> storage methods.
>>>>>
>>>>> Please provide any suggestions and pointers in this regard.
>>>>>
>>>>>
>>>>> thanks and regards,
>>>>> Vijaya Bhaskar Peddinti
>>>>
>>>
>>

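[Editorial aside: the thread's distinction between wall-clock time and total cluster time (the sums Dmitriy asks about) can be sketched with a toy "wave" model. All per-task durations below are hypothetical; the only inputs taken from the thread are the 10-node cluster and the 6-map vs 2-map task counts.]

```python
# Toy wave model: with enough free map slots, all tasks of a job run in
# one wave, so wall-clock time tracks the slowest task, while total
# cluster time is the sum over tasks.
import math

MAP_SLOTS = 10  # assume one map slot per node on the 10-node cluster

def job_times(num_tasks, secs_per_task):
    """Return (wall_clock_secs, total_cluster_secs) for uniform tasks."""
    waves = math.ceil(num_tasks / MAP_SLOTS)
    wall_clock = waves * secs_per_task
    cluster_time = num_tasks * secs_per_task
    return wall_clock, cluster_time

# Hypothetical numbers: 6 small tasks at 60 s each, vs 2 large tasks at
# 150 s each (each large task reads ~3x the data, partially offset by
# reduced I/O from compression).
print(job_times(6, 60))    # → (60, 360): 60 s wall clock, 360 task-seconds
print(job_times(2, 150))   # → (150, 300): slower wall clock, less total work
```

This is why comparing only wall-clock time can make the compressed job look strictly worse even when it consumes fewer total cluster resources: with only 2 splits, 8 of the 10 nodes sit idle during the map phase.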