[ 
https://issues.apache.org/jira/browse/PIG-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-628:
-------------------------------

    Attachment: PIG-628.patch

Attached a patch that implements the changes described in the issue description.

> PERFORMANCE: Misc. optimizations including optimization in Tuple 
> serialization, set up of PigMapReduce & PigCombiner, accessing index in 
> POLocalRearrange
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-628
>                 URL: https://issues.apache.org/jira/browse/PIG-628
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-628.patch
>
>
> - Currently DefaultTuple.write() needlessly writes a marker for null/not 
> null. This is already handled by PigNullableWritable for keys and 
> NullableTuple for values. Nested null tuples inside a tuple are written out 
> as nulls in DataReaderWriter.writeDatum. So the null/not null marker in 
> DefaultTuple can be avoided.
> - In PigMapReduce and PigCombiner the roots and leaves of the plans are 
> calculated in each reduce() call. Instead, these can be computed once in 
> configure().
> - Each call of POLocalRearrange.getNext() creates a new lroutput tuple whose 
> first position is filled with the index, second with the key and third with 
> the value. This can be optimized by keeping a tuple member in 
> POLocalRearrange that is reused in each getNext() call. Further, the first 
> position of this tuple can be pre-filled with the index in the setIndex() 
> method of POLocalRearrange at script compile time.
> - In POCombinerPackage, the metadata data structures used to determine which 
> parts of the value are present in the key can be set up in the setKeyInfo() 
> method at compile time. This is possible because we currently use 
> POCombinerPackage only with a "group by", so there is only one input 
> (index = 0) and no need to look up the metadata at run time based on the 
> input index.
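
For illustration only, the POLocalRearrange item in the description could be sketched roughly as below. This is not the code in PIG-628.patch: LocalRearrangeSketch is a hypothetical stand-in, a plain java.util.List takes the place of Pig's Tuple, and the getNext(key, value) signature is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for POLocalRearrange; not the actual Pig class.
class LocalRearrangeSketch {
    // One output "tuple" reused across calls: [index, key, value].
    // A plain List stands in for Pig's Tuple (assumption for this sketch).
    private final List<Object> lrOutput =
            new ArrayList<>(Arrays.asList(null, null, null));

    // Called once, at script compile time: pre-fill position 0 with the index.
    void setIndex(byte index) {
        lrOutput.set(0, index);
    }

    // Called once per record: only positions 1 and 2 are overwritten,
    // so no new output object is allocated per call.
    List<Object> getNext(Object key, Object value) {
        lrOutput.set(1, key);
        lrOutput.set(2, value);
        return lrOutput;
    }

    public static void main(String[] args) {
        LocalRearrangeSketch lr = new LocalRearrangeSketch();
        lr.setIndex((byte) 0);

        List<Object> first = lr.getNext("k1", "v1");
        System.out.println(first);

        List<Object> second = lr.getNext("k2", "v2");
        // The same instance is returned on every call.
        System.out.println(first == second);
    }
}
```

A consequence of this pattern is that the returned object is overwritten by the next getNext() call, so a caller has to consume or copy it before asking for the next record.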

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
