[ https://issues.apache.org/jira/browse/PIG-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-628: ------------------------------- Attachment: PIG-628.patch Attached patch which implements the changes described in the issue description. > PERFORMANCE: Misc. optimizations including optimization in Tuple > serialization, set up of PigMapReduce & PigCombiner, accessing index in > POLocalRearrange > --------------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: PIG-628 > URL: https://issues.apache.org/jira/browse/PIG-628 > Project: Pig > Issue Type: Improvement > Affects Versions: types_branch > Reporter: Pradeep Kamath > Assignee: Pradeep Kamath > Fix For: types_branch > > Attachments: PIG-628.patch > > > - Currently DefaultTuple.write() needlessly writes a marker for null/not > null. This is already handled by PigNullableWritable for keys and > NullableTuple for values. Nested null tuples inside a tuple are written out > as nulls in DataReaderWriter.writeDatum. So the null/not null marker in > DefaultTuple can be avoided. > - In PigMapReduce and PigCombiner the roots and leaves of the plans are > calculated in each reduce() call. Instead these can be computed in > configure() one time. > - In each call of POLocalRearrange.getNext(), a new lroutput tuple is created > whose first position is filled with index, second with key and third with > value - this can be optimized by having a tuple member in POLocalRearrange > which is reused in each getNext() call. Further, the first position of this > tuple can be pre-filled with the index in the setIndex() method of > POLocalRearrange at script compile time. > - In POCombinerPackage, the metadata data structures to figure out which > parts of the value are present in the key can be set up in the setKeyInfo() > method at compile time. This is because we currently use POCombinerPackage > only with a "group by". Hence we don't need to look up the metadata at run > time based on input index since there will be only one input (index = 0) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.