PERFORMANCE: Use lightweight bag implementations which do not register with 
SpillableMemoryManager with Combiner
----------------------------------------------------------------------------------------------------------------

                 Key: PIG-636
                 URL: https://issues.apache.org/jira/browse/PIG-636
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch


Currently, whenever the Combiner is used in Pig, the
POPrecombinerLocalRearrange operator in the map phase puts the single "value"
tuple corresponding to a key into a DataBag and passes it to the foreach that
is being combined. This generates as many bags as there are input records.
Each of these bags holds a single tuple, so they are small and should never
need to be spilled to disk. However, since the bags are created through the
BagFactory mechanism, each bag creation is registered with the
SpillableMemoryManager, and a weak reference to the bag is stored in a linked
list. This linked list grows very large over time, causing unnecessary garbage
collection runs. This can be avoided with a simple, lightweight implementation
of the DataBag interface that stores the single tuple in a bag. These
SingleTupleBags should be created without registering with the
SpillableMemoryManager. Likewise, the bags created in POCombinePackage are
supposed to fit in memory and not spill. Here too, a NonSpillableDataBag
implementation of the DataBag interface that does not register with the
SpillableMemoryManager would help.
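
The idea can be illustrated with a minimal, self-contained sketch. It does not
use Pig's real classes: Bag, Registry, RegisteredBag, SingleTupleBag and
NonSpillableBag below are hypothetical, cut-down stand-ins for DataBag, the
SpillableMemoryManager's weak-reference list, a BagFactory-created bag, and the
two proposed lightweight bags. The sketch only shows that the registering bag
grows the tracking list by one entry per record, while the lightweight bags
leave it untouched.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class LightweightBagSketch {

    /** Hypothetical, cut-down stand-in for Pig's DataBag interface. */
    interface Bag<T> extends Iterable<T> {
        long size();
        void add(T tuple);
    }

    /** Hypothetical stand-in for the memory manager's weak-reference list. */
    static final class Registry {
        private final List<WeakReference<Bag<?>>> tracked = new LinkedList<>();

        void register(Bag<?> bag) {
            tracked.add(new WeakReference<>(bag));
        }

        int trackedCount() {
            return tracked.size();
        }
    }

    /** Models a factory-created bag: it registers itself on construction. */
    static final class RegisteredBag<T> implements Bag<T> {
        private final List<T> tuples = new ArrayList<>();

        RegisteredBag(Registry registry) {
            registry.register(this);
        }

        public long size() { return tuples.size(); }
        public void add(T tuple) { tuples.add(tuple); }
        public Iterator<T> iterator() { return tuples.iterator(); }
    }

    /** Holds exactly one tuple and never registers with the registry. */
    static final class SingleTupleBag<T> implements Bag<T> {
        private T tuple;

        SingleTupleBag(T tuple) { this.tuple = tuple; }

        public long size() { return 1; }
        // Assumption for this sketch: adding replaces the single tuple.
        public void add(T newTuple) { this.tuple = newTuple; }
        public Iterator<T> iterator() {
            return Collections.singletonList(tuple).iterator();
        }
    }

    /** In-memory bag of any size that never registers with the registry. */
    static final class NonSpillableBag<T> implements Bag<T> {
        private final List<T> tuples = new ArrayList<>();

        public long size() { return tuples.size(); }
        public void add(T tuple) { tuples.add(tuple); }
        public Iterator<T> iterator() { return tuples.iterator(); }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        for (int i = 0; i < 1000; i++) {
            // One registry entry per record, even though each bag is tiny.
            new RegisteredBag<String>(registry).add("value-" + i);
        }
        int afterRegistered = registry.trackedCount();

        for (int i = 0; i < 1000; i++) {
            // No registry entry is created for the lightweight bag.
            new SingleTupleBag<>("value-" + i);
        }
        System.out.println("entries from registered bags: " + afterRegistered);
        System.out.println("entries after single-tuple bags: "
                + registry.trackedCount());
    }
}
```

In this sketch the registry never prunes its entries, so both printed counts
are 1000: the thousand registering bags each add one weak reference, and the
thousand single-tuple bags add none. How the real SpillableMemoryManager
prunes or walks its list is not modeled here.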

