[ 
https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770206#action_12770206
 ] 

Alan Gates commented on PIG-1037:
---------------------------------

Comments:

In InternalSortedBag.add, you are calculating the average size every time you 
add a tuple for the first 100 tuples.  Rather than do the calculation every 
time, wouldn't it be better to wait until you get to 100 tuples and then 
calculate the average?  This would miss the case where you can store fewer than 
100 tuples, but that seems unlikely.
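
A rough sketch of what I mean, not the actual InternalSortedBag code: the 
class name, the String stand-in for a tuple, estimateSize, and the memory 
budget are all made up for illustration.  The only point is that add() does no 
averaging work until the 100th tuple arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and size estimate are assumptions, not Pig code.
public class SampledBagSketch {
    static final int SAMPLE_SIZE = 100;          // sample size from the comment above

    private final List<String> tuples = new ArrayList<>();
    private final long memLimitBytes;            // assumed memory budget for the bag
    private boolean sampled = false;             // true once the average is computed
    private long avgTupleBytes = 0;              // stays 0 until the sample is complete
    private long maxInMemory = Long.MAX_VALUE;   // no cap until the average is known

    public SampledBagSketch(long memLimitBytes) {
        this.memLimitBytes = memLimitBytes;
    }

    // Stand-in for a real per-tuple memory estimate.
    static long estimateSize(String t) {
        return 2L * t.length();
    }

    public void add(String t) {
        tuples.add(t);
        // Compute the average exactly once, when the 100th tuple arrives.
        if (!sampled && tuples.size() == SAMPLE_SIZE) {
            long total = 0;
            for (String s : tuples) {
                total += estimateSize(s);
            }
            avgTupleBytes = Math.max(1, total / SAMPLE_SIZE);
            maxInMemory = memLimitBytes / avgTupleBytes;
            sampled = true;
        }
        // Spilling once tuples.size() exceeds maxInMemory is omitted here.
    }

    public long getAvgTupleBytes() { return avgTupleBytes; }
    public long getMaxInMemory() { return maxInMemory; }
}
```

Until the sample is complete there is no cap, which is the "fewer than 100 
tuples" case mentioned above.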

Some of the comments in InternalSortedBag that were copied over from the 
previous code, such as dealing with spills in the midst of reading, are no 
longer true.  They should be removed since they will cause confusion about how 
the code works.

I think the synchronized blocks in InternalSortedBag can be removed.  They were 
there before because spills could be triggered by a separate thread.  Since 
that is no longer true, we should be able to remove these.  This will remove a 
lock/unlock on every read of a record out of the bag and should provide some 
speedup.
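
To illustrate the kind of change, here is a hypothetical read path (again not 
the real InternalSortedBag; the class and field names are invented).  It 
assumes only the owning thread ever touches the bag, so next() no longer needs 
to take a lock per record.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; assumes the bag is only accessed from a single thread.
public class UnsyncedReadSketch {
    private final List<String> contents = new ArrayList<>();

    public void add(String t) {
        contents.add(t);
    }

    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < contents.size();
            }

            @Override
            public String next() {
                // Previously (hypothetically) wrapped in synchronized (contents) { ... }
                // because a spill thread could modify contents mid-read.
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return contents.get(pos++);
            }
        };
    }
}
```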



> better memory layout and spill for sorted and distinct bags
> -----------------------------------------------------------
>
>                 Key: PIG-1037
>                 URL: https://issues.apache.org/jira/browse/PIG-1037
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Ying He
>         Attachments: PIG-1037.patch, PIG-1037.patch2
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
