[ 
https://issues.apache.org/jira/browse/PIG-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581689#action_12581689
 ] 

Benjamin Reed commented on PIG-167:
-----------------------------------

I think you've identified a MAJOR bug. The SpillManager should spill the 
biggest bags first. Did you try running your tests again with that change?
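
To make the biggest-first ordering concrete, here is a minimal sketch. It is not the actual SpillableMemoryManager code: the Spillable interface is a simplified stand-in with only getMemorySize() and spill(), and Bag, spillBiggestFirst and bytesToFree are hypothetical names introduced for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

class BiggestFirstSketch {
    // Simplified stand-in for Pig's Spillable; assumed for illustration only.
    interface Spillable {
        long getMemorySize();
        long spill(); // returns the number of bytes freed
    }

    // Hypothetical in-memory bag, used only to exercise the sketch.
    static class Bag implements Spillable {
        private long size;
        boolean spilled = false;

        Bag(long size) { this.size = size; }

        public long getMemorySize() { return size; }

        public long spill() {
            long freed = size;
            size = 0;
            spilled = true;
            return freed;
        }
    }

    // Spills the largest live bags first until at least bytesToFree is released.
    static long spillBiggestFirst(List<WeakReference<Spillable>> spillables,
                                  long bytesToFree) {
        // Take strong references first so the ordering cannot change
        // if the GC clears a weak reference in the middle of the sort.
        List<Spillable> live = new ArrayList<>();
        for (Iterator<WeakReference<Spillable>> it = spillables.iterator(); it.hasNext();) {
            Spillable s = it.next().get();
            if (s == null) {
                it.remove(); // drop stale references while we are here
            } else {
                live.add(s);
            }
        }
        // Descending by size: biggest bag first.
        live.sort(Comparator.<Spillable>comparingLong(Spillable::getMemorySize).reversed());

        long freed = 0;
        for (Spillable s : live) {
            if (freed >= bytesToFree) {
                break;
            }
            freed += s.spill();
        }
        return freed;
    }

    public static void main(String[] args) {
        Bag small = new Bag(10);
        Bag big = new Bag(100);
        List<WeakReference<Spillable>> refs = new ArrayList<>();
        refs.add(new WeakReference<>(small));
        refs.add(new WeakReference<>(big));
        long freed = spillBiggestFirst(refs, 50);
        // prints: freed=100 big=true small=false
        System.out.println("freed=" + freed + " big=" + big.spilled + " small=" + small.spilled);
    }
}
```

With this ordering a single large spill can satisfy the request, so the small bag is never written out as a tiny file.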

I think you want the lock contention. In the low memory condition you don't 
want allocations to continue because you might run out of memory. Waiting for a 
lock is much better than getting an out of memory error.
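
As a rough sketch of why the shared lock is useful (hypothetical names throughout; this is not the real SpillableMemoryManager, and it holds strong references instead of WeakReferences to stay short): registration and the low-memory handler synchronize on the same monitor, so a thread trying to register a new bag simply waits until the spill has finished.

```java
import java.util.ArrayList;
import java.util.List;

class SharedLockSketch {
    // Simplified stand-in for Pig's Spillable; assumed for illustration only.
    interface Spillable {
        long spill(); // returns the number of bytes freed
    }

    private final Object lock = new Object();
    private final List<Spillable> spillables = new ArrayList<>();

    // Called whenever a new bag is created. Blocks while a spill is running,
    // which throttles new bag registrations under memory pressure.
    void registerSpillable(Spillable s) {
        synchronized (lock) {
            spillables.add(s);
        }
    }

    // Called on a low-memory notification. Holds the same lock for the
    // whole (I/O heavy) spill, so no bag can be registered in the meantime.
    long handleLowMemory() {
        synchronized (lock) {
            long freed = 0;
            for (Spillable s : spillables) {
                freed += s.spill();
            }
            return freed;
        }
    }

    int registeredCount() {
        synchronized (lock) {
            return spillables.size();
        }
    }
}
```

The cost is that registering threads stall for the duration of the spill, which is the trade-off being argued for here.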

I'm also wondering about trying to spill eden first. My intuition is that 
recently created bags are more likely to be used than old bags, but I have no 
measurements to show that :)

Alan's assumption is correct between spills. By cleaning from the head, Alan 
can do some between-spill housekeeping. (The memory manager cleans up during a 
spill.)

Our efforts in this area are aimed at getting jobs to complete successfully, 
not so much at making them perform better. (Both would be great, but slow 
success is much better than quick failure.)

> Experiment : A proper bag memory manager.
> -----------------------------------------
>
>                 Key: PIG-167
>                 URL: https://issues.apache.org/jira/browse/PIG-167
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>         Attachments: diagram.gif, MemManager0.patch
>
>
> According to PIG-164, I think we still have room for improvement:-
> 1) Alan said
> {quote}
> "It rests on the assumption that data bags generally live about the same 
> amount of time, thus there won't be a long lived databag at the head of the 
> list blocking the cleaning of many stale references later in the list."
> {quote}
> By looking at a line of code in SpillableMemoryManager
> {noformat}
> Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
> {noformat}
> - Alan's assumption might be wrong after the memory manager tries to spill 
> the list.
> - I don't understand why this has to be sorted so that spilling starts from 
> the smallest bags first. Most file systems are not good at handling small 
> files (especially ext2/ext3).
> 2) We use a LinkedList to maintain the WeakReferences. Normally a LinkedList 
> consumes twice as much memory as an array would (for pointers). 
> Would it be better to change LinkedList to an array or ArrayList?
> 3) In SpillableMemoryManager, handleNotification which does a kind of I/O 
> intensive job shares the same lock with registerSpillable. This doesn't seem 
> to be efficient.
> 4) Sometimes I noticed that the bag currently in use got spilled and read 
> back over and over again. Essentially, the memory manager should consider 
> spilling bags currently not in use first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
