[ 
https://issues.apache.org/jira/browse/PIG-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581954#action_12581954
 ] 

Pi Song commented on PIG-167:
-----------------------------

I would like to introduce a simpler (and more efficient) design.

Here I've got a LinkedList of ArrayLists holding WeakReferences:-
{noformat}
Reclaim memory here first
    V
[ArrayList] -> [ArrayList] -> [ArrayList] -> [ArrayList]    <=== Register new spillables here
{noformat}

- N is the number of spillables per ArrayList(Node)
- The LinkedList grows at the tail.
- Reclaiming memory starts at the head and cleans a whole node at a time. 
Spillables whose references are not yet null are migrated to the LinkedList 
tail, then the whole ArrayList (node) is thrown away.
- Reclaiming continues with the next node if all refs in the current node are 
null.
- Reclaiming can be activated in two ways: 1) by the MXBean, 2) when the 
register counter hits the threshold (we maintain this counter; it is reset 
once we reclaim).
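
To make the bullets above concrete, here is a minimal sketch of the reference 
list only. It is not the attached patch: the class name, method names and the 
generic type are hypothetical, spilling itself and the MXBean hook are left 
out, and one coarse lock is assumed for simplicity.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedList;

// Hypothetical illustration of the design; all names here are assumptions.
class ChunkedWeakRefList<T> {
    private final int nodeCapacity;      // N: references per ArrayList node
    private final int reclaimThreshold;  // registrations allowed between reclaims
    private final LinkedList<ArrayList<WeakReference<T>>> nodes = new LinkedList<>();
    private int registerCount = 0;

    ChunkedWeakRefList(int nodeCapacity, int reclaimThreshold) {
        this.nodeCapacity = nodeCapacity;
        this.reclaimThreshold = reclaimThreshold;
    }

    // Register a new object at the tail of the list.
    synchronized void register(T item) {
        registerRef(new WeakReference<>(item));
    }

    // Same as register(), but takes an existing reference.
    synchronized void registerRef(WeakReference<T> ref) {
        append(ref);
        if (++registerCount >= reclaimThreshold) {
            reclaim();  // trigger 2: the register counter hit the threshold
        }
    }

    // Add to the tail node, growing the LinkedList when the tail node is full.
    private void append(WeakReference<T> ref) {
        if (nodes.isEmpty() || nodes.getLast().size() >= nodeCapacity) {
            nodes.addLast(new ArrayList<>(nodeCapacity));
        }
        nodes.getLast().add(ref);
    }

    // Clean whole nodes starting at the head. Live references migrate to the
    // tail and the head node is dropped. Keep going only while the node just
    // cleaned contained nothing but cleared references.
    synchronized void reclaim() {
        registerCount = 0;
        boolean allCleared = true;
        while (allCleared && !nodes.isEmpty()) {
            ArrayList<WeakReference<T>> head = nodes.removeFirst();
            for (WeakReference<T> ref : head) {
                if (ref.get() != null) {
                    allCleared = false;
                    append(ref);
                }
            }
        }
    }

    synchronized int nodeCount() {
        return nodes.size();
    }

    synchronized int refCount() {
        int n = 0;
        for (ArrayList<WeakReference<T>> node : nodes) {
            n += node.size();
        }
        return n;
    }
}
```

In this sketch an MXBean notification handler would simply call reclaim() 
(and spill what it finds alive), while register() covers the counter-based 
trigger. Each reclaim is a single pass over the nodes it touches, with no 
sorting.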

Pros:-
- Guarantees old bags are cleaned up first.
- Halves the memory used for maintaining references compared to the 
existing implementation.
- Less overhead in maintaining the reference list (no clean-up on every 
register, no non-linear operation (sort), always one pass over the list).
- In initial tests, it is slightly faster than Alan's fix (I haven't tried 
Ben's new fix).

Important Facts:-
- ArrayList is as good as LinkedList at .add() in Java (even after taking the 
allocation of new arrays into consideration).

> Experiment : A proper bag memory manager.
> -----------------------------------------
>
>                 Key: PIG-167
>                 URL: https://issues.apache.org/jira/browse/PIG-167
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>         Attachments: MemManager0.patch
>
>
> According to PIG-164, I think we still have room for improvement:-
> 1) Alan said
> {quote}
> "It rests on the assumption that data bags generally live about the same 
> amount of time, thus there won't be a long lived databag at the head of the 
> list blocking the cleaning of many stale references later in the list."
> {quote}
> By looking at a line of code in SpillableMemoryManager
> {noformat}
> Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
> {noformat}
> - Alan's assumption might be wrong after the memory manager tries to spill 
> the list.
> - I don't understand why this has to be sorted so that spilling starts from 
> the smallest bags first. Most file systems are not good at handling small 
> files (especially ext2/ext3).
> 2) We use a LinkedList to maintain WeakReferences. Normally a linked list 
> consumes twice as much memory as an array would (for pointers). 
> Would it be better to change LinkedList to Array or ArrayList?
> 3) In SpillableMemoryManager, handleNotification which does a kind of I/O 
> intensive job shares the same lock with registerSpillable. This doesn't seem 
> to be efficient.
> 4) Sometimes I noticed that the bag currently in use got spilled and read 
> back over and over again. Ideally, the memory manager should consider 
> spilling bags that are not currently in use first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
