Experiment : A proper bag memory manager.
-----------------------------------------

                 Key: PIG-167
                 URL: https://issues.apache.org/jira/browse/PIG-167
             Project: Pig
          Issue Type: Improvement
            Reporter: Pi Song


According to PIG-164, I think we still have room for improvement:-
1) Alan said
{quote}
"It rests on the assumption that data bags generally live about the same amount 
of time, thus there won't be a long lived databag at the head of the list 
blocking the cleaning of many stale references later in the list."
{quote}

By looking at a line of code in SpillableMemoryManager
{noformat}
Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
{noformat}

- Alan's assumption might be wrong after the memory manager tries to spill the 
list.
- I don't understand why this has to be sorted and start spilling from the 
smallest bags first. Most file systems are not good at handling small files 
(specially ext2/ext3).

2) We use a linkedlist to maintain WeakReference. Normally a linkedlist 
consumes double as much memory that an array would consume(for pointers). 
Should it be better to change LinkedList to Array or ArrayList?

3) In SpillableMemoryManager, handleNotification which does a kind of I/O 
intensive job shares the same lock with registerSpillable. This doesn't seem to 
be efficient.

4) Sometimes I recognized that the bag currently in use got spilled and read 
back over and over again. Essentially, the memory manager should consider 
spilling bags currently not in use first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to