Experiment : A proper bag memory manager.
-----------------------------------------
Key: PIG-167
URL: https://issues.apache.org/jira/browse/PIG-167
Project: Pig
Issue Type: Improvement
Reporter: Pi Song
Regarding PIG-164, I think we still have room for improvement:
1) Alan said
{quote}
"It rests on the assumption that data bags generally live about the same amount
of time, thus there won't be a long lived databag at the head of the list
blocking the cleaning of many stale references later in the list."
{quote}
Looking at this line of code in SpillableMemoryManager:
{noformat}
Collections.sort(spillables, new Comparator<WeakReference<Spillable>>() {
{noformat}
- Alan's assumption may no longer hold once the memory manager has tried to
spill, because the sort reorders the list.
- I don't understand why the list has to be sorted so that spilling starts
with the smallest bags. Most file systems do not handle many small files well
(especially ext2/ext3).
2) We use a LinkedList to hold the WeakReferences. A linked list normally
consumes about twice as much memory for pointers as an array would.
Would it be better to change the LinkedList to an array or an ArrayList?
3) In SpillableMemoryManager, handleNotification, which does I/O-intensive
work, shares the same lock with registerSpillable. This doesn't seem
efficient, since registrations have to wait while a spill is in progress.
4) I have sometimes noticed the bag currently in use being spilled and read
back over and over again. The memory manager should prefer to spill bags
that are not currently in use.
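
To make points 1) to 3) concrete, here is a minimal sketch of one way they
could fit together. It is not the existing SpillableMemoryManager code. The
Spillable interface below is a stand-in that assumes only spill() and
getMemorySize(), and SketchMemoryManager, Candidate and FixedSizeBag are
hypothetical names used for illustration. The sketch keeps the references in
an ArrayList, prunes every cleared reference regardless of its position,
spills the largest bags first (one possible ordering, chosen here to avoid
many small files), and holds the lock only while touching the list, so the
I/O happens outside it.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for Pig's Spillable; only these two methods are assumed.
interface Spillable {
    long spill();          // writes contents to disk, returns bytes freed
    long getMemorySize();  // estimated in-memory size in bytes
}

// Illustrative sketch only, not the real SpillableMemoryManager.
class SketchMemoryManager {
    // Pairs a bag with a size read once, so the sort sees stable keys.
    private static final class Candidate {
        final Spillable bag;
        final long size;
        Candidate(Spillable bag, long size) { this.bag = bag; this.size = size; }
    }

    private final Object lock = new Object();
    // ArrayList: one backing array instead of one node object per entry.
    private final List<WeakReference<Spillable>> spillables = new ArrayList<>();

    // Cheap: the lock is held only for the append.
    void registerSpillable(Spillable s) {
        synchronized (lock) {
            spillables.add(new WeakReference<>(s));
        }
    }

    // Called on a low-memory notification; returns the bytes reported freed.
    long handleNotification(long bytesToFree) {
        List<Spillable> live = new ArrayList<>();
        synchronized (lock) {
            // Drop all cleared references, wherever they sit in the list.
            spillables.removeIf(ref -> ref.get() == null);
            for (WeakReference<Spillable> ref : spillables) {
                Spillable s = ref.get();
                if (s != null) {
                    live.add(s);   // strong reference keeps the bag alive below
                }
            }
        }

        // Everything from here on runs without the lock.
        List<Candidate> candidates = new ArrayList<>();
        for (Spillable s : live) {
            candidates.add(new Candidate(s, s.getMemorySize()));
        }
        // Largest first, so fewer and bigger spill files are written.
        candidates.sort(Comparator.comparingLong((Candidate c) -> c.size).reversed());

        long freed = 0;
        for (Candidate c : candidates) {
            if (freed >= bytesToFree) {
                break;
            }
            freed += c.bag.spill();
        }
        return freed;
    }
}

// Hypothetical test double: reports a fixed size and records spill order.
class FixedSizeBag implements Spillable {
    private final long size;
    private final List<Long> spillLog;
    FixedSizeBag(long size, List<Long> spillLog) { this.size = size; this.spillLog = spillLog; }
    public long spill() { spillLog.add(size); return size; }
    public long getMemorySize() { return size; }
}
```

With three FixedSizeBag instances of sizes 10, 500 and 100 registered, a call
to handleNotification(550) spills the 500-byte bag and then the 100-byte bag,
and leaves the 10-byte bag alone. Point 4) would additionally need some notion
of which bags are in use, which this sketch does not model.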