[
https://issues.apache.org/jira/browse/PIG-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Utkarsh Srivastava updated PIG-44:
----------------------------------
Attachment: spilling1.patch
I now *adaptively* choose the number of records to hold in memory.
I start by holding 1000 records in memory. Once a low-memory condition is
hit, I aim for the bag to occupy no more than 1% of the JVM heap size
(TARGET_IN_MEMORY_SIZE). When the bag spills to disk, I measure how many bytes
were actually written. If the bytes written were < TARGET_IN_MEMORY_SIZE, I
increase the number of records to hold in memory accordingly; otherwise, I
decrease it.
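A minimal sketch of this heuristic, assuming a small policy object that the bag consults (AdaptiveSpillPolicy and its members are illustrative names, not the identifiers actually used in spilling1.patch):
{code:java}
// Illustrative sketch of the adaptive spill heuristic described above.
public class AdaptiveSpillPolicy {

    // Start by holding 1000 records in memory.
    private long recordsToHold = 1000;

    // Target: the in-memory bag should stay within 1% of the JVM heap.
    private final long targetInMemorySize =
            Runtime.getRuntime().maxMemory() / 100;

    public long getRecordsToHold() {
        return recordsToHold;
    }

    // Called after each spill with the bytes actually written to disk
    // and the number of records that were spilled.
    public void onSpill(long bytesWritten, long recordsSpilled) {
        if (bytesWritten <= 0 || recordsSpilled <= 0) {
            return; // nothing to learn from an empty spill
        }
        // Estimate the per-record size from this spill, then retarget
        // the record count so that recordsToHold records occupy roughly
        // targetInMemorySize bytes. If the spill wrote fewer bytes than
        // the target, this raises the count; otherwise it lowers it.
        long bytesPerRecord = Math.max(1, bytesWritten / recordsSpilled);
        recordsToHold = Math.max(1, targetInMemorySize / bytesPerRecord);
    }
}
{code}
Retargeting from the measured per-record size increases the count after a spill smaller than the target and decreases it after a larger one, which matches the increase/decrease rule above.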
Most of the patch is refactoring I did in the big data bag unit test.
I tested with up to 10 million records, and it seems to work well. My heap
size was the default 64 MB.
> Problem with spilling BigBags
> -----------------------------
>
> Key: PIG-44
> URL: https://issues.apache.org/jira/browse/PIG-44
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Olga Natkovich
> Attachments: spilling.patch, spilling1.patch
>
>
> Currently, once we spill the bag, if no additional memory becomes available,
> we end up spilling one record at a time because of a problem in the spill
> logic. Short term, we will change the code to spill 100 records at a time.
> Longer term, we need to try to drain memory before spilling.
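For context, the short-term fix described above amounts to spilling a fixed batch of records per low-memory notification instead of one record at a time; a rough sketch under that assumption (BatchSpiller and SPILL_BATCH_SIZE are hypothetical names, not Pig's actual DataBag API):
{code:java}
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Iterator;

class BatchSpiller {
    static final int SPILL_BATCH_SIZE = 100;

    // Moves up to SPILL_BATCH_SIZE records from the in-memory bag to
    // the spill stream; returns how many records were written.
    static int spillBatch(Iterator<?> inMemory, ObjectOutputStream out)
            throws IOException {
        int spilled = 0;
        while (spilled < SPILL_BATCH_SIZE && inMemory.hasNext()) {
            out.writeObject(inMemory.next());
            inMemory.remove(); // release the record from memory
            spilled++;
        }
        return spilled;
    }
}
{code}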
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.