[ https://issues.apache.org/jira/browse/PIG-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Utkarsh Srivastava updated PIG-44:
----------------------------------

    Attachment: spilling1.patch

I now *adaptively* choose the number of records to hold in memory. I start 
by holding 1000 records in memory. Once a low-memory condition is hit, I 
aim for the bag to occupy no more than 1% of the JVM heap size 
(TARGET_IN_MEMORY_SIZE). When the bag spills to disk, I measure how many 
bytes were actually written. If fewer bytes were written than 
TARGET_IN_MEMORY_SIZE, I increase the number of records held in memory 
accordingly; otherwise, I decrease it accordingly.
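
To make the rule concrete, here is a minimal sketch of the adjustment. It 
is not taken from spilling1.patch; the class and method names below are 
hypothetical, and only TARGET_IN_MEMORY_SIZE comes from the description 
above. It assumes the spill path reports the bytes actually written after 
each spill:

// Minimal sketch of the adaptive record-count rule described above.
// Hypothetical names; only TARGET_IN_MEMORY_SIZE is from the description.
public class AdaptiveSpillPolicy {

    // Aim for the in-memory part of the bag to stay within 1% of the heap.
    private static final long TARGET_IN_MEMORY_SIZE =
            Runtime.getRuntime().maxMemory() / 100;

    // Start by holding 1000 records in memory.
    private long recordsToHold = 1000;

    // Called after each spill with the number of bytes actually written.
    // Scales the record count so the next in-memory batch lands near
    // TARGET_IN_MEMORY_SIZE: fewer bytes than the target means we can
    // hold more records; more bytes means we should hold fewer.
    public void adjust(long bytesWritten) {
        if (bytesWritten <= 0) {
            return; // nothing to learn from an empty spill
        }
        recordsToHold = Math.max(1,
                recordsToHold * TARGET_IN_MEMORY_SIZE / bytesWritten);
    }

    public long getRecordsToHold() {
        return recordsToHold;
    }
}

A multiplicative update like this settles quickly once the bytes-per-record 
estimate stabilizes; the actual patch may step the count up and down 
differently.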

Most of the patch is refactoring that I did in the big data bag unit test. 
I tested with up to 10 million records, and it seems to work great. My heap 
size was the default 64M.



> Problem with spilling BigBags
> -----------------------------
>
>                 Key: PIG-44
>                 URL: https://issues.apache.org/jira/browse/PIG-44
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Olga Natkovich
>         Attachments: spilling.patch, spilling1.patch
>
>
> Currently, once we spill the bag, if no additional memory becomes available, 
> we end up spilling one record at a time because of a problem in the logic. 
> Short term, we will make a change to spill 100 records at a time. Longer 
> term, we need to try to drain the memory before doing so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
