[ 
https://issues.apache.org/jira/browse/PIG-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pi Song updated PIG-176:
------------------------

    Attachment: pig_176_smallbags_v1.patch

This patch implements (1) a spill file size threshold and (2) the idea from my 
last comment.

"spill.size.threshold" and "spill.gc.activation.size" must be set as JVM 
parameters or in .pigrc in order to use this new feature. The default values 
are 0 and Long.MAX_VALUE respectively.
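
To illustrate how the two settings and their defaults might be picked up when 
passed as JVM parameters, here is a minimal sketch. It is not the code in the 
patch: the class name SpillSettings and the helper parseLong are hypothetical, 
and only the two property names and the default values come from the 
description above. Falling back to the default on a malformed value is also an 
assumption.

```java
import java.util.Properties;

// Hypothetical holder for the two spill settings; not part of the patch.
class SpillSettings {
    // Property names as given in the patch description.
    static final String SIZE_THRESHOLD_KEY = "spill.size.threshold";
    static final String GC_ACTIVATION_KEY = "spill.gc.activation.size";

    final long sizeThreshold;     // default 0
    final long gcActivationSize;  // default Long.MAX_VALUE

    SpillSettings(Properties props) {
        sizeThreshold = parseLong(props.getProperty(SIZE_THRESHOLD_KEY), 0L);
        gcActivationSize =
            parseLong(props.getProperty(GC_ACTIVATION_KEY), Long.MAX_VALUE);
    }

    // Returns the fallback when the property is missing or not a valid long
    // (assumed behaviour, chosen here for illustration).
    static long parseLong(String raw, long fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // System properties include anything passed with -Dname=value.
        SpillSettings s = new SpillSettings(System.getProperties());
        System.out.println(SIZE_THRESHOLD_KEY + " = " + s.sizeThreshold);
        System.out.println(GC_ACTIVATION_KEY + " = " + s.gcActivationSize);
    }
}
```

With this sketch, running the JVM with -Dspill.size.threshold=131072 would set 
the threshold to 128K, while leaving both properties unset keeps the defaults.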

There is a small problem with (1): Bag.getMemorySize() sometimes doesn't 
return an accurate value, so even when the threshold is set, it's still 
possible that files smaller than the threshold are created.
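
The following sketch shows the general shape of a size-threshold check and why 
an inaccurate estimate lets small files through. It is an illustration under 
stated assumptions, not the patch itself: the classes SpillThresholdSketch and 
FakeBag and the method selectForSpill are hypothetical, and the ">=" 
comparison is an assumption chosen so that the default threshold of 0 spills 
everything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a spill size threshold; not the patch code.
class SpillThresholdSketch {

    // Stand-in for a bag: it carries the size its estimator reports and the
    // size it would really occupy once written to disk.
    static class FakeBag {
        final String name;
        final long estimatedBytes; // what a getMemorySize()-style call reports
        final long actualBytes;    // what the spill file would really contain

        FakeBag(String name, long estimatedBytes, long actualBytes) {
            this.name = name;
            this.estimatedBytes = estimatedBytes;
            this.actualBytes = actualBytes;
        }
    }

    // Keeps only the bags whose *estimated* size reaches the threshold.
    // The decision never sees actualBytes, so an overestimate slips through.
    static List<FakeBag> selectForSpill(List<FakeBag> bags, long threshold) {
        List<FakeBag> selected = new ArrayList<>();
        for (FakeBag bag : bags) {
            if (bag.estimatedBytes >= threshold) {
                selected.add(bag);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long threshold = 128 * 1024; // example value only
        List<FakeBag> bags = new ArrayList<>();
        bags.add(new FakeBag("tiny", 4 * 1024, 4 * 1024));
        bags.add(new FakeBag("large", 512 * 1024, 512 * 1024));
        // Estimate says 200K, but the real content is only 60K.
        bags.add(new FakeBag("overestimated", 200 * 1024, 60 * 1024));

        for (FakeBag bag : selectForSpill(bags, threshold)) {
            System.out.println(bag.name + " spills " + bag.actualBytes + " bytes");
        }
    }
}
```

In this example the "overestimated" bag passes the check and would still 
produce a file smaller than the threshold, which is the situation described 
above.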

The configuration code in MapReduceLauncher is still messy. It needs a 
clean-up after the configuration patch gets in.

> pig creates many small files when it spills
> -------------------------------------------
>
>                 Key: PIG-176
>                 URL: https://issues.apache.org/jira/browse/PIG-176
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>         Attachments: pig_176_smallbags_v1.patch
>
>
> Currently, when it spills, Pig can generate millions of small (under 128K) 
> files. This is partially due to PIG-170, but even with that patch, small 
> bags can still be spilled.
> The proposal is to not spill small files. Alan told me that the logic is 
> already there but we just need to bump the size limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
