[ 
https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1447:
-------------------------------

    Attachment: L15_modified.pig

The search for a better default value for pig.cachedbag.memusage was prompted 
by the changes in PIG-1443 and PIG-1492. Before the changes made as part of 
those jiras, pig was underestimating the memory footprint of data. For data of 
'typical' sizes (chararray/bytearray with fewer than 20 chars), the new memory 
size estimates can be up to 2 times those of the old version without these 
changes (0.6.0).

I ran pig queries with the max heap size for tasks set to 1GB, and compared 
0.1f and 0.2f as values for pig.cachedbag.memusage. I ran the pigmix v1 
queries (L1-L12), a modified pigmix v1 that specifies types, and a modified 
L15 query that has several distincts in a nested foreach statement.
Only queries L5, L7 and L15 had proactive spills. The number of spills goes 
down with 0.2f as the value, but the total runtime is practically the same.

(See PIG-1524 for more on the spills currently reported.)

|| query || spills with 0.1f || spills with 0.2f || 
| L5 (original pigmix) | 496k | 0 |
| L7 (original pigmix) | 82k | 0 |
| L5 (with types) | 609k | 82k |
| L7 (with types) | 128k | 0 |
| L15_modified (attached to jira) |  501k | 326k |


Some other factors to consider while determining a new value for this property:
- as a result of the issue described in PIG-1544, the proactive-spill bags do 
not all share the memory limit.
- the default value should be low enough that queries work fine in most 
cases. Expert users can tweak it to improve performance.
- the value of 0.1f has been used for a long time (with the old memory estimate 
formula), and seems to work for most cases.
- during the above tests, no other queries were running, so the disks were 
relatively free.

I propose that we increase the default value to 0.15f to accommodate the 
changes in memory size estimation, so that the spill behavior is closer to 
what it has been with 0.6 and 0.7.
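
As a rough illustration of the arithmetic behind this proposal (a sketch, not 
pig's actual code): assume the property is the fraction of the task's max heap 
that cached bags may hold before spilling, and that the new per-tuple estimate 
is 2 times the old one (the upper bound mentioned above). The class name, the 
method names and the 100-byte tuple size are made up for the example.

```java
// Hypothetical sketch: how the memusage fraction and the size estimate
// together decide how many tuples fit in memory before a spill.
final class CachedBagBudget {

    // Bytes available to cached bags, assuming the property is a plain
    // fraction of the task's max heap (an assumption of this sketch).
    static long budgetBytes(long maxHeapBytes, double fraction) {
        return (long) (maxHeapBytes * fraction);
    }

    // Number of tuples held in memory before spilling, given the
    // estimated size of one tuple.
    static long tuplesBeforeSpill(long budgetBytes, long estimatedTupleBytes) {
        return budgetBytes / estimatedTupleBytes;
    }

    public static void main(String[] args) {
        long heap = 1L << 30;                // 1GB max heap, as in the tests
        long oldEstimate = 100;              // illustrative bytes per tuple
        long newEstimate = 2 * oldEstimate;  // worst case: 2 times the old estimate

        System.out.println("old estimate, 0.10: "
                + tuplesBeforeSpill(budgetBytes(heap, 0.10), oldEstimate));
        for (double fraction : new double[] {0.10, 0.15, 0.20}) {
            System.out.println("new estimate, " + fraction + ": "
                    + tuplesBeforeSpill(budgetBytes(heap, fraction), newEstimate));
        }
    }
}
```

Under these assumptions, keeping 0.1 halves the number of tuples held before a 
spill, 0.2 restores the old count in the worst case, and 0.15 sits between the 
two.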


> Tune memory usage of InternalCachedBag
> --------------------------------------
>
>                 Key: PIG-1447
>                 URL: https://issues.apache.org/jira/browse/PIG-1447
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: L15_modified.pig
>
>
> We need to find a better value for "pig.cachedbag.memusage".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
