[ 
https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900332#action_12900332
 ] 

Thejas M Nair commented on PIG-1447:
------------------------------------

bq. Did you see any perf improvement? 
No, the query is the same and the performance is the same; only the number 
of records reported earlier was incorrect. In fact, there was also a mistake 
in the calculation, which I have fixed in the updated patch for PIG-1524.

I made further modifications to L15_modified.pig to use larger columns: 
L15_modified2.pig (attached). With this query, the number of records dumped is 
17.5 million with 0.1f and 20 million with 0.2f for pig.cachedbag.memusage. 
The records are also much larger in size. I see around a 10% improvement with 
0.2f.

Considering the issue in PIG-1544, and that multi-query optimized queries can 
have a large number of bags, I think it is safer to leave the value at 10% for 
now. We can add documentation on adjusting the value of this property so that 
users can tune it if they see a lot of records being proactively spilled.
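
To illustrate why a large number of bags argues for a conservative setting, here is a rough sketch in Python. It assumes (this is not confirmed in this comment) that the fraction given by pig.cachedbag.memusage is applied to the maximum heap and then shared evenly among the cached bags that are live at the same time. The function name and the heap size are hypothetical and chosen only for illustration.

```python
def per_bag_budget_bytes(max_heap_bytes, memusage, num_bags):
    """Approximate in-memory budget for one cached bag.

    Assumption (not taken from the Pig source): the memusage fraction of
    the heap is divided evenly among all concurrently live bags.
    """
    if num_bags < 1:
        raise ValueError("num_bags must be at least 1")
    if not 0.0 < memusage <= 1.0:
        raise ValueError("memusage must be a fraction in (0, 1]")
    return int(max_heap_bytes * memusage / num_bags)


if __name__ == "__main__":
    heap = 1 << 30  # hypothetical 1 GiB task heap
    for memusage in (0.1, 0.2):
        for bags in (1, 10, 100):
            budget = per_bag_budget_bytes(heap, memusage, bags)
            print(f"memusage={memusage} bags={bags} -> {budget} bytes per bag")
```

Under this assumption, doubling the fraction doubles the total memory reserved for bags, while the per-bag share still shrinks as the bag count grows. Users who raise the value would be trading fewer proactive spills against a larger share of the heap held by bags.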

We should revisit this once PIG-1544 is fixed.

> Tune memory usage of InternalCachedBag
> --------------------------------------
>
>                 Key: PIG-1447
>                 URL: https://issues.apache.org/jira/browse/PIG-1447
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: L15_modified.pig, L15_modified2.pig
>
>
> We need to find a better value for "pig.cachedbag.memusage".

