[ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1548:
------------------------------

    Attachment: PIG-1548_1.patch

The patch excludes some multiquery cases where more information is needed to 
correlate and determine the files to consolidate. We'll consider those cases in 
a separate jira.  

> Optimize scalar to consolidate the part file
> --------------------------------------------
>
>                 Key: PIG-1548
>                 URL: https://issues.apache.org/jira/browse/PIG-1548
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1548.patch, PIG-1548_1.patch
>
>
> The current scalar implementation writes a scalar file to dfs. When Pig 
> needs the scalar, it opens the dfs file directly. Each scalar file 
> consists of more than one part file even though it contains only one record, 
> which puts a huge load on the namenode. We should consolidate the part files 
> before opening them. Another, optional step is to put the consolidated file 
> into the distributed cache, which would further reduce the load on the namenode.
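
The consolidation step described in the issue can be illustrated with a minimal sketch. It is not the code in PIG-1548_1.patch: it uses a local directory as a stand-in for dfs, and the function names, the `part-*` file pattern and the `scalar-merged` file name are assumptions made for illustration only.

```python
import shutil
from pathlib import Path


def consolidate_part_files(scalar_dir: Path, merged_name: str = "scalar-merged") -> Path:
    """Concatenate every part-* file in scalar_dir into one file.

    Hypothetical helper: a local directory stands in for the dfs output
    directory of the job that produced the scalar.
    """
    merged = scalar_dir / merged_name
    # Sort the part files so the merged output is deterministic.
    # The merged file name does not match "part-*", so a second call
    # does not fold the previous result back into itself.
    parts = sorted(p for p in scalar_dir.glob("part-*") if p.is_file())
    with merged.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return merged


def read_scalar(scalar_dir: Path) -> str:
    """Read the single scalar record through the consolidated file only."""
    merged = consolidate_part_files(scalar_dir)
    return merged.read_text().strip()
```

With this shape, each reader opens one file instead of every part file, most of which are empty when the output holds a single record. On a real cluster the merge would go through the Hadoop FileSystem API rather than local file I/O.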

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
