[ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1548:
------------------------------
Attachment: PIG-1548_1.patch
The patch excludes some multiquery cases where more information is needed to
correlate and determine the files to consolidate. We'll consider those cases in
a separate jira.
> Optimize scalar to consolidate the part file
> --------------------------------------------
>
> Key: PIG-1548
> URL: https://issues.apache.org/jira/browse/PIG-1548
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Daniel Dai
> Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1548.patch, PIG-1548_1.patch
>
>
> The current scalar implementation writes a scalar file to DFS. When Pig
> needs the scalar, it opens the DFS file directly. Each scalar file
> contains more than one part file even though it holds only one record. This
> puts a huge load on the namenode. We should consolidate the part files before
> opening them. Another, optional step is to put the consolidated file into the
> distributed cache, which further reduces the load on the namenode.
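
The consolidation step described above can be illustrated with a minimal
sketch. This is not Pig's implementation or the attached patch: it assumes a
local directory stands in for the DFS output directory, that part files are
named part-*, and that the function names (consolidate_part_files,
read_scalar) and the merged file name are hypothetical. The point is only that
a reader opens one merged file instead of every part file.

```python
import glob
import os
import shutil
import tempfile


def consolidate_part_files(scalar_dir, merged_name="merged"):
    """Concatenate every part-* file in scalar_dir into a single file.

    Hypothetical helper: a local directory stands in for the DFS directory.
    Returns the path of the merged file.
    """
    merged_path = os.path.join(scalar_dir, merged_name)
    parts = sorted(glob.glob(os.path.join(scalar_dir, "part-*")))
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path


def read_scalar(scalar_dir):
    """Consolidate the part files, then read the single scalar record."""
    merged_path = consolidate_part_files(scalar_dir)
    with open(merged_path, "rb") as merged:
        records = [line for line in merged.read().splitlines() if line]
    if len(records) != 1:
        raise ValueError("expected exactly one record, found %d" % len(records))
    return records[0].decode("utf-8")


if __name__ == "__main__":
    # Simulate a scalar output where only one of several part files has data.
    demo_dir = tempfile.mkdtemp()
    for name, data in [("part-00000", b""), ("part-00001", b"42\n"), ("part-00002", b"")]:
        with open(os.path.join(demo_dir, name), "wb") as f:
            f.write(data)
    print(read_scalar(demo_dir))
    shutil.rmtree(demo_dir)
```

In this sketch the merge happens once, on the writer side of the scalar, so
each later read touches a single file. On a real cluster the merged file would
also be the natural candidate for the distributed cache.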