[ https://issues.apache.org/jira/browse/PIG-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1548:
------------------------------
    Attachment: PIG-1548_1.patch

The patch excludes some multiquery cases where more information is needed to correlate and determine the files to consolidate. We'll consider those cases in a separate JIRA issue.

> Optimize scalar to consolidate the part file
> --------------------------------------------
>
>                 Key: PIG-1548
>                 URL: https://issues.apache.org/jira/browse/PIG-1548
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1548.patch, PIG-1548_1.patch
>
>
> The current scalar implementation writes a scalar file onto dfs. When Pig
> needs the scalar, it opens the dfs file directly. Each scalar file
> contains more than one part file even though it holds only one record. This
> puts a huge load on the namenode. We should consolidate the part files before
> opening them. Another optional step is to put the consolidated file into the
> distributed cache. This further brings down the load on the namenode.
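
For readers unfamiliar with the idea, the consolidation step described in the issue can be sketched roughly as follows. This is only an illustration on a local filesystem, not the code in the attached patch (its contents are not shown in this message). The function name `consolidate_part_files`, the directory layout, and the `part-` file prefix are assumptions made for the example. A real implementation inside Pig would work against dfs through Hadoop's FileSystem API instead.

```python
import shutil
import tempfile
from pathlib import Path


def consolidate_part_files(scalar_dir: Path, target: Path) -> Path:
    """Concatenate every part-* file under scalar_dir into one target file.

    Hypothetical helper: a reader then opens one file instead of many,
    most of which would be empty for a single-record scalar.
    """
    parts = sorted(
        p for p in scalar_dir.iterdir()
        if p.is_file() and p.name.startswith("part-")
    )
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return target


def demo() -> bytes:
    """Build a fake scalar output with three part files and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        scalar_dir = Path(tmp) / "scalar_out"
        scalar_dir.mkdir()
        # Only one part file actually carries the single record.
        (scalar_dir / "part-00000").write_bytes(b"")
        (scalar_dir / "part-00001").write_bytes(b"42\n")
        (scalar_dir / "part-00002").write_bytes(b"")
        merged = consolidate_part_files(scalar_dir, Path(tmp) / "scalar_merged")
        return merged.read_bytes()
```

Calling `demo()` merges the three part files and returns the content of the single consolidated file. The point of the sketch is that each later read of the scalar touches one file rather than one file per part, which is what reduces the number of namenode requests in the dfs case.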