[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778732#action_12778732 ]
Hadoop QA commented on PIG-872:
-------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425174/PIG_872.patch
against trunk revision 881008.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/157/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/157/console

This message is automatically generated.

> use distributed cache for the replicated data set in FR join
> ------------------------------------------------------------
>
>                 Key: PIG-872
>                 URL: https://issues.apache.org/jira/browse/PIG-872
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Sriranjan Manjunath
>         Attachments: PIG_872.patch
>
>
> Currently, the replicated file is read directly from DFS by all maps. If the
> number of concurrent maps is huge, we can overwhelm the NameNode with
> open calls.
> Using the distributed cache will address the issue and might also give a
> performance boost, since the file will be copied locally once and then reused
> by all tasks running on the same machine.
> The basic approach would be to use cacheArchive to place the file into the
> cache on the frontend; on the backend, the tasks would need to refer to
> the data using the path from the cache.
> Note that cacheArchive does not work in Hadoop local mode. (Not a problem for
> us right now, as we don't use it.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.