[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781560#action_12781560 ]
Hadoop QA commented on PIG-872:
-------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425805/PIG_872.patch.1
against trunk revision 882818.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/console
This message is automatically generated.
> use distributed cache for the replicated data set in FR join
> ------------------------------------------------------------
>
> Key: PIG-872
> URL: https://issues.apache.org/jira/browse/PIG-872
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Sriranjan Manjunath
> Attachments: PIG_872.patch.1
>
>
> Currently, the replicated file is read directly from DFS by all maps. If the
> number of concurrent maps is huge, we can overwhelm the NameNode with open
> calls.
> Using the distributed cache will address the issue and might also give a
> performance boost, since the file will be copied locally once and then reused
> by all tasks running on the same machine.
> The basic approach would be to use cacheArchive to place the file into the
> cache on the frontend; on the backend, the tasks would then refer to the data
> using its path in the cache (see the sketch after this description).
> Note that cacheArchive does not work in Hadoop local mode. (Not a problem for
> us right now as we don't use it.)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.