[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787081#action_12787081 ]

Alan Gates commented on PIG-1053:

Yes, there is a lot of code to remove. We figured once we got the local map reduce mode working we could remove the extra code as we have time.

> Consider moving to Hadoop for local mode
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
> Assignee: Ankit Modi
> Fix For: 0.7.0
> Attachments: hadoopLocal.patch
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786948#action_12786948 ]

Jeff Zhang commented on PIG-1053:

Now that Pig's own local mode has been removed, I think there is still some code cleanup to do. A lot of code related to the original Pig local mode needs to be removed.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779534#action_12779534 ]

Olga Natkovich commented on PIG-1053:

Looks like the patch got resubmitted after my review - what has changed?
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779532#action_12779532 ]

Olga Natkovich commented on PIG-1053:

The release audit failures are in HTML. I will be committing the patch later today.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779344#action_12779344 ]

Hadoop QA commented on PIG-1053:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425289/hadoopLocal.patch
against trunk revision 881008.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 22 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    -1 release audit. The applied patch generated 356 release audit warnings (more than the trunk's current 354 warnings).
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/console

This message is automatically generated.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779252#action_12779252 ]

Ankit Modi commented on PIG-1053:

In local mode, the PhysicalPlan had a POCounter operator before every POStore. This operator was used for collecting stats. Since we moved to Hadoop, this operator is no longer used, so the plan size changed and the expected numbers in the tests changed with it.
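The change from 17 to 14 in the test is consistent with this explanation. The sketch below works through the arithmetic, assuming that the third argument of checkPhysicalPlan (3) is the number of leaves, that every leaf is a POStore, and that the last argument is the total plan size. Those meanings are inferred from the numbers in this thread, and the class and method names are hypothetical.

```java
/** Hypothetical illustration of the plan-size arithmetic; not part of Pig's test code. */
final class PlanSizeCheck {

    /**
     * Expected plan size after the patch, assuming exactly one POCounter
     * was attached to each POStore and all of them are removed.
     */
    static int sizeWithoutCounters(int oldPlanSize, int storeCount) {
        return oldPlanSize - storeCount;
    }

    public static void main(String[] args) {
        int oldPlanSize = 17; // size expected by the test before the patch
        int storeCount = 3;   // assumed: three leaves, each a POStore
        System.out.println(sizeWithoutCounters(oldPlanSize, storeCount)); // prints 14
    }
}
```

Under these assumptions, removing one operator per store accounts for the whole difference, so no other part of the plan needs to have changed.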
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779249#action_12779249 ]

Hadoop QA commented on PIG-1053:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425265/hadoopLocal.patch
against trunk revision 881008.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 22 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    -1 release audit. The applied patch generated 356 release audit warnings (more than the trunk's current 354 warnings).
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/console

This message is automatically generated.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779248#action_12779248 ]

Olga Natkovich commented on PIG-1053:

A few questions:

TestMultiQueryLocal.java: I see that some of the calls changed, like the example below. What is the reason for that?

-PhysicalPlan pp = checkPhysicalPlan(lp, 1, 3, 17);
+PhysicalPlan pp = checkPhysicalPlan(lp, 1, 3, 14);

Same question regarding changes in TestForEachNestedPlanLocal.java. My expectation was that things would not change.

The rest looks good.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779245#action_12779245 ]

Olga Natkovich commented on PIG-1053:

I will be reviewing this patch.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779243#action_12779243 ]

Pradeep Kamath commented on PIG-1053:

Wondering if we can catch the scenario explained in the previous comment and present an error message to the effect of "custom comparators are not supported in local mode".
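One way to catch this scenario early could be to try loading the comparator class in the JVM that runs Pig before the job is submitted, and to fail with a descriptive message if it is missing. The following is only a minimal sketch of that idea: ComparatorCheck and requireLoadable are hypothetical names, not part of Pig, and where such a check would be called from is left open.

```java
/** Hypothetical helper; the class and method names are illustrative, not Pig's API. */
final class ComparatorCheck {

    /**
     * Verifies that the named class can be loaded by the current JVM.
     * Throws IllegalStateException with a readable message if it cannot.
     */
    static void requireLoadable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ComparatorCheck.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "Custom comparator '" + className + "' is not on the classpath of the JVM "
                + "running Pig. In local mode no separate task JVM is started, so the "
                + "comparator's jar must be on that classpath.", e);
        }
    }

    public static void main(String[] args) {
        requireLoadable("java.lang.String"); // always present, so this passes
        try {
            requireLoadable("custompackage.customclass"); // class name from the example script
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The message here points at the classpath workaround rather than declaring custom comparators unsupported; either wording would replace the bare ClassNotFoundException with something a user can act on.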
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779181#action_12779181 ]

Ankit Modi commented on PIG-1053:

This patch has an issue with custom comparators (ORDER BY) in local mode. It does not affect MapReduce mode.

Details:
Pig uses custom comparators by setting OutputKeyComparator to the custom comparator class and passing the jar path to the JVM when starting the task. In the new local mode, a new JVM is not started, so Hadoop does not have the custom comparator's jar on its classpath and fails.

A solution for the above problem would be to pass the jar path of the custom comparator in the "classpath" argument of the JVM running Pig.

e.g.
{code:title=CustomComparatorUse.pig}
register custom.jar;
A = load 'file';
B = order A by * using custompackage.customclass; -- Here Hadoop bails out with a ClassNotFoundException
store B into 'file2';
{code}

JVM command (this does not work):
{{java -cp pig.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}

Use this instead, with custom.jar added to the classpath:
{{java -cp pig.jar:custom.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779179#action_12779179 ]

Ankit Modi commented on PIG-1053:

This patch has an issue with custom comparators (ORDER BY) in local mode. It does not affect MapReduce mode.

Details:
Pig uses custom comparators by setting OutputKeyComparator to the custom comparator class and passing the jar path to the JVM when starting the task. In the new local mode, a new JVM is not started, so Hadoop does not have the custom comparator's jar on its classpath and fails.

A solution for the above problem would be to pass the jar path of the custom comparator in the "classpath" argument of the JVM running Pig.

e.g.
CustomComparatorUse.pig
register custom.jar;
A = load 'file';
B = order A by * using custompackage.customclass; -- Here Hadoop bails out with a ClassNotFoundException
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772144#action_12772144 ]

Alan Gates commented on PIG-1053:

For testing purposes we could simply change Main to tell PigContext the mode is MapReduce, even when the user selects local mode. Assuming there are no configuration files in the classpath, this will result in using Hadoop in local mode.

However, for a real fix, we need to make sure that when the user says "-x local", Hadoop's LocalJobRunner and the local file system are chosen even if there are configuration files in the classpath. I believe this would be accomplished by changing PigContext so that in local mode it still connects to MR and HDFS, but does so with an empty Properties object rather than the one that is passed in. This would affect connect, init, setJobTrackerLocation, and perhaps other calls.
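The property handling proposed above could look roughly like the sketch below. It is not Pig's actual code: LocalModeProperties, ExecMode, and effectiveProperties are hypothetical names, and the host names are made up. Whether an empty Properties object is enough to keep cluster configuration files from being picked up depends on how the Hadoop configuration is built, which this sketch does not address.

```java
import java.util.Properties;

/** Hypothetical sketch; these names are illustrative and are not Pig's actual API. */
final class LocalModeProperties {

    enum ExecMode { LOCAL, MAPREDUCE }

    /**
     * Returns the properties the backend connection should use.
     * In LOCAL mode the caller's properties are deliberately ignored,
     * as proposed in the comment above.
     */
    static Properties effectiveProperties(ExecMode mode, Properties userProps) {
        if (mode == ExecMode.LOCAL) {
            return new Properties(); // empty: no cluster settings carried over
        }
        Properties copy = new Properties();
        copy.putAll(userProps); // defensive copy for cluster mode
        return copy;
    }

    public static void main(String[] args) {
        Properties cluster = new Properties();
        // Made-up cluster addresses, for illustration only.
        cluster.setProperty("mapred.job.tracker", "jobtracker.example.com:9001");
        cluster.setProperty("fs.default.name", "hdfs://namenode.example.com:8020");

        System.out.println(effectiveProperties(ExecMode.LOCAL, cluster).isEmpty());
        System.out.println(
            effectiveProperties(ExecMode.MAPREDUCE, cluster).getProperty("mapred.job.tracker"));
    }
}
```

Keeping the decision in one place would let connect, init, and setJobTrackerLocation share the same rule instead of each testing the mode separately.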
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770339#action_12770339 ]

Dmitriy V. Ryaboy commented on PIG-1053:

+1

Although I do know of one user who only utilized Local Mode. He didn't have big data, but found Pig Latin to be the best fit for his particular problem, due to support for nested structures.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770336#action_12770336 ]

Raghu Angadi commented on PIG-1053:

A big +1.

It is understandable, from a Pig developer's point of view, to be annoyed by beginners complaining about run time with toy local inputs. Maybe a clear heads-up in the tutorial would reduce those complaints.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770314#action_12770314 ]

Jeff Zhang commented on PIG-1053:

Agreed. I always use Hadoop's local mode for debugging, since it is more similar to Hadoop's map reduce mode than Pig's own local mode is. I agree with you that having Pig's own local mode increases the complexity of Pig and the cost of maintenance.

I think few people will use Pig's local mode in production; users only use it for debugging. So performance is not a big problem. The similarity to the real map reduce mode is more important.
[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770237#action_12770237 ]

Alan Gates commented on PIG-1053:

Currently Pig has its own backend implementation framework that it uses for executing Pig Latin scripts on a single box (as opposed to in a Hadoop cluster), referred to as local mode. Having a separate implementation has several drawbacks:

1) It does not offer the same functionality as Hadoop. A number of things do not work, such as counters, slicers, etc.
2) UDFs (both eval and load/store functions) are often forced to understand both contexts, and test whether they are working in local or hadoop mode.
3) Additional code maintenance, as Pig is forced to maintain its own framework. Going forward, as Pig attempts to leverage more Map Reduce functionality (see for example PIG-966) maintaining this separate mode is becoming a larger and larger effort.
4) It makes debugging harder for users and UDF writers, as the execution environment on a local box differs from that on the production cluster.

Pig's local mode has one very serious advantage over Hadoop in local mode. It is much faster, about 15 times faster. Hadoop is designed for large data sets and thus is not optimized to handle the start up and tear down involved in small data jobs. For debugging of code, this performance factor should not be that big an issue. Where the performance becomes prohibitive is functionality like ILLUSTRATE. Taking 30 seconds to give a sample of data running through your script is excessive compared to 2 seconds.

So, which of these pain points is worse? Originally we felt the performance was more important. But as we see many user complaints about the above listed drawbacks and relatively few users using local mode in performance intensive ways, we are wondering if we made that choice correctly. Please give your feedback one way or another.