[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-12-07 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787081#action_12787081
 ] 

Alan Gates commented on PIG-1053:
-

Yes, there is a lot of code to remove.  We figured once we got the local map 
reduce mode working we could remove the extra code as we have time.

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>Assignee: Ankit Modi
> Fix For: 0.7.0
>
> Attachments: hadoopLocal.patch
>
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-12-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786948#action_12786948
 ] 

Jeff Zhang commented on PIG-1053:
-

Now that Pig's own local mode has been removed, I think there is still some 
code cleanup work to do. A lot of code related to the original Pig local 
mode needs to be removed.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779534#action_12779534
 ] 

Olga Natkovich commented on PIG-1053:
-

Looks like the patch got resubmitted after my review - what has changed?




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779532#action_12779532
 ] 

Olga Natkovich commented on PIG-1053:
-

The release audit failures are in HTML. I will be committing the patch later today.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779344#action_12779344
 ] 

Hadoop QA commented on PIG-1053:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425289/hadoopLocal.patch
  against trunk revision 881008.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 22 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 356 release audit warnings 
(more than the trunk's current 354 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/159/console

This message is automatically generated.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779252#action_12779252
 ] 

Ankit Modi commented on PIG-1053:
-

In the old local mode, the PhysicalPlan had a POCounter operator before every 
POStore. This operator was used for gathering stats.

Now that we have moved to Hadoop, this operator is no longer used. The plan is 
therefore smaller, and the expected plan sizes in the tests changed with it.
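
As a rough illustration of the arithmetic, here is a minimal sketch. It assumes that the last two arguments of checkPhysicalPlan are the expected number of leaves and the expected plan size, so that the 17 to 14 change raised in the review corresponds to three POStore leaves each losing one POCounter. The class PlanSizeSketch, the helper buildPlan and the count of 11 other operators are hypothetical and chosen only to make the numbers line up; this is not Pig code.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanSizeSketch {

    // Builds a toy list of operator names: `others` generic operators plus one
    // POStore per output. When withCounters is true, each POStore is preceded
    // by a POCounter, as in the old local mode described above.
    static List<String> buildPlan(int others, int stores, boolean withCounters) {
        List<String> ops = new ArrayList<>();
        for (int i = 0; i < others; i++) {
            ops.add("Op" + i);
        }
        for (int i = 0; i < stores; i++) {
            if (withCounters) {
                ops.add("POCounter");
            }
            ops.add("POStore");
        }
        return ops;
    }

    public static void main(String[] args) {
        int others = 11; // hypothetical number of non-store operators
        int stores = 3;  // assumed to match the "3" in the checkPhysicalPlan call

        // Old local mode: 11 + 3 stores + 3 counters
        System.out.println(buildPlan(others, stores, true).size());  // prints 17
        // Hadoop local mode: 11 + 3 stores
        System.out.println(buildPlan(others, stores, false).size()); // prints 14
    }
}
```

Under these assumptions, the drop equals the number of stores in the plan, which is why each test's expected size would change by a different amount.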




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779249#action_12779249
 ] 

Hadoop QA commented on PIG-1053:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425265/hadoopLocal.patch
  against trunk revision 881008.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 22 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 356 release audit warnings 
(more than the trunk's current 354 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/158/console

This message is automatically generated.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779248#action_12779248
 ] 

Olga Natkovich commented on PIG-1053:
-

A few questions:

TestMultiQueryLocal.java: I see that some of the calls changed, as in the 
example below. What is the reason for that?

-PhysicalPlan pp = checkPhysicalPlan(lp, 1, 3, 17);
+PhysicalPlan pp = checkPhysicalPlan(lp, 1, 3, 14);

Same question regarding changes in TestForEachNestedPlanLocal.java. My 
expectation was that things would not change.

The rest looks good. 





[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779245#action_12779245
 ] 

Olga Natkovich commented on PIG-1053:
-

I will be reviewing this patch




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779243#action_12779243
 ] 

Pradeep Kamath commented on PIG-1053:
-

Wondering if we can catch the scenario explained in the previous comment and 
present an error message to the effect of "custom comparators are not supported 
in local mode".
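
One way such a check might look is sketched below. This is only an illustration, not Pig's actual code: the class ComparatorCheckSketch, the helper checkComparator and the wording of the message are all hypothetical. It assumes the check runs in the same JVM that will execute the job, so a class that cannot be loaded here would also fail later in Hadoop's local runner.

```java
public class ComparatorCheckSketch {

    // Hypothetical helper: returns null when nothing needs to be reported,
    // otherwise an error message along the lines suggested above.
    static String checkComparator(String className, boolean localMode) {
        if (!localMode) {
            // MapReduce mode is reported as unaffected, so no check is made.
            return null;
        }
        try {
            // Look the class up without initializing it.
            Class.forName(className, false,
                    ComparatorCheckSketch.class.getClassLoader());
            return null;
        } catch (ClassNotFoundException e) {
            return "Custom comparator " + className
                    + " was not found on the classpath; in local mode its jar"
                    + " must be on the classpath of the JVM running Pig.";
        }
    }

    public static void main(String[] args) {
        // The class name is taken from the example script in the thread.
        System.out.println(checkComparator("custompackage.customclass", true));
    }
}
```

Failing early with a message like this would replace the bare ClassNotFoundException that Hadoop raises once the job is already running.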




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779181#action_12779181
 ] 

Ankit Modi commented on PIG-1053:
-

This patch has an issue with custom comparators (ORDER BY) in local mode (it 
does not affect MapReduce mode).

Details:
Pig uses custom comparators by setting OutputKeyComparator to the 
customComparator.class and passing the jar path to the JVM when starting the task.
In this new local mode a new JVM is not started, so Hadoop does not have the 
customComparator class on its classpath and fails.

A solution for the above problem would be to pass the jar path of 
customComparator in the "classpath" argument to the JVM running Pig.

e.g.
{code:title=CustomComparatorUse.pig}
register custom.jar;
A = load 'file';
-- Here Hadoop bails out with a ClassNotFoundException
B = order A by * using custompackage.customclass;
store B into 'file2';
{code}

JVM command (this does not work):
{{java -cp pig.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}

Use this instead:
{{java -cp pig.jar:custom.jar org.apache.pig.Main -x local CustomComparatorUse.pig}}




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-11-17 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779179#action_12779179
 ] 

Ankit Modi commented on PIG-1053:
-

This patch has an issue with custom comparators (ORDER BY) in local mode (it 
does not affect MapReduce mode).

Details:
Pig uses custom comparators by setting OutputKeyComparator to the 
customComparator.class and passing the jar path to the JVM when starting the task.
In this new local mode a new JVM is not started, so Hadoop does not have the 
customComparator class on its classpath and fails.

A solution for the above problem would be to pass the jar path of 
customComparator in the "classpath" argument to the JVM running Pig.

e.g. CustomComparatorUse.pig
register custom.jar;
A = load 'file';
-- Here Hadoop bails out with a ClassNotFoundException
B = order A by * using custompackage.customclass;




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772144#action_12772144
 ] 

Alan Gates commented on PIG-1053:
-

For testing purposes we could simply change Main to tell PigContext the mode is 
MapReduce, even when the user selects local mode.  Assuming there are no 
configuration files in the classpath, this will result in using Hadoop in 
local mode.

However, for a real fix, we need to make sure that when the user says "-x 
local", Hadoop's LocalJobRunner and the local file system are chosen even if 
there are configuration files in the classpath.  I believe this would be 
accomplished by changing PigContext so that in local mode it still connects to 
MR and HDFS, but does so with an empty Properties object rather than the one 
that is passed in.  This would affect connect, init, setJobTrackerLocation, and 
perhaps other calls.
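
A minimal sketch of the idea, using only java.util.Properties, is shown below. The class LocalModePropsSketch and the helper effectiveProperties are hypothetical and are not part of PigContext. The sketch also goes one step beyond the proposal: instead of leaving the Properties object empty, it pins mapred.job.tracker and fs.default.name to "local" and "file:///", the values that select the LocalJobRunner and the local file system in Hadoop releases of this period. Whether an empty object alone is sufficient depends on how the Hadoop configuration is assembled, so treat the explicit values as an assumption.

```java
import java.util.Properties;

public class LocalModePropsSketch {

    // Hypothetical helper: picks the properties used to connect to MR and the
    // file system. In local mode the user-supplied properties are ignored.
    static Properties effectiveProperties(Properties userProps, boolean localMode) {
        if (!localMode) {
            return userProps;
        }
        Properties local = new Properties(); // deliberately not copied from userProps
        // Assumption: pin the two settings explicitly rather than rely on defaults.
        local.setProperty("mapred.job.tracker", "local");
        local.setProperty("fs.default.name", "file:///");
        return local;
    }

    public static void main(String[] args) {
        Properties fromClasspath = new Properties();
        // Stand-in for a value picked up from a cluster configuration file.
        fromClasspath.setProperty("mapred.job.tracker", "jobtracker.example.com:9001");

        Properties p = effectiveProperties(fromClasspath, true);
        System.out.println(p.getProperty("mapred.job.tracker")); // prints local
        System.out.println(p.getProperty("fs.default.name"));    // prints file:///
    }
}
```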






[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770339#action_12770339
 ] 

Dmitriy V. Ryaboy commented on PIG-1053:


+1

Although I do know of one user who only utilized Local Mode.  He didn't have 
big data, but found Pig Latin to be the best fit for his particular problem, 
due to support for nested structures.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770336#action_12770336
 ] 

Raghu Angadi commented on PIG-1053:
---

A big +1.

It is understandable for Pig developers to be annoyed by beginners complaining 
about run time with toy local inputs. Maybe a clear heads-up in the tutorial 
would reduce those complaints.




[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770314#action_12770314
 ] 

Jeff Zhang commented on PIG-1053:
-

Agreed.

I always use Hadoop's local mode for debugging; it is more similar to Hadoop's 
map reduce mode than Pig's own local mode is.
I agree with you that having Pig's own local mode increases the complexity 
of Pig and the cost of maintenance.

I think few people will use Pig's local mode in production; users only use it 
for debugging. So performance is not a big problem. The similarity to the real 
map reduce mode is more important.






[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770237#action_12770237
 ] 

Alan Gates commented on PIG-1053:
-

Currently Pig has its own backend implementation framework that it uses for 
executing Pig Latin scripts on a single box (as opposed to in a Hadoop 
cluster), referred to as local mode.  Having a separate implementation has 
several drawbacks:

1) It does not offer the same functionality as Hadoop.  A number of things do 
not work, such as counters, slicers, etc.
2) UDFs (both eval and load/store functions) are often forced to understand 
both contexts, and test whether they are working in local or hadoop mode.
3) Additional code maintenance, as Pig is forced to maintain its own framework. 
 Going forward, as Pig attempts to leverage more Map Reduce functionality (see 
for example PIG-966) maintaining this separate mode is becoming a larger and 
larger effort.
4) It makes debugging harder for users and UDF writers, as the execution 
environment on a local box differs from that on the production cluster.

Pig's local mode has one very serious advantage over Hadoop in local mode.  It 
is much faster, about 15 times faster.  Hadoop is designed for large data sets 
and thus is not optimized to handle the start up and tear down involved in 
small data jobs.

For debugging of code, this performance factor should not be that big an issue. 
 Where the performance becomes prohibitive is functionality like ILLUSTRATE.  
Taking 30 seconds to give a sample of data running through your script is 
excessive compared to 2 seconds.

So, which of these pain points is worse?  Originally we felt the performance 
was more important.  But as we see many user complaints about the above listed 
drawbacks and relatively few users using local mode in performance intensive 
ways, we are wondering if we made that choice correctly.  Please give your 
feedback one way or another.

