[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

Alan Gates (JIRA) Mon, 26 Oct 2009 15:19:25 -0700

    [ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770237#action_12770237 ]


Alan Gates commented on PIG-1053:
---------------------------------

Currently Pig has its own backend implementation framework that it uses for 
executing Pig Latin scripts on a single box (as opposed to in a Hadoop 
cluster), referred to as local mode.  Having a separate implementation has 
several drawbacks:

1) It does not offer the same functionality as Hadoop.  A number of things, 
such as counters and slicers, do not work.
2) UDFs (both eval and load/store functions) are often forced to understand 
both contexts, and test whether they are working in local or Hadoop mode.
3) Additional code maintenance, as Pig is forced to maintain its own framework. 
Going forward, as Pig attempts to leverage more MapReduce functionality (see, 
for example, PIG-966), maintaining this separate mode is becoming a larger and 
larger effort.
4) It makes debugging harder for users and UDF writers, as the execution 
environment on a local box differs from that on the production cluster.

Pig's local mode has one very significant advantage over Hadoop in local mode: 
it is about 15 times faster.  Hadoop is designed for large data sets and thus 
is not optimized to handle the startup and teardown involved in small data 
jobs.

For debugging code, this performance difference should not be that big an 
issue.  Where it becomes prohibitive is functionality like ILLUSTRATE: taking 
30 seconds to show a sample of data running through your script is excessive 
compared to 2 seconds.

So, which of these pain points is worse?  Originally we felt the performance 
was more important.  But as we see many user complaints about the drawbacks 
listed above and relatively few users using local mode in 
performance-intensive ways, we are wondering if we made that choice correctly.  
Please give your feedback one way or another.


> Consider moving to Hadoop for local mode
> ----------------------------------------
>
>                 Key: PIG-1053
>                 URL: https://issues.apache.org/jira/browse/PIG-1053
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Alan Gates
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
