[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770237#action_12770237 ]
Alan Gates commented on PIG-1053:
---------------------------------
Currently Pig has its own backend framework for executing Pig Latin scripts on
a single box (as opposed to in a Hadoop cluster), referred to as local mode.
Having a separate implementation has several drawbacks:
1) It does not offer the same functionality as Hadoop. Several features, such
as counters and slicers, do not work.
2) UDFs (both eval and load/store functions) are often forced to understand
both contexts and to test whether they are running in local or Hadoop mode.
3) Additional code maintenance, as Pig is forced to maintain its own framework.
Going forward, as Pig attempts to leverage more MapReduce functionality (see,
for example, PIG-966), maintaining this separate mode is becoming a larger and
larger effort.
4) It makes debugging harder for users and UDF writers, as the execution
environment on a local box differs from that on the production cluster.
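
To make drawback 2 concrete, here is a minimal sketch of the kind of branch a
load function ends up carrying when two backends exist. It is written in
Python for brevity and is not Pig's actual UDF API: ExecMode and resolve_input
are hypothetical names invented for this illustration, and the URI schemes are
an assumption about how the two contexts might differ.

```python
from enum import Enum


class ExecMode(Enum):
    """Hypothetical stand-in for the two execution contexts."""
    LOCAL = "local"
    HADOOP = "hadoop"


def resolve_input(path: str, mode: ExecMode) -> str:
    """Return the location a hypothetical load function would open.

    The function has to know which backend it is running under,
    which is exactly the coupling described in drawback 2.
    """
    if mode is ExecMode.LOCAL:
        # Assumed local-mode behavior: read from the local filesystem.
        return "file://" + path
    # Assumed Hadoop-mode behavior: resolve against the cluster filesystem.
    return "hdfs://" + path
```

With a single backend the mode argument and the branch would both go away,
and the function would be tested against only one environment.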
Pig's local mode has one significant advantage over Hadoop in local mode: it
is about 15 times faster. Hadoop is designed for large data sets and thus is
not optimized for the startup and teardown overhead involved in small data
jobs.
For debugging code, this performance difference should not be a major issue.
Where it becomes prohibitive is in functionality like ILLUSTRATE. Taking 30
seconds rather than 2 seconds to show a sample of data running through your
script is excessive.
So, which of these pain points is worse? Originally we felt the performance
was more important. But as we see many user complaints about the drawbacks
listed above and relatively few users using local mode in
performance-intensive ways, we are wondering whether we made the right choice.
Please give your feedback one way or the other.
> Consider moving to Hadoop for local mode
> ----------------------------------------
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
> Issue Type: Improvement
> Reporter: Alan Gates
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.