[
https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786945#action_12786945
]
Jeff Zhang commented on PIG-1130:
-
Regarding this issue, I'd like to know whether Pig has a clear definition of
what local mode is and what mapreduce mode is. Sometimes mapreduce mode behaves
the same as local mode. That is, even when users create a PigServer like this:
{code} PigServer pig = new PigServer(ExecType.MAPREDUCE); {code}
it will still run in local mode if there's no cluster configuration on the
classpath. That means the two modes overlap. But some
logic in Pig, such as accumulating PigStats, is determined by the ExecType,
not by the actual cluster mode.
So my suggestion is that we should clearly define what local mode is and what
mapreduce mode is.
{bold}My proposal is as follows:{bold}
Local mode means Hadoop standalone mode.
Mapreduce mode includes the pseudo-distributed Hadoop cluster and the
fully distributed Hadoop cluster. So if Pig does not find the specified cluster
configuration on the classpath, it should throw an exception and exit, rather
than run in standalone Hadoop mode.
Then much of the logic in Pig can be determined by the ExecType, because
the two modes no longer overlap.
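To make the proposal concrete, here is a minimal sketch of the strict check, not Pig's actual code. The `Mode` enum is a hypothetical stand-in for `ExecType`, the class and method names are made up for illustration, and the config file name `hadoop-site.xml` is an assumption about what "cluster configuration" would mean in practice.

```java
import java.util.function.Predicate;

// Illustrative sketch only; none of these names exist in Pig.
class ExecModeCheck {
    /** Hypothetical stand-in for Pig's ExecType. */
    enum Mode { LOCAL, MAPREDUCE }

    /**
     * Returns the requested mode unchanged, but refuses MAPREDUCE when no
     * cluster configuration can be found, instead of silently falling back
     * to standalone Hadoop. The lookup is passed in so the rule is testable.
     */
    static Mode resolve(Mode requested, String configResource,
                        Predicate<String> onClasspath) {
        if (requested == Mode.MAPREDUCE && !onClasspath.test(configResource)) {
            throw new IllegalStateException(
                "mapreduce mode requested but " + configResource
                + " was not found on the classpath");
        }
        return requested;
    }

    /** True if the named resource is visible to the system class loader. */
    static boolean onClasspath(String resource) {
        return ClassLoader.getSystemResource(resource) != null;
    }

    public static void main(String[] args) {
        try {
            // "hadoop-site.xml" is an assumed name for the cluster config file.
            Mode mode = resolve(Mode.MAPREDUCE, "hadoop-site.xml",
                                ExecModeCheck::onClasspath);
            System.out.println("Running in " + mode);
        } catch (IllegalStateException e) {
            System.err.println("Refusing to start: " + e.getMessage());
        }
    }
}
```

With a check like this, code that branches on the requested mode would no longer have to guess whether a real cluster is behind it.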
In pig local ( hadoop local mode ) mode the counting of number of tuples and
bytes is incorrect if data is more than one local split.
-
Key: PIG-1130
URL: https://issues.apache.org/jira/browse/PIG-1130
Project: Pig
Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor
If the output generates more than one part file, the current code only gives
stats for the first part file, i.e. part-0
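As a rough illustration of the expected behavior, the sketch below sums the sizes of every part file in an output directory rather than reading only the first one. It is not Pig's code: the class and method names are hypothetical, it reads the local filesystem through `java.nio` rather than the Hadoop FileSystem API, and it assumes part files can be matched with the glob `part-*`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; not taken from Pig.
class PartFileStats {
    /** Sums the byte sizes of all regular files named part-* in outputDir. */
    static long totalBytes(Path outputDir) throws IOException {
        long total = 0;
        // The glob picks up every part file, not just the first one.
        try (DirectoryStream<Path> parts =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                if (Files.isRegularFile(part)) {
                    total += Files.size(part);
                }
            }
        }
        return total;
    }
}
```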