Jeff Zhang commented on PIG-1130:

Regarding this issue, I'd like to know whether Pig has a clear definition of 
what local mode is and what mapreduce mode is. Sometimes mapreduce mode behaves 
the same as local mode. For example, even when a user creates a PigServer like this: 
{code} PigServer pig = new PigServer(ExecType.MAPREDUCE); {code}
it will still run in local mode if there is no cluster configuration on the 
classpath. That means the two modes overlap. But some logic in Pig, such as 
accumulating PigStats, is determined by the ExecType, not by the mode the job 
actually runs in.

So my suggestion is that we clearly define what local mode is and what 
mapreduce mode is.
{bold}My proposal is as follows:{bold}
Local mode means Hadoop standalone mode.
Mapreduce mode includes both the pseudo-distributed and the fully distributed 
Hadoop cluster. So if Pig does not find the specified cluster configuration on 
the classpath, it should throw an exception and exit, rather than run in 
standalone Hadoop mode.

Then a lot of the logic in Pig can be determined by the ExecType alone, because 
the two modes no longer overlap.
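
A minimal sketch of the check I have in mind, not actual Pig code. The class 
ExecModeCheck, the enum Mode (a stand-in for ExecType), the method resolve, and 
the configuration file name passed in by the caller are all hypothetical names 
used only for illustration. The idea is simply: if mapreduce mode is requested 
and no cluster configuration is found on the classpath, fail instead of 
silently running in standalone mode.

```java
import java.net.URL;

class ExecModeCheck {
    // Hypothetical stand-in for Pig's ExecType; not the real class.
    enum Mode { LOCAL, MAPREDUCE }

    /**
     * Returns the mode the job will really run in.
     * Throws if MAPREDUCE was requested but the given configuration
     * resource (name chosen by the caller; an assumption here) cannot
     * be found through the given class loader.
     */
    static Mode resolve(Mode requested, String configResource, ClassLoader loader) {
        if (requested == Mode.LOCAL) {
            // Local mode never needs a cluster configuration.
            return Mode.LOCAL;
        }
        URL config = loader.getResource(configResource);
        if (config == null) {
            // Proposed behavior: do not fall back to standalone mode.
            throw new IllegalStateException(
                "mapreduce mode requested but " + configResource
                + " was not found on the classpath");
        }
        return Mode.MAPREDUCE;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        // The resource name below is a placeholder, not a real Hadoop file name.
        String resource = "example-cluster-config.xml";
        try {
            System.out.println(resolve(Mode.MAPREDUCE, resource, loader));
        } catch (IllegalStateException e) {
            System.out.println("refusing to start: " + e.getMessage());
        }
    }
}
```

With a check like this, the requested mode and the mode actually used can never 
differ, so code that branches on the ExecType is always looking at the real mode.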

> In pig local ( hadoop local mode ) mode the counting of number of tuples and 
> bytes is incorrect if data is more than one local split.
> -------------------------------------------------------------------------------------------------------------------------------------
>                 Key: PIG-1130
>                 URL: https://issues.apache.org/jira/browse/PIG-1130
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankit Modi
>            Priority: Minor
> If the output generates more than one part file, the current code only gives 
> stats for the first part file, i.e. part-00000.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.