[ https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786945#action_12786945 ]
Jeff Zhang commented on PIG-1130:
---------------------------------

Regarding this issue, I'd like to know whether Pig has a clear definition of what local mode is and what mapreduce mode is. Sometimes mapreduce mode behaves the same as local mode. Even when users create a PigServer like this:
{code}
PigServer pig = new PigServer(ExecType.MAPREDUCE);
{code}
it will still run in local mode if there is no cluster configuration on the classpath. That means the two modes overlap. But some logic in Pig, such as accumulating PigStats, is determined by the ExecType, not by the mode the job actually runs in.

So my suggestion is that we define clearly what local mode is and what mapreduce mode is.

{bold}My proposal is as follows:{bold}

local mode means Hadoop standalone mode.
mapreduce mode includes the pseudo-distributed Hadoop cluster and the fully distributed Hadoop cluster.

So if Pig does not find the specified cluster configuration on the classpath, it should throw an exception and exit, rather than run in standalone Hadoop mode. Then a lot of the logic in Pig can be determined by the ExecType, because the two modes no longer overlap.

> In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1130
>                 URL: https://issues.apache.org/jira/browse/PIG-1130
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Ankit Modi
>            Priority: Minor
>
> If the output generates more than one part file, the current code only gives
> stats of the first part file, i.e. part-00000.
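
The fail-fast behaviour proposed in the comment above could look roughly like the sketch below. It is only an illustration, not Pig code: the class ExecModeCheck, the enum Mode (a stand-in for ExecType) and the method resolve are hypothetical names, and the configuration file name "hadoop-site.xml" is an assumption, since the comment does not say which resource Pig looks for.

```java
// Hypothetical sketch; none of these names come from Pig itself.
class ExecModeCheck {

    /** Stand-in for Pig's ExecType, reduced to the two modes under discussion. */
    enum Mode { LOCAL, MAPREDUCE }

    /**
     * Returns the requested mode unchanged, or fails fast when MAPREDUCE is
     * requested but the cluster configuration resource cannot be found.
     * There is no silent fallback to local mode.
     */
    static Mode resolve(Mode requested, ClassLoader loader, String configResource) {
        if (requested == Mode.MAPREDUCE && loader.getResource(configResource) == null) {
            throw new IllegalStateException(
                "mapreduce mode requested but " + configResource
                + " was not found on the classpath");
        }
        return requested;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        // Assumed file name, used only for illustration.
        String config = args.length > 0 ? args[0] : "hadoop-site.xml";

        // LOCAL never needs a cluster configuration.
        System.out.println(resolve(Mode.LOCAL, loader, config));

        // MAPREDUCE is refused when the configuration is missing.
        try {
            System.out.println(resolve(Mode.MAPREDUCE, loader, config));
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

With a check like this, code that branches on the requested mode (for example the statistics collection mentioned above) could rely on it matching the mode the job really runs in.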