[jira] Commented: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

2009-12-18 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792770#action_12792770
 ] 

Jeff Zhang commented on PIG-1130:
-

Alan, I think one approach is to check the type of the FileSystem: if it is a 
LocalFileSystem while running in MapReduce mode, we should throw an exception. 
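
A minimal sketch of the check suggested above, written without any Hadoop or Pig dependency so that it runs on its own. ExecMode and FileSystemGuard are hypothetical stand-ins, not classes from Pig, and the sketch inspects the scheme of the default file system URI instead of testing for an actual LocalFileSystem instance. It assumes that a "file" scheme (or no scheme at all) means the local file system.

```java
// Hypothetical stand-in for Pig's ExecType; only the two modes discussed here.
enum ExecMode { LOCAL, MAPREDUCE }

// Hypothetical helper, not part of Pig or Hadoop.
final class FileSystemGuard {
    private FileSystemGuard() {}

    // Assumption: a "file" scheme (or a missing scheme) denotes the local file system.
    static boolean isLocal(java.net.URI defaultFs) {
        String scheme = defaultFs.getScheme();
        return scheme == null || "file".equalsIgnoreCase(scheme);
    }

    // Rejects the one combination under discussion: MAPREDUCE requested,
    // but the default file system resolves to the local one.
    static void check(ExecMode mode, java.net.URI defaultFs) {
        if (mode == ExecMode.MAPREDUCE && isLocal(defaultFs)) {
            throw new IllegalStateException(
                "MapReduce mode requested, but the default file system is local: " + defaultFs);
        }
    }
}
```

In a real patch the test would more likely be made against the FileSystem object that Pig already holds; the URI scheme is used here only to keep the example self-contained.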

 In pig local ( hadoop local mode ) mode the counting of number of tuples and 
 bytes is incorrect if data is more than one local split.
 -

 Key: PIG-1130
 URL: https://issues.apache.org/jira/browse/PIG-1130
 Project: Pig
  Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor

 If the output generates more than one part file, the current code only gives 
 stats for the first part file, i.e. part-0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

2009-12-07 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786945#action_12786945
 ] 

Jeff Zhang commented on PIG-1130:
-

Regarding this issue, I'd like to know whether Pig has a clear definition of 
what local mode is and what mapreduce mode is. Sometimes mapreduce mode behaves 
the same as local mode. Even when a user creates a PigServer like this: 
{code} PigServer pig = new PigServer(ExecType.MAPREDUCE); {code}
it will still run in local mode if there is no cluster configuration on the 
classpath. That means the two modes overlap. But some logic in Pig, such as 
accumulating PigStats, is determined by the ExecType, not by the actual cluster 
mode.

So my suggestion is that we should clearly define what local mode is and what 
mapreduce mode is.
{bold}My proposal is as follows:{bold}
Local mode means Hadoop standalone mode.
Mapreduce mode covers both the pseudo-distributed and the fully-distributed 
Hadoop cluster. So if Pig does not find a cluster configuration on the 
classpath, it should throw an exception and exit rather than run in standalone 
Hadoop mode.

Then a lot of the logic in Pig can be determined by the ExecType, because there 
is no overlap between the two modes.
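
To make the proposal concrete, here is a rough sketch of how such a check might look. PigMode and ModeResolver are hypothetical names, and a plain Map stands in for the Hadoop configuration. The sketch assumes that "no cluster configuration" can be detected from the mapred.job.tracker property still having its default value "local"; a real implementation may need a different test.

```java
// Hypothetical stand-in for Pig's ExecType.
enum PigMode { LOCAL, MAPREDUCE }

// Hypothetical helper, not part of Pig.
final class ModeResolver {
    private ModeResolver() {}

    // Hadoop property naming the JobTracker; "local" is its default value
    // when no cluster configuration has been loaded.
    static final String JOB_TRACKER_KEY = "mapred.job.tracker";

    // Assumption: any value other than "local" means a cluster is configured.
    static boolean hasClusterConfig(java.util.Map<String, String> conf) {
        String tracker = conf.getOrDefault(JOB_TRACKER_KEY, "local");
        return !"local".equals(tracker);
    }

    // Returns the requested mode unchanged, or fails fast instead of silently
    // falling back to standalone Hadoop when MAPREDUCE has no cluster to use.
    static PigMode validate(PigMode requested, java.util.Map<String, String> conf) {
        if (requested == PigMode.MAPREDUCE && !hasClusterConfig(conf)) {
            throw new IllegalStateException(
                "MAPREDUCE mode requested, but no cluster configuration was found");
        }
        return requested;
    }
}
```

With a check like this in place, code that branches on the ExecType (for example the stats accumulation) could rely on MAPREDUCE always meaning a real pseudo-distributed or fully-distributed cluster.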


