[jira] Commented: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

2009-12-18 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792770#action_12792770
 ] 

Jeff Zhang commented on PIG-1130:
-

Alan, I think one approach is to check the type of the FileSystem: if it is 
LocalFileSystem in MapReduce mode, then we should throw an exception. 
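
A minimal sketch of the kind of guard suggested above. It is not Pig's code. To stay runnable without Hadoop on the classpath, it uses the scheme of the default filesystem URI as a stand-in for an "is this a LocalFileSystem" check. The names LocalFsGuard, Mode, isLocal and check are hypothetical, and treating a scheme-less URI as local is an assumption.

```java
import java.net.URI;

// Hypothetical guard, not Pig's code: rejects a local default filesystem
// when the job was requested in MapReduce mode.
final class LocalFsGuard {
    enum Mode { LOCAL, MAPREDUCE }

    // True when the default filesystem URI points at the local filesystem.
    // Assumption: a URI without a scheme is treated as local.
    static boolean isLocal(String defaultFsUri) {
        String scheme = URI.create(defaultFsUri).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    // Throws if MapReduce mode was requested but the filesystem is local.
    static void check(Mode mode, String defaultFsUri) {
        if (mode == Mode.MAPREDUCE && isLocal(defaultFsUri)) {
            throw new IllegalStateException(
                "MapReduce mode requested but the default filesystem is local: "
                    + defaultFsUri);
        }
    }
}
```

In a real deployment the URI would come from the Hadoop configuration, and the check would more likely be made against the FileSystem instance itself.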

 In pig local ( hadoop local mode ) mode the counting of number of tuples and 
 bytes is incorrect if data is more than one local split.
 -

 Key: PIG-1130
 URL: https://issues.apache.org/jira/browse/PIG-1130
 Project: Pig
  Issue Type: Bug
Reporter: Ankit Modi
Priority: Minor

 If the output consists of more than one part file, the current code reports 
 stats for only the first part file, i.e. part-0
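
The fix this description implies is to aggregate over every part file rather than stopping at the first. A rough sketch of that idea follows, using plain java.nio on a local output directory. It is not Pig's stats code. The names PartFileStats, Totals and collect are hypothetical, and counting one record per line is an assumption about the output format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not Pig's code: totals records and bytes over
// all part files in an output directory.
final class PartFileStats {
    static final class Totals {
        final long records;
        final long bytes;
        Totals(long records, long bytes) {
            this.records = records;
            this.bytes = bytes;
        }
    }

    static Totals collect(Path outputDir) {
        long records = 0;
        long bytes = 0;
        // The glob matches part-0, part-00000, part-00001, and so on.
        try (DirectoryStream<Path> parts =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                if (!Files.isRegularFile(part)) {
                    continue;
                }
                bytes += Files.size(part);
                // Assumption: one record (tuple) per line of text output.
                try (Stream<String> lines = Files.lines(part)) {
                    records += lines.count();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Totals(records, bytes);
    }
}
```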

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.

2009-12-07 Thread Jeff Zhang (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786945#action_12786945
]

Jeff Zhang commented on PIG-1130:
-

Regarding this issue, I'd like to know whether Pig has a clear definition of
what local mode is and what mapreduce mode is. Sometimes mapreduce mode behaves
the same as local mode. I mean that even when users create a PigServer like this:
{code} PigServer pig = new PigServer(ExecType.MAPREDUCE); {code}
it will still run in local mode if there's no cluster configuration in the
classpath. That means the two modes overlap. But some logic in Pig, such as
accumulating pigstats, is determined by the ExecType, not by the actual
cluster mode.

So my suggestion is that we should clearly define what local mode is and what
mapreduce mode is.
{bold}My proposal is as follows:{bold}
Local mode means Hadoop standalone mode.
Mapreduce mode includes the pseudo-distributed Hadoop cluster and the
fully distributed Hadoop cluster. So if Pig does not find the specified cluster
configuration in the classpath, it should throw an exception and exit, rather
than run in standalone Hadoop mode.

Then much of the logic in Pig can be determined by the ExecType, because
there's no overlap between the two modes.
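
To make the proposal concrete, here is a small sketch of a strict mode resolver: mapreduce mode with no cluster configuration on the classpath fails fast instead of silently falling back to standalone. It is only an illustration of the proposed behavior, not Pig's implementation. StrictModeResolver, Mode, isOnClasspath and resolve are hypothetical names.

```java
// Hypothetical sketch of the proposed behavior, not Pig's code.
final class StrictModeResolver {
    enum Mode { LOCAL, MAPREDUCE }

    // Looks up a configuration resource by name on the classpath.
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName) != null;
    }

    // Returns the requested mode unchanged, or throws when mapreduce mode
    // is requested without any cluster configuration available.
    static Mode resolve(Mode requested, boolean clusterConfigFound) {
        if (requested == Mode.MAPREDUCE && !clusterConfigFound) {
            throw new IllegalStateException(
                "mapreduce mode requested but no cluster configuration "
                    + "was found in the classpath");
        }
        return requested;
    }
}
```

A caller would combine the two, for example resolve(Mode.MAPREDUCE, isOnClasspath("hadoop-site.xml")), where the file name is an assumption about how the cluster configuration is packaged. Because the result never differs from the requested mode, later logic can branch on the mode alone.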
