[ 
https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834637#action_12834637
 ] 

Hadoop QA commented on PIG-1216:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436067/pig-1216_1.patch
  against trunk revision 909921.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/207/console

This message is automatically generated.

> New load store design does not allow Pig to validate inputs and outputs up 
> front
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1216
>                 URL: https://issues.apache.org/jira/browse/PIG-1216
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Alan Gates
>            Assignee: Ashutosh Chauhan
>         Attachments: pig-1216.patch, pig-1216_1.patch
>
>
> In Pig 0.6 and before, Pig attempts to verify existence of inputs and 
> non-existence of outputs during parsing to avoid run time failures when 
> inputs don't exist or outputs can't be overwritten.  The downside to this was 
> that Pig assumed all inputs and outputs were HDFS files, which made 
> implementation harder for non-HDFS based load and store functions.  In the 
> load store redesign (PIG-966) this was delegated to InputFormats and 
> OutputFormats to avoid this problem and to make use of the checks already 
> being done in those implementations.  Unfortunately, for Pig Latin scripts 
> that run more then one MR job, this does not work well.  MR does not do 
> input/output verification on all the jobs at once.  It does them one at a 
> time.  So if a Pig Latin script results in 10 MR jobs and the file to store 
> to at the end already exists, the first 9 jobs will be run before the 10th 
> job discovers that the whole thing was doomed from the beginning.  
> To avoid this a validate call needs to be added to the new LoadFunc and 
> StoreFunc interfaces.  Pig needs to pass this method enough information that 
> the load function implementer can delegate to InputFormat.getSplits() and the 
> store function implementer to OutputFormat.checkOutputSpecs() if s/he decides 
> to.  Since 90% of all load and store functions use HDFS and PigStorage will 
> also need to, the Pig team should implement a default file existence check on 
> HDFS and make it available as a static method to other Load/Store function 
> implementers.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to