[ 
https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791396#action_12791396
 ] 

Jeff Zhang commented on PIG-1148:
---------------------------------

Pradeep, I do not quite understand what you mean, so let me explain my idea 
again. Currently users have to write "split by 'file'" in Pig Latin to stop 
Hadoop from splitting a file into multiple InputSplits. I don't think it is a 
good idea to put too many features into Pig Latin. The principle of the 
load/store redesign is to integrate Hadoop's features into Pig as much as 
possible without tying Pig Latin tightly to Hadoop.
So my suggestion is this: if a user does not want a file to be split, they can 
provide a LoadFunc whose InputFormat controls splittability. This InputFormat 
extends FileInputFormat and overrides the method 
isSplitable(FileSystem fs, Path filename) to decide whether the file may be 
split.

Here is a code snippet illustrating my idea:

{code}
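public class MyPigStorage extends PigStorage {
    // Hand Pig an InputFormat that refuses to split its input files.
    @Override
    public InputFormat getInputFormat() {
        return new MyInputFormat();
    }
}

public class MyInputFormat extends TextInputFormat {
    // Returning false keeps each file in a single InputSplit.
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
    }
}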
{code}
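
To make the mechanism concrete, here is a minimal, Hadoop-free sketch of how a 
file-based InputFormat's split computation can consult the isSplitable hook. 
All names in it (SplitSketch, BaseFormat, WholeFileFormat, Split) are 
hypothetical stand-ins and not Pig or Hadoop classes, and the plain block-sized 
slicing is an assumption that simplifies what Hadoop's FileInputFormat really 
does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: none of these names are Pig or Hadoop APIs.
class SplitSketch {

    /** One slice of a file that would be handed to a single map task. */
    static final class Split {
        final String file;
        final long offset;
        final long length;

        Split(String file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Stand-in for a file-based InputFormat. */
    static class BaseFormat {
        /** Hook that subclasses override; splitting is allowed by default. */
        protected boolean isSplitable(String file) {
            return true;
        }

        /** Cuts the file into block-sized splits unless the hook forbids it. */
        List<Split> getSplits(String file, long fileLength, long blockSize) {
            List<Split> splits = new ArrayList<>();
            if (!isSplitable(file)) {
                // The whole file becomes exactly one split.
                splits.add(new Split(file, 0, fileLength));
                return splits;
            }
            for (long offset = 0; offset < fileLength; offset += blockSize) {
                long length = Math.min(blockSize, fileLength - offset);
                splits.add(new Split(file, offset, length));
            }
            return splits;
        }
    }

    /** Plays the role of MyInputFormat: it never allows a file to be split. */
    static class WholeFileFormat extends BaseFormat {
        @Override
        protected boolean isSplitable(String file) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 300L;
        long blockSize = 128L;
        // Default format: splits at offsets 0, 128 and 256, so this prints 3.
        System.out.println(new BaseFormat().getSplits("data.txt", fileLength, blockSize).size());
        // Overridden hook: the file stays whole, so this prints 1.
        System.out.println(new WholeFileFormat().getSplits("data.txt", fileLength, blockSize).size());
    }
}
```

The point of the sketch is that the decision lives entirely in the InputFormat 
subclass, so nothing in the script language has to express it.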

> Move splitable logic from pig latin to InputFormat
> --------------------------------------------------
>
>                 Key: PIG-1148
>                 URL: https://issues.apache.org/jira/browse/PIG-1148
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>

