FilterFunc calls empty constructor when it should be calling parameterized 
constructor
--------------------------------------------------------------------------------------

                 Key: PIG-546
                 URL: https://issues.apache.org/jira/browse/PIG-546
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Viraj Bhat
             Fix For: types_branch


The following Pig script uses a custom UDF, FILTERFROMFILE, which extends 
FilterFunc. It has two constructors: the mandatory empty constructor and a 
parameterized constructor. The parameterized constructor takes the HDFS 
filename, which the exec function uses to construct a HashMap. The HashMap is 
later used to filter records based on the match criteria in the HDFS file.
{code}
register util.jar;
--util.jar contains the FILTERFROMFILE class

define FILTER_CRITERION util.FILTERFROMFILE('/user/viraj/insetfilterfile');

RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);

FILTERED_LOGS = filter RAW_LOGS by FILTER_CRITERION(numvisits);

dump FILTERED_LOGS;
{code}

When you execute the above script, it results in a single map-only job with 1 
map task. It seems that the empty constructor is called 5 times, which 
ultimately causes the job to fail.
===========================================
parameterized constructor: /user/viraj/insetfilterfile
parameterized constructor: /user/viraj/insetfilterfile
empty constructor
empty constructor
empty constructor
empty constructor
empty constructor
===========================================
Error in the Hadoop backend
===========================================
java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:199)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:130)
        at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:164)
        at util.FILTERFROMFILE.init(FILTERFROMFILE.java:70)
        at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:89)
        at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:52)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:179)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:217)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
===========================================
Attaching the sample data and the filter function UDF.
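
For readers without the attachment, the failure mode described above can be 
sketched roughly as follows. This is a hypothetical stand-in, not the attached 
UDF: the class name FilterFromFileSketch, the instantiate helper, and the 
hard-coded lookup values are assumptions made for illustration, and the class 
deliberately does not extend Pig's FilterFunc so that it compiles without Pig 
on the classpath. It only shows why an instance built through the empty 
constructor fails on its first exec call while one built through the 
parameterized constructor works.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the attached UDF (not the real FILTERFROMFILE).
class FilterFromFileSketch {
    private final String filterFileName;   // "" when the empty constructor is used
    private Map<Integer, Boolean> lookup;  // built lazily on the first exec() call

    public FilterFromFileSketch() {
        System.out.println("empty constructor");
        this.filterFileName = "";
    }

    public FilterFromFileSketch(String filterFileName) {
        System.out.println("parameterized constructor: " + filterFileName);
        this.filterFileName = filterFileName;
    }

    // Mirrors the init step in the stack trace: an empty filename cannot be opened.
    private void init() {
        if (filterFileName.isEmpty()) {
            throw new IllegalArgumentException(
                "Can not create a Path from an empty string");
        }
        lookup = new HashMap<>();
        // Assumption: the real UDF reads these keys from the HDFS file.
        lookup.put(3, Boolean.TRUE);
        lookup.put(7, Boolean.TRUE);
    }

    public boolean exec(int numvisits) {
        if (lookup == null) {
            init();
        }
        return lookup.containsKey(numvisits);
    }

    // Hypothetical helper: builds the object reflectively, with or without
    // the constructor argument, to contrast the two instantiation paths.
    static FilterFromFileSketch instantiate(String... ctorArgs) {
        try {
            Class<FilterFromFileSketch> c = FilterFromFileSketch.class;
            if (ctorArgs.length == 0) {
                return c.getDeclaredConstructor().newInstance();
            }
            return c.getDeclaredConstructor(String.class).newInstance(ctorArgs[0]);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        FilterFromFileSketch withArg = instantiate("/user/viraj/insetfilterfile");
        System.out.println(withArg.exec(3));
        try {
            instantiate().exec(3);  // argument dropped, as in the failing job
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

If the backend re-creates the UDF without passing along the argument from the 
define statement, every such instance behaves like the second case in main.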

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
