[ 
https://issues.apache.org/jira/browse/PIG-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-546:
------------------------------------

    Attachment: PIG-546.patch

The patch (PIG-546.patch) addresses the following issues:

1. Fixes the handling of an alias declared via the define statement when the 
alias is subsequently used in:
   i. Filter functions
   ii. Load functions
   iii. Store functions
   iv. Order by functions
   v. Streaming specifications (input and output)

2. Adds new unit test cases for the parser and end-to-end test cases for 
streaming and filter UDFs.

Note: There are no end-to-end test cases for order by using a UDF.

All unit test cases pass.
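
To illustrate the kind of problem item 1 refers to, here is a minimal, 
self-contained sketch. It is not code from the patch and does not use Pig's 
classes. The names AliasSpecDemo, FilterFromFile, ALIASES, resolve and 
instantiate are hypothetical, and the sketch assumes a spec of the form 
ClassName or ClassName('arg') with at most one string argument. It shows that 
if an alias is resolved to its full spec, the parameterized constructor can be 
chosen, whereas a spec reduced to the bare class name falls back to the empty 
constructor.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class AliasSpecDemo {

    /** Hypothetical stand-in for a UDF with an empty and a parameterized constructor. */
    public static class FilterFromFile {
        public final String path;

        public FilterFromFile() {
            this.path = ""; // the empty constructor knows no file name
        }

        public FilterFromFile(String path) {
            this.path = path;
        }
    }

    /** Hypothetical alias table, as a define statement might fill it: alias -> full spec. */
    public static final Map<String, String> ALIASES = new HashMap<>();

    /** Returns the full spec for a defined alias; any other name is returned unchanged. */
    public static String resolve(String name) {
        return ALIASES.getOrDefault(name, name);
    }

    /** Instantiates "ClassName" or "ClassName('arg')" via reflection (one string argument at most). */
    public static Object instantiate(String spec) throws Exception {
        int open = spec.indexOf('(');
        if (open < 0) {
            // No argument list in the spec: only the empty constructor can be used.
            return Class.forName(spec).getDeclaredConstructor().newInstance();
        }
        String className = spec.substring(0, open);
        String arg = spec.substring(open + 1, spec.lastIndexOf(')')).trim();
        if (arg.isEmpty()) {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        }
        // Strip the surrounding single quotes from the argument.
        String value = arg.substring(1, arg.length() - 1);
        Constructor<?> ctor = Class.forName(className).getDeclaredConstructor(String.class);
        return ctor.newInstance(value);
    }

    public static void main(String[] args) throws Exception {
        String className = FilterFromFile.class.getName();
        ALIASES.put("FILTER_CRITERION", className + "('/user/viraj/insetfilterfile')");

        // Alias resolved to its full spec: the parameterized constructor runs.
        FilterFromFile good = (FilterFromFile) instantiate(resolve("FILTER_CRITERION"));
        System.out.println("resolved alias path: '" + good.path + "'");

        // Spec reduced to the class name only: the empty constructor runs.
        FilterFromFile bad = (FilterFromFile) instantiate(className);
        System.out.println("class name only path: '" + bad.path + "'");
    }
}
```

In the second call the path is empty, which corresponds to the situation in 
which the reported IllegalArgumentException arises when the UDF later tries 
to open the file.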

> FilterFunc calls empty constructor when it should be calling parameterized 
> constructor
> --------------------------------------------------------------------------------------
>
>                 Key: PIG-546
>                 URL: https://issues.apache.org/jira/browse/PIG-546
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>             Fix For: types_branch
>
>         Attachments: FILTERFROMFILE.java, insetfilterfile, mydata.txt, 
> PIG-546.patch
>
>
> The following Pig script uses a custom UDF, FILTERFROMFILE, which extends 
> FilterFunc. It has two constructors: an empty constructor, which is 
> mandatory, and a parameterized constructor. The parameterized constructor 
> takes the HDFS filename, which the exec function uses to construct a 
> HashMap. The HashMap is later used to filter records based on the match 
> criteria in the HDFS file.
> {code}
> register util.jar;
> --util.jar contains the FILTERFROMFILE class
> define FILTER_CRITERION util.FILTERFROMFILE('/user/viraj/insetfilterfile');
> RAW_LOGS = load 'mydata.txt' as (url:chararray, numvisits:int);
> FILTERED_LOGS = filter RAW_LOGS by FILTER_CRITERION(numvisits);
> dump FILTERED_LOGS;
> {code}
> Executing the above script results in a single map-only job with one map 
> task. The empty constructor appears to be called 5 times, which ultimately 
> causes the job to fail.
> ===========================================
> parameterized constructor: /user/viraj/insetfilterfile
> parameterized constructor: /user/viraj/insetfilterfile
> empty constructor
> empty constructor
> empty constructor
> empty constructor
> empty constructor
> ===========================================
> Error in the Hadoop backend
> ===========================================
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>       at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
>       at org.apache.hadoop.fs.Path.<init>(Path.java:90)
>       at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:199)
>       at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:130)
>       at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:164)
>       at util.FILTERFROMFILE.init(FILTERFROMFILE.java:70)
>       at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:89)
>       at util.FILTERFROMFILE.exec(FILTERFROMFILE.java:52)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:179)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:217)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:170)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ===========================================
> Attaching the sample data and the filter function UDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
