[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-594:
---------------------------

    Attachment: INSETFROMFILE.java

INSETFROMFILE UDF, which extends FilterFunc

> Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach 
> statements
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-594
>                 URL: https://issues.apache.org/jira/browse/PIG-594
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>             Fix For: types_branch
>
>         Attachments: insetfilterfile, INSETFROMFILE.java, myurldata.txt
>
>
> I have a UDF named INSETFROMFILE, which matches data against a set of 
> values stored in an HDFS file. INSETFROMFILE extends FilterFunc. Here is 
> a sample Pig script that uses it.
> {code}
> register util.jar;
> define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
> A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
> B = group A by (url);
> C = foreach B generate ((InQuerySet(A.bcookie))?1:0) as inset, A;
> dump C;
> {code}
> This script fails with the following exception in the reducer:
> ================================================================================================================
> java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toProperties(ConfigurationUtil.java:45)
>         at util.INSETFROMFILE.init(INSETFROMFILE.java:79)
>         at util.INSETFROMFILE.exec(INSETFROMFILE.java:99)
>         at util.INSETFROMFILE.exec(INSETFROMFILE.java:61)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:223)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.getNext(POBinCond.java:92)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:259)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:280)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:247)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ================================================================================================================
> The error does not occur when the same INSETFROMFILE UDF is used in a 
> FILTER statement instead; the following script works.
> {code}
> register util.jar;
> define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
> A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
> B = filter A by InQuerySet(bcookie);
> dump B;
> {code}
> The result is:
> (www.yahoo.com,12344)
> Problems:
> 1) Why does the FilterFunc UDF INSETFROMFILE behave inconsistently when 
> used in FOREACH?
> 2) Is there a rule that a FilterFunc UDF may only be used in a FILTER 
> statement?
> 3) The call Properties props = ConfigurationUtil.toProperties(PigInputFormat.sJob) 
> throws the NullPointerException above when the FilterFunc UDF is called 
> within FOREACH, which suggests that PigInputFormat.sJob is null in the reducer.
> The data and script files are attached for testing.
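
The failure pattern in the stack trace (init called from the first exec, then a NullPointerException while reading the job configuration) can be illustrated with a small stand-alone sketch. This is not the attached INSETFROMFILE.java and it uses no Pig classes: the class LazySetFilter, the Supplier that stands in for the static job configuration, and the "members" property are all hypothetical names chosen for illustration. It only shows how a lazily initialized set-membership filter behaves when its configuration source returns null, and how a null check turns that into a clearer error.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a set-membership filter UDF. The Supplier plays
 * the role of whatever static job configuration a real UDF reads on its
 * first call; it is an assumption, not Pig's API.
 */
public class LazySetFilter {
    private final Supplier<Properties> jobConfig; // may yield null in some tasks
    private Set<String> members;                  // loaded on the first exec() call

    public LazySetFilter(Supplier<Properties> jobConfig) {
        this.jobConfig = jobConfig;
    }

    /** Loads the member set once; fails with a descriptive error if no configuration exists. */
    private void init() {
        Properties props = jobConfig.get();
        if (props == null) {
            throw new IllegalStateException(
                "job configuration is not available in this task");
        }
        // "members" is a made-up property holding comma-separated values.
        String raw = props.getProperty("members", "");
        members = new HashSet<>(Arrays.asList(raw.split(",")));
    }

    /** Returns true if the value is in the configured set. */
    public boolean exec(String value) {
        if (members == null) {
            init();
        }
        return members.contains(value);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("members", "12344,99999");

        // Configuration present: the lookup succeeds.
        LazySetFilter configured = new LazySetFilter(() -> props);
        System.out.println(configured.exec("12344"));

        // Configuration absent: the first exec() call fails inside init().
        LazySetFilter unconfigured = new LazySetFilter(() -> null);
        try {
            unconfigured.exec("12344");
        } catch (IllegalStateException e) {
            System.out.println("init failed: " + e.getMessage());
        }
    }
}
```

In this sketch the error surfaces on the first exec call rather than at construction time, which matches the order of frames in the trace (exec, then init, then the configuration read). Whether the real UDF can obtain its configuration some other way in the reducer is a question for the Pig developers and is not answered here.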

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
