[ https://issues.apache.org/jira/browse/PIG-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-594:
---------------------------

    Attachment: INSETFROMFILE.java

INSETFROMFILE UDF which uses FilterFunc

> Inconsistent behaviour of FilterFunc UDF when used in the Filter and ForEach statements
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-594
>                 URL: https://issues.apache.org/jira/browse/PIG-594
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Viraj Bhat
>             Fix For: types_branch
>
>         Attachments: insetfilterfile, INSETFROMFILE.java, myurldata.txt
>
>
> I have a UDF known as INSETFROMFILE, which matches data against a set of
> values stored in an HDFS file. The INSETFROMFILE extends FilterFunc. Here is
> a sample pig script which uses it.
> {code}
> register util.jar;
> define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
> A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
> B = group A by (url);
> C = foreach B generate ((InQuerySet(A.bcookie))?1:0) as inset, A;
> dump C;
> {code}
> This script fails with the following exception in the reducer:
> ================================================================================================================
> java.lang.NullPointerException
>       at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toProperties(ConfigurationUtil.java:45)
>       at util.INSETFROMFILE.init(INSETFROMFILE.java:79)
>       at util.INSETFROMFILE.exec(INSETFROMFILE.java:99)
>       at util.INSETFROMFILE.exec(INSETFROMFILE.java:61)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:223)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POBinCond.getNext(POBinCond.java:92)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:259)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:280)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:247)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ================================================================================================================
> To avoid this error we use the INSETFROMFILE UDF in the Filter statement of
> Pig and it works.
> {code}
> register util.jar;
> define InQuerySet util.INSETFROMFILE('/user/viraj/insetfilterfile');
> A = load '/user/viraj/myurldata.txt' using PigStorage() as (url, bcookie);
> B = filter A by InQuerySet(bcookie);
> dump B;
> {code}
> The result is:
> (www.yahoo.com,12344)
> Problems:
> 1) Why does the FilterFunc UDF, INSETFROMFILE show inconsistent behaviour
> when used in the FOREACH?
> 2) Is there a rule that FilterFunc UDF should be used in Filter statement?
> 3) Properties props = ConfigurationUtil.toProperties(PigInputFormat.sJob) is
> null when the FilterFunc UDF is called within ForEach
> Attaching data and script file for testing.
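
To make point 3 of the report easier to reason about, here is a minimal plain-Java sketch of the failure pattern: a filter that lazily loads its lookup set from a static job configuration. It may help to compare it with the stack trace, which ends in init() called from exec(). Everything in it is a hypothetical stand-in and not the attached INSETFROMFILE.java: the class LazySetFilterSketch, the static field jobProps (standing in for PigInputFormat.sJob) and the comma-separated property used in place of the HDFS file are all assumptions, and no Pig API is used. The explicit guard is only one possible way to surface the problem, not a fix confirmed for PIG-594.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for a FilterFunc-style UDF; not the attached INSETFROMFILE.java.
class LazySetFilterSketch {
    // Assumption: plays the role of a static job configuration such as
    // PigInputFormat.sJob, which is only set in tasks where it was initialised.
    static Properties jobProps = null;

    private final String lookupPath;
    private Set<String> lookup; // loaded lazily on the first exec() call

    LazySetFilterSketch(String lookupPath) {
        this.lookupPath = lookupPath;
    }

    // Loads the lookup set. Fails with a descriptive error instead of a bare
    // NullPointerException when the static configuration was never populated.
    private void init() {
        if (jobProps == null) {
            throw new IllegalStateException(
                "job configuration is not available in this task; cannot load " + lookupPath);
        }
        // Assumption: values come from a comma-separated property, not from HDFS.
        String values = jobProps.getProperty(lookupPath, "");
        lookup = new HashSet<>(Arrays.asList(values.split(",")));
    }

    // Returns true when the value is a member of the lookup set.
    boolean exec(String value) {
        if (lookup == null) {
            init();
        }
        return lookup.contains(value);
    }

    public static void main(String[] args) {
        String path = "/user/viraj/insetfilterfile";

        // Task where the static configuration was never set (the FOREACH case in the report).
        jobProps = null;
        try {
            new LazySetFilterSketch(path).exec("12344");
        } catch (IllegalStateException e) {
            System.out.println("without configuration: " + e.getMessage());
        }

        // Task where the static configuration is present (the FILTER case in the report).
        Properties props = new Properties();
        props.setProperty(path, "12344,99999");
        jobProps = props;
        System.out.println("with configuration: " + new LazySetFilterSketch(path).exec("12344"));
    }
}
```

In this sketch the same exec() call succeeds or fails depending only on whether the static field was populated before the first call, which matches the symptom described for FILTER versus FOREACH.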