[ https://issues.apache.org/jira/browse/PIG-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shravan Matthur Narayanamurthy updated PIG-369:
-----------------------------------------------
Status: Patch Available (was: Open)
Implemented the visit(LOCross) method in LogToPhyTranslationVisitor. This
mimics what we were doing in Pig-1.0. To summarize, the following script with
Cross will be converted as shown below:
{noformat}
A1 = load 'f1';
A2 = load 'f2';
...
An = load 'fn';
B = cross A1,A2,...,An;
{noformat}
{noformat}
A1 = load 'f1';
...
An = load 'fn';
B1 = foreach A1 generate flatten(GFCross('n','0')), flatten(*);
B2 = foreach A2 generate flatten(GFCross('n','1')), flatten(*);
...
Bn = foreach An generate flatten(GFCross('n','n-1')), flatten(*);
C = splgroup B1 by ($0,$1,...,$n-1) inner, B2 by ($0,$1,...,$n-1) inner, ..., Bn by ($0,$1,...,$n-1) inner;
D = foreach C generate flatten($1), flatten($2), ..., flatten($n);
{noformat}
GFCross outputs a bag of n-tuples. The foreach flattens that bag and attaches
each n-tuple to the front of the original tuple, thus replicating each input
tuple once per n-tuple in the bag.
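To make the translation concrete, here is a small in-memory Java sketch of the same idea (not the patch itself). It assumes a hypothetical fixed FANOUT of buckets per input and a hypothetical gfCross(n, i) that picks one random bucket for input i and enumerates every bucket for the other inputs; Pig's actual GFCross may size its buckets differently. Each input tuple is replicated once per key (step B), the copies are grouped by key into one bag per input (step C), and flattening the n bags of a group yields their cartesian product (step D). Every combination of input tuples meets in exactly one group: the one whose key holds each tuple's own random bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// In-memory sketch of "cross via replicate + group + flatten".
// FANOUT, gfCross and crossViaGroup are hypothetical names, not Pig APIs.
class CrossViaGroupSketch {
    // Hypothetical number of buckets per input; Pig's GFCross may size this differently.
    static final int FANOUT = 2;

    // Sketch of GFCross(n, i): position i gets one random bucket,
    // every other position runs over all FANOUT buckets.
    static List<List<Integer>> gfCross(int n, int i, Random rnd) {
        int mine = rnd.nextInt(FANOUT);
        int combos = (int) Math.pow(FANOUT, n - 1);
        List<List<Integer>> bag = new ArrayList<>();
        for (int c = 0; c < combos; c++) {
            List<Integer> key = new ArrayList<>();
            int rest = c;
            for (int d = 0; d < n; d++) {
                if (d == i) {
                    key.add(mine);
                } else {
                    key.add(rest % FANOUT);
                    rest /= FANOUT;
                }
            }
            bag.add(key);
        }
        return bag;
    }

    // Cross product of all inputs; each input is a list of tuples (List<String>).
    static List<List<String>> crossViaGroup(List<List<List<String>>> inputs, Random rnd) {
        int n = inputs.size();
        // Steps B and C: replicate each tuple once per GFCross key, then group by key
        // into one bag per input.
        Map<List<Integer>, List<List<List<String>>>> groups = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (List<String> tuple : inputs.get(i)) {
                for (List<Integer> key : gfCross(n, i, rnd)) {
                    List<List<List<String>>> bags = groups.computeIfAbsent(key, k -> {
                        List<List<List<String>>> fresh = new ArrayList<>();
                        for (int j = 0; j < n; j++) {
                            fresh.add(new ArrayList<>());
                        }
                        return fresh;
                    });
                    bags.get(i).add(tuple);
                }
            }
        }
        // Step D: flattening the n bags of a group yields their cartesian product;
        // a group with an empty bag contributes nothing (the "inner" semantics).
        List<List<String>> out = new ArrayList<>();
        for (List<List<List<String>>> bags : groups.values()) {
            product(bags, 0, new ArrayList<>(), out);
        }
        return out;
    }

    private static void product(List<List<List<String>>> bags, int idx,
                                List<String> prefix, List<List<String>> out) {
        if (idx == bags.size()) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (List<String> tuple : bags.get(idx)) {
            int mark = prefix.size();
            prefix.addAll(tuple);
            product(bags, idx + 1, prefix, out);
            prefix.subList(mark, prefix.size()).clear();
        }
    }

    public static void main(String[] args) {
        List<List<List<String>>> inputs = Arrays.asList(
            Arrays.asList(Arrays.asList("a1"), Arrays.asList("a2")),
            Arrays.asList(Arrays.asList("R", "4"), Arrays.asList("S", "5")));
        // Four rows (2 x 2); the order depends on HashMap iteration.
        System.out.println(crossViaGroup(inputs, new Random(42)));
    }
}
```

Because a group only contains tuples whose own bucket matches the key, the inner grouping drops only incomplete buckets, never a valid combination, and no combination is produced twice.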
The only difference from a normal Pig script is the splgroup, where the local
rearrange has a slight modification. When it is processing a cross, it strips
from each value tuple the first n fields that the foreach prepended, passes the
remaining (original) tuple on as the value, and keeps those first n fields as
the key.
For example, the foreach might produce (2,1,R,4), where (R,4) is the actual
tuple and (2,1) is one of the tuples in the GFCross output. The local rearrange
then splits this tuple into a key and a value by making (2,1) the key and (R,4)
the value.
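The key/value split done by the modified local rearrange can be sketched as below. CrossLocalRearrangeSketch, split and KeyValue are hypothetical names used only for illustration; the real operator works on Pig tuples rather than java.util.List, and this is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the cross-specific local rearrange step;
// the real operator works on Pig tuples, not java.util.List.
class CrossLocalRearrangeSketch {
    // Key/value pair emitted by the sketch (not Pig's own class).
    static final class KeyValue {
        final List<Object> key;
        final List<Object> value;

        KeyValue(List<Object> key, List<Object> value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + " -> " + value;
        }
    }

    // Splits a tuple produced by "flatten(GFCross(n, i)), flatten(*)":
    // the first n fields (the GFCross output) become the key and the
    // remaining fields (the original tuple) become the value.
    static KeyValue split(List<Object> tuple, int n) {
        if (n < 0 || n > tuple.size()) {
            throw new IllegalArgumentException("n must be between 0 and " + tuple.size());
        }
        List<Object> key = new ArrayList<>(tuple.subList(0, n));
        List<Object> value = new ArrayList<>(tuple.subList(n, tuple.size()));
        return new KeyValue(key, value);
    }

    public static void main(String[] args) {
        // The example from the comment: (2,1,R,4) with n = 2.
        KeyValue kv = split(Arrays.<Object>asList(2, 1, "R", 4), 2);
        System.out.println(kv); // prints [2, 1] -> [R, 4]
    }
}
```

Copying the sublists keeps the key and value independent of the incoming tuple, so later changes to one do not leak into the other.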
So the patch has two changes: one to the translator and the other to the local rearrange.
> Filter does not allow udf as the filter operator and only allows ComparisonOperators
> ------------------------------------------------------------------------------------
>
> Key: PIG-369
> URL: https://issues.apache.org/jira/browse/PIG-369
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Pradeep Kamath
> Assignee: Shravan Matthur Narayanamurthy
> Fix For: types_branch
>
>
> The following pig script does not work:
> {code}
> register util.jar;
> define MyFilterSet util.FilterUdf('filter.txt');
> A = load 'simpletest' using PigStorage() as ( x, y );
> B = filter A by MyFilterSet(x);
> dump B;
> {code}
> The following error is seen:
> {noformat}
> java -cp pig.jar:$localc org.apache.pig.Main filter.pig
> 2008-08-07 17:59:37,663 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: localhost:9000
> 2008-08-07 17:59:37,748 [main] WARN org.apache.hadoop.fs.FileSystem - "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
> 2008-08-07 17:59:38,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
> 2008-08-07 17:59:38,166 [main] WARN org.apache.hadoop.fs.FileSystem - "localhost:9000" is a deprecated filesystem name. Use "hdfs://localhost:9000/" instead.
> java.io.IOException: Unable to open iterator for alias: B [org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc]
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.setPlan(POFilter.java:179)
> at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:592)
> at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:102)
> at org.apache.pig.impl.logicalLayer.LOFilter.visit(LOFilter.java:31)
> at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:245)
> at org.apache.pig.PigServer.compilePp(PigServer.java:590)
> at org.apache.pig.PigServer.execute(PigServer.java:516)
> at org.apache.pig.PigServer.openIterator(PigServer.java:307)
> at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:258)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:175)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:82)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:302)
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
> ... 15 more
> {noformat}
> I looked further, and the issue seems to be in POFilter, which assumes the
> filter operator is always a ComparisonOperator and does not allow a UDF for
> filtering:
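> {code}
> public void setPlan(PhysicalPlan plan) {
>     this.plan = plan;
>     // Assumes the plan's leaf is always a ComparisonOperator; a POUserFunc
>     // leaf makes this cast throw the ClassCastException shown above.
>     comOp = (ComparisonOperator) (plan.getLeaves()).get(0);
>     compOperandType = comOp.getOperandType();
> }
> {code}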
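One possible direction, sketched with simplified, hypothetical stand-in types (ExprOp, EqualsOp, UdfOp, FilterSketch) rather than Pig's physical operators, is for the filter to keep the leaf as a generic expression operator and check at evaluation time that it produced a boolean, instead of casting it to ComparisonOperator in setPlan. This is an illustration under those assumptions, not the committed fix.

```java
import java.util.function.Predicate;

// Simplified, hypothetical stand-in for Pig's physical expression operators.
interface ExprOp {
    Object eval(Object[] tuple);
}

// Stand-in for a comparison such as $field == constant.
final class EqualsOp implements ExprOp {
    private final int field;
    private final Object constant;

    EqualsOp(int field, Object constant) {
        this.field = field;
        this.constant = constant;
    }

    @Override
    public Object eval(Object[] tuple) {
        return constant.equals(tuple[field]);
    }
}

// Stand-in for a boolean UDF call such as MyFilterSet(x).
final class UdfOp implements ExprOp {
    private final int field;
    private final Predicate<Object> udf;

    UdfOp(int field, Predicate<Object> udf) {
        this.field = field;
        this.udf = udf;
    }

    @Override
    public Object eval(Object[] tuple) {
        return udf.test(tuple[field]);
    }
}

// Filter that accepts any boolean-valued leaf instead of casting it to a comparison type.
final class FilterSketch {
    private ExprOp condition;

    void setPlan(ExprOp leaf) {
        // No cast here: comparisons and UDFs are both just expression operators.
        this.condition = leaf;
    }

    boolean accept(Object[] tuple) {
        Object result = condition.eval(tuple);
        // The type check moves from plan setup to evaluation time.
        if (!(result instanceof Boolean)) {
            throw new IllegalStateException("filter condition must evaluate to a boolean, got " + result);
        }
        return (Boolean) result;
    }

    public static void main(String[] args) {
        FilterSketch f = new FilterSketch();
        f.setPlan(new UdfOp(0, x -> "keep".equals(x)));
        System.out.println(f.accept(new Object[] {"keep", 1})); // prints true
        f.setPlan(new EqualsOp(1, 2));
        System.out.println(f.accept(new Object[] {"keep", 1})); // prints false
    }
}
```

With this shape, a comparison and a boolean UDF such as MyFilterSet(x) go through the same code path, and a non-boolean leaf fails with a descriptive error rather than a ClassCastException.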
--
This message is automatically generated by JIRA.