Hello,
I have a dataset with more than 180 columns that I want to join (on two
columns) to another dataset.
I would prefer not to enumerate all 180 column names in a schema.
What other options do I have?
Here is my script:
-- This has 180 columns which I do not want to explicitly declare
wide_data = LOAD '/wide/' USING PigStorage('\t');
DESCRIBE wide_data ;
narrow_data =
    LOAD '/narrow/'
    USING PigStorage('\t')
    AS (
        a : chararray,
        b : chararray,
        c : long,
        d : double
    );
narrow_data = FOREACH narrow_data GENERATE a, c, d ;
DESCRIBE narrow_data;
-- join based on two columns
j = JOIN wide_data BY ((chararray)$20, (long)$172), narrow_data BY (a, c)
PARALLEL 1800 ;
DESCRIBE j;
STORE j INTO '/output/';
====
When I run the script with pig -x, it fails because the schema of wide_data is unknown:
Schema for wide_data unknown.
narrow_data: {a: chararray,c: long,d: double}
Schema for j unknown.
2010-11-18 15:58:03,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2189: Expect schema
Details at logfile: /hadoop/home/amallya/JARs/pig_1290121082836.log
The log file says:
more pig_1290121082836.log
Pig Stack Trace
---------------
ERROR 2189: Expect schema
org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2185: Unable to prune columns when processing node (Name: ForEach 1-72 Operator Key: 1-72)
at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:515)
at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.transform(PruneColumns.java:150)
at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:232)
at org.apache.pig.PigServer.compileLp(PigServer.java:857)
at org.apache.pig.PigServer.compileLp(PigServer.java:793)
at org.apache.pig.PigServer.execute(PigServer.java:762)
at org.apache.pig.PigServer.access$100(PigServer.java:90)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:952)
at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:386)
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2185: Unable to prune columns when processing node (Name: Load 1-47 Operator Key: 1-47)
at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:515)
at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:510)
... 13 more
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2188: Cannot prune columns for (Name: LOJoin 1-62 Operator Key: 1-62)
at org.apache.pig.impl.logicalLayer.ColumnPruner.prune(ColumnPruner.java:226)
at org.apache.pig.impl.logicalLayer.ColumnPruner.visit(ColumnPruner.java:251)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:206)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.pruneLoader(PruneColumns.java:762)
at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:198)
... 14 more
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2189: Expect schema
at org.apache.pig.impl.logicalLayer.ColumnPruner.prune(ColumnPruner.java:78)
... 21 more
Thanks
Ashok.