Hi,
Has anyone seen the following?
I am getting an error when running ORDER:
ERROR 1071: Cannot convert a Unknown to a String
The error occurs in DataType.java:885. At the end of that switch
statement, the variable 'type' is -1, and the variable 'o' is a string that
looks like leftover data from the earlier statements A or B. The value of 'o' is:
%!PS-Adobe-2.0
%%Creator: dvips(k) 5.86 Copyright 1999 Radical Eye Software
%%Title: arXiv:astro-ph/0005123 v3 2 Oct 2000
%%Pages: 7
%%PageOrder: As...
Note that if I skip the ORDER statement, everything works, and the
resulting file looks correct (in random order, of course).
The error also goes away if I make one simple change: store D into a
temporary file, then LOAD that file and execute E without any change to
that statement.
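For reference, here is a rough sketch of that workaround, reusing the relation and field names from the pseudocode below. The path 'tmp_d' is a hypothetical name, and storing and reloading with the default PigStorage and an explicit AS schema is an assumption about how the reload was actually done:

```pig
-- Workaround sketch: materialize D before sorting it.
-- 'tmp_d' is a hypothetical temporary path; the default PigStorage is assumed
-- for both the STORE and the LOAD.
STORE D INTO 'tmp_d';

-- Reload with the same schema that the FLATTEN ... AS clause declared.
-- Reusing the alias D lets the ORDER statement below stay exactly as it was.
D = LOAD 'tmp_d' AS (token:chararray, docID:chararray, tokenPos:int);

-- E is unchanged from the original script.
E = ORDER D BY token ASC;
STORE E INTO 'bar';
```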
Pseudocode below, followed by the stack trace.
A = LOAD 'foo'
USING aLoader()
AS (url:chararray,
date:chararray,
pageSize:int,
position:int,
docidInCrawl:int,
httpHeader:chararray,
content:chararray);
B = FOREACH A GENERATE
udf();
-- B is of the form {(chararray,chararray,int), (chararray,chararray,int), ... }
D = FOREACH B GENERATE flatten($0) AS (token:chararray, docID:chararray,
tokenPos:int);
E = ORDER D BY token ASC;
STORE E INTO 'bar';
org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a Unknown to a String
	at org.apache.pig.data.DataType.toString(DataType.java:885)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:642)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:367)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)