Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
----------------------------------------------------------------------------------
Key: PIG-1374
URL: https://issues.apache.org/jira/browse/PIG-1374
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0, 0.7.0
Reporter: Viraj Bhat
The script loads data using BinStorage(), flattens the resulting bag into
columns, and then sorts on the second column in descending order. The ORDER BY
fails with a ClassCastException:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $1 desc;
dump d;
{code}
The sampling job fails with the following error:
===============================================================================================================
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:407)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:188)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:329)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
===============================================================================================================
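The trace indicates that POProject.processInputBag cast a field to DataBag while the field actually held a String. The snippet below is not Pig code. It is a minimal plain-Java sketch of that failure mode, in which java.util.List stands in for DataBag and the names BagCastSketch, countTuples and failsWithClassCast are hypothetical. The uuid string is used only as a sample value; any String behaves the same way.

```java
import java.util.Arrays;
import java.util.List;

public class BagCastSketch {
    // Hypothetical stand-in for an operator that assumes its input field is a bag.
    static int countTuples(Object field) {
        // Unchecked assumption: the field is a bag (modelled here as a List).
        List<?> bag = (List<?>) field;
        return bag.size();
    }

    // Returns true when the cast inside countTuples fails for the given field.
    static boolean failsWithClassCast(Object field) {
        try {
            countTuples(field);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object bagField = Arrays.asList("t1", "t2");
        Object stringField = "gc639c60-4267-11df-9879-0800200c9a66";
        System.out.println(failsWithClassCast(bagField));    // prints false
        System.out.println(failsWithClassCast(stringField)); // prints true
    }
}
```

The cast compiles because the field is typed as Object, so the mismatch only surfaces at runtime, which is consistent with the error appearing in the sampling job rather than at script compile time.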
The schemas for b, c and d are as follows:
b: {bag_of_tuples: {tuple: (uuid: chararray,velocity: double)}}
c: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
d: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
If the script is modified to order on the first column instead, it appears to work:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $0 desc;
dump d;
{code}
(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493)
(ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)
A workaround is to add an explicit projection before the ORDER BY:
{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
newc = foreach c generate $0 as uuid, $1 as velocity;
newd = order newc by velocity desc;
dump newd;
{code}
(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493)
(ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)
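For reference, the intended semantics of flatten followed by a descending sort on the second column can be modelled in plain Java. This is only a sketch of the expected result, not of how Pig executes the plan. The class FlattenOrderSketch and its members are hypothetical, and it assumes the two tuples from the output above arrive in a single bag, which the report does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FlattenOrderSketch {
    // Hypothetical model of one (uuid, velocity) tuple from the reported schema.
    static final class Row {
        final String uuid;
        final double velocity;

        Row(String uuid, double velocity) {
            this.uuid = uuid;
            this.velocity = velocity;
        }
    }

    // Models "foreach b generate flatten($0)": tuples inside each bag become top-level rows.
    static List<Row> flatten(List<List<Row>> bags) {
        List<Row> rows = new ArrayList<>();
        for (List<Row> bag : bags) {
            rows.addAll(bag);
        }
        return rows;
    }

    // Models "order c by $1 desc": sort on the second column, descending.
    static List<Row> orderByVelocityDesc(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((Row r) -> r.velocity).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Assumption: both sample tuples live in one bag.
        List<Row> bag = Arrays.asList(
            new Row("ec639c60-4267-11df-9879-0800200c9a66", 1.140175425099138),
            new Row("gc639c60-4267-11df-9879-0800200c9a66", 2.4227339503478493));
        List<List<Row>> bags = new ArrayList<>();
        bags.add(bag);
        for (Row r : orderByVelocityDesc(flatten(bags))) {
            System.out.println("(" + r.uuid + "," + r.velocity + ")");
        }
    }
}
```

With the two sample rows, the gc639c60... row sorts ahead of the ec639c60... row, matching the order shown in the workaround output.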
The output schema declared by CCMLoader is as follows:
{code}
public Schema outputSchema(Schema input) {
    try {
        // Inner tuple: (uuid: chararray, velocity: double)
        List<Schema.FieldSchema> list = new ArrayList<Schema.FieldSchema>();
        list.add(new Schema.FieldSchema("uuid", DataType.CHARARRAY));
        list.add(new Schema.FieldSchema("velocity", DataType.DOUBLE));
        Schema tupleSchema = new Schema(list);
        Schema.FieldSchema tupleFs = new Schema.FieldSchema("tuple", tupleSchema, DataType.TUPLE);

        // Bag wrapping the tuple, marked as requiring two-level access
        Schema bagSchema = new Schema(tupleFs);
        bagSchema.setTwoLevelAccessRequired(true);
        Schema.FieldSchema bagFs = new Schema.FieldSchema("bag_of_tuples", bagSchema, DataType.BAG);
        return new Schema(bagFs);
    } catch (Exception e) {
        return null;
    }
}
{code}