Order by fails with java.lang.String cannot be cast to org.apache.pig.data.DataBag
----------------------------------------------------------------------------------

                 Key: PIG-1374
                 URL: https://issues.apache.org/jira/browse/PIG-1374
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.6.0, 0.7.0
            Reporter: Viraj Bhat

The script loads data using BinStorage(), flattens the columns, and then sorts on the second column in descending order. The ORDER BY fails with a ClassCastException.

{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $1 desc;
dump d;
{code}

The sampling job fails with the following error:

===============================================================================================================
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:407)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:188)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:329)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)
===============================================================================================================

The schemas for b, c and d are as follows:

b: {bag_of_tuples: {tuple: (uuid: chararray,velocity: double)}}
c: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}
d: {bag_of_tuples::uuid: chararray,bag_of_tuples::velocity: double}

If we modify the script to order on the first column instead, it seems to work:

{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
d = order c by $0 desc;
dump d;
{code}

(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493)
(ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)

As a workaround, add an explicit projection before the ORDER:

{code}
register loader.jar;
a = load 'c2' using BinStorage();
b = foreach a generate org.apache.pig.CCMLoader(*);
describe b;
c = foreach b generate flatten($0);
describe c;
newc = foreach c generate $0 as uuid, $1 as velocity;
newd = order newc by velocity desc;
dump newd;
{code}

(gc639c60-4267-11df-9879-0800200c9a66,2.4227339503478493)
(ec639c60-4267-11df-9879-0800200c9a66,1.140175425099138)

The output schema declared by the loader is as follows:

{code}
public Schema outputSchema(Schema input) {
    try {
        // Inner tuple: (uuid: chararray, velocity: double)
        List<Schema.FieldSchema> list = new ArrayList<Schema.FieldSchema>();
        list.add(new Schema.FieldSchema("uuid", DataType.CHARARRAY));
        list.add(new Schema.FieldSchema("velocity", DataType.DOUBLE));
        Schema tupleSchema = new Schema(list);
        Schema.FieldSchema tupleFs = new Schema.FieldSchema("tuple", tupleSchema, DataType.TUPLE);

        // Outer bag wrapping the tuple, marked as requiring two-level access
        Schema bagSchema = new Schema(tupleFs);
        bagSchema.setTwoLevelAccessRequired(true);
        Schema.FieldSchema bagFs = new Schema.FieldSchema("bag_of_tuples", bagSchema, DataType.BAG);
        return new Schema(bagFs);
    } catch (Exception e) {
        return null;
    }
}
{code}
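
For readers unfamiliar with this kind of failure, here is a minimal plain-Java sketch of the error pattern the stack trace in this report points at: a projection that assumes a column holds a bag and casts it unconditionally, while the flattened row actually holds a scalar there. It does not use or reproduce Pig's code. The class ProjectionCastSketch and the method projectAsBag are hypothetical names, and java.util.List stands in for Pig's DataBag and Tuple types. Whether this is the actual root cause of PIG-1374 is an assumption, not something this report confirms.

```java
import java.util.Arrays;
import java.util.List;

public class ProjectionCastSketch {

    // Hypothetical stand-in for a projection that assumes the column is a bag.
    // A List plays the role of a bag here; this is not Pig's POProject logic.
    @SuppressWarnings("unchecked")
    static List<Object> projectAsBag(List<Object> tuple, int column) {
        Object field = tuple.get(column);
        // The cast is checked at runtime and throws ClassCastException
        // when the field is a String (or any other non-List value).
        return (List<Object>) field;
    }

    public static void main(String[] args) {
        // Row shaped like schema b: a single column holding a bag of tuples.
        List<Object> innerTuple = Arrays.<Object>asList(
                "ec639c60-4267-11df-9879-0800200c9a66", 1.140175425099138);
        List<Object> nestedRow = Arrays.<Object>asList(
                Arrays.<Object>asList(innerTuple));

        // Row shaped like schema c: the bag has been flattened into two scalars.
        List<Object> flattenedRow = Arrays.<Object>asList(
                "ec639c60-4267-11df-9879-0800200c9a66", 1.140175425099138);

        // Succeeds: column 0 really is a bag.
        System.out.println("nested: " + projectAsBag(nestedRow, 0).size() + " tuple(s)");

        // Fails: column 0 is a String, so the bag cast cannot succeed.
        try {
            projectAsBag(flattenedRow, 0);
        } catch (ClassCastException e) {
            System.out.println("flattened: ClassCastException");
        }
    }
}
```

If the cause is along these lines, it would be consistent with the workaround above helping, since the explicit projection gives the ORDER input a plain, flat schema. That reading is a guess and should be checked against the Pig source.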