Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema ---------------------------------------------------------------------------------------------------------------------------
Key: PIG-575 URL: https://issues.apache.org/jira/browse/PIG-575 Project: Pig Issue Type: Improvement Reporter: David Ciemiewicz I have discovered that it is not possible to recurse through parts of the input Schema in the UDF outputSchema function. I have a function that operates on an input bag of tuples and then creates sequential pairings of the rows. A = foreach One generate { ( 1, a ), ( 2, b ) } as bag { tuple ( seq: int, value: chararray ) }; The output of the PAIRS(A) should be: { ( ( 1, a ), ( 2, b ) ), ( ( 2, b ), ( null, null ) ) } The default output schema for the function should be: bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, value: chararray ) ) ) } The problem I have is that I'm not able to recurse into the internal Schema of the FieldSchema in my outputSchema function to get at the tuple within the input bag. Here's my sample outputSchema for PAIRS: public Schema outputSchema(Schema input) { try { System.out.println("input: " + input.toString()); Schema databagSchema = new Schema(); Schema tupleSchema = new Schema(); Schema inputDataBag = new Schema(input.getFields().get(0)); System.out.println("inputDataBag: " + input.getFields().get(0).toString()); // // RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema // Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0); // Here's where I want to say System.out.println("inputTuple: " + inputTuple.toString()); databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE)); System.out.println("databagSchema: " + databagSchema.toString()); return new Schema( new Schema.FieldSchema( getSchemaName( this.getClass().getName().toLowerCase(), input), databagSchema, DataType.BAG ) ); } catch (Exception e) { return null; } } Here's the execution output from outputSchema: input: {A: {seq: int,value: chararray},int,int} inputDataBag: A: bag({seq: int,value: chararray}) inputTuple: A: bag({seq: int,value: chararray}) <= what I want to see is ( seq: int, value: chararray ) rowSchema: A: bag({seq: int,value: chararray}) rowSchema: A: bag({seq: int,value: chararray}) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.