[ https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658625#action_12658625 ]
Santhosh Srinivasan commented on PIG-575:
-----------------------------------------

The FieldSchema member variable schema is public. It can be accessed directly, without a getSchema() method, although having the method could make the code cleaner.

> Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-575
>                 URL: https://issues.apache.org/jira/browse/PIG-575
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: David Ciemiewicz
>            Priority: Minor
>
> I have discovered that it is not possible to recurse through parts of the input Schema in the UDF outputSchema function.
> I have a function that operates on an input bag of tuples and then creates sequential pairings of the rows.
> A = foreach One generate {
>         ( 1, a ),
>         ( 2, b )
>     } as bag { tuple ( seq: int, value: chararray ) };
> The output of the PAIRS(A) should be:
> {
>     ( ( 1, a ), ( 2, b ) ),
>     ( ( 2, b ), ( null, null ) )
> }
> The default output schema for the function should be:
> bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, value: chararray ) ) }
> The problem I have is that I'm not able to recurse into the internal Schema of the FieldSchema in my outputSchema function to get at the tuple within the input bag.
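To illustrate the point about the public member: in the UDF this would presumably amount to reading input.getFields().get(0).schema rather than wrapping the FieldSchema in a new Schema. The sketch below is not Pig code. The Schema and FieldSchema classes in it are hypothetical stand-ins with the same general shape (a field list, plus a public schema member on each field), and the bag-contains-tuple layout is an assumption, so that the recursion can be run without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaRecursionSketch {

    // Hypothetical stand-in for a schema: an ordered list of fields.
    static class Schema {
        private final List<FieldSchema> fields = new ArrayList<>();

        Schema add(FieldSchema fs) {
            fields.add(fs);
            return this;
        }

        List<FieldSchema> getFields() {
            return fields;
        }
    }

    // Hypothetical stand-in for a field: the nested schema is a public
    // member, which is the property the comment above relies on.
    static class FieldSchema {
        public final String alias;   // may be null
        public final String type;    // e.g. "int", "tuple", "bag"
        public final Schema schema;  // null for simple types

        FieldSchema(String alias, String type, Schema schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }
    }

    // Recursively renders a field by following the public schema member.
    static String describe(FieldSchema fs) {
        StringBuilder sb = new StringBuilder();
        if (fs.alias != null) {
            sb.append(fs.alias).append(": ");
        }
        sb.append(fs.type);
        if (fs.schema != null) {
            sb.append("(");
            List<FieldSchema> inner = fs.schema.getFields();
            for (int i = 0; i < inner.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(describe(inner.get(i)));
            }
            sb.append(")");
        }
        return sb.toString();
    }

    // Builds A: bag(tuple(seq: int, value: chararray)) from the stand-ins.
    static FieldSchema sampleBag() {
        Schema tupleFields = new Schema()
                .add(new FieldSchema("seq", "int", null))
                .add(new FieldSchema("value", "chararray", null));
        Schema bagContents = new Schema()
                .add(new FieldSchema(null, "tuple", tupleFields));
        return new FieldSchema("A", "bag", bagContents);
    }

    public static void main(String[] args) {
        FieldSchema bag = sampleBag();
        // Step into the bag through the public member to reach the tuple.
        FieldSchema innerTuple = bag.schema.getFields().get(0);
        System.out.println(describe(innerTuple));
    }
}
```

Whether the real input schema nests a tuple inside the bag in exactly this way depends on the Pig version, so the field layout should be checked against the printed input schema before indexing into it.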
> Here's my sample outputSchema for PAIRS:
>     public Schema outputSchema(Schema input) {
>         try {
>             System.out.println("input: " + input.toString());
>             Schema databagSchema = new Schema();
>             Schema tupleSchema = new Schema();
>             Schema inputDataBag = new Schema(input.getFields().get(0));
>             System.out.println("inputDataBag: " + input.getFields().get(0).toString());
>             //
>             // RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema
>             //
>             Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0); // Here's where I want to say
>             System.out.println("inputTuple: " + inputTuple.toString());
>             databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
>             System.out.println("databagSchema: " + databagSchema.toString());
>             return new Schema(
>                 new Schema.FieldSchema(
>                     getSchemaName( this.getClass().getName().toLowerCase(), input),
>                     databagSchema,
>                     DataType.BAG
>                 )
>             );
>         } catch (Exception e) {
>             return null;
>         }
>     }
> Here's the execution output from outputSchema:
> input: {A: {seq: int,value: chararray},int,int}
> inputDataBag: A: bag({seq: int,value: chararray})
> inputTuple: A: bag({seq: int,value: chararray}) <= what I want to see is ( seq: int, value: chararray )
> rowSchema: A: bag({seq: int,value: chararray})
> rowSchema: A: bag({seq: int,value: chararray})

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.