[ https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Ciemiewicz updated PIG-575:
---------------------------------

    Component/s: impl
       Priority: Minor  (was: Major)

> Please extend FieldSchema class with getSchema() member function for 
> iterating over complex Schemas in Pig UDF outputSchema
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-575
>                 URL: https://issues.apache.org/jira/browse/PIG-575
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: David Ciemiewicz
>            Priority: Minor
>
> I have discovered that it is not possible to recurse through parts of the 
> input Schema in the UDF outputSchema function.
> I have a function that operates on an input bag of tuples and then creates 
> sequential pairings of the rows.
> A = foreach One generate { 
> ( 1, a ),
> ( 2, b )
> }   as  bag { tuple ( seq: int, value: chararray ) };
> The output of the PAIRS(A) should be:
> {
> ( ( 1, a ), ( 2, b ) ),
> ( ( 2, b ), ( null, null ) )
> }
> The default output schema for the function should be:
> bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, 
> value: chararray ) ) }
> The problem I have is that I'm not able to recurse into the internal Schema 
> of the FieldSchema in my outputSchema function to get at the tuple within the 
> input bag.
> Here's my sample outputSchema for PAIRS:
>     public Schema outputSchema(Schema input) {
>         try {
>             System.out.println("input: " + input.toString());
>             Schema databagSchema = new Schema();
>             Schema tupleSchema = new Schema();
>             Schema inputDataBag = new Schema(input.getFields().get(0));
>             System.out.println("inputDataBag: " + input.getFields().get(0).toString());
>             //
>             //  RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema
>             //
>             Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0);  // Here's where I want to say
>             System.out.println("inputTuple: " + inputTuple.toString());
>             databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
>             System.out.println("databagSchema: " + databagSchema.toString());
>             return new Schema(
>                 new Schema.FieldSchema(
>                     getSchemaName(this.getClass().getName().toLowerCase(), input),
>                     databagSchema,
>                     DataType.BAG
>                 )
>             );
>         } catch (Exception e) {
>             return null;
>         }
>     }
> Here's the execution output from outputSchema:
> input: {A: {seq: int,value: chararray},int,int}
> inputDataBag: A: bag({seq: int,value: chararray})
> inputTuple: A: bag({seq: int,value: chararray})    <= what I want to see is ( seq: int, value: chararray )
> rowSchema: A: bag({seq: int,value: chararray})
> rowSchema: A: bag({seq: int,value: chararray})
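
To make the request concrete, here is a minimal stand-alone sketch of the recursion the description asks for. It does not use the Pig API: `PairsSchemaSketch`, `Field`, `Type`, `pairsOutputSchema` and the output alias "pairs" are hypothetical names invented for illustration, and the model assumes the input bag holds exactly one tuple field. As far as I know, Pig's real `Schema.FieldSchema` keeps its nested schema in a public `schema` field, which may already serve as a workaround, but that should be checked against the release in use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Stand-alone model of a nested schema. This is NOT the Pig API.
class PairsSchemaSketch {

    enum Type { INT, CHARARRAY, TUPLE, BAG }

    // Hypothetical stand-in for Schema.FieldSchema: an alias, a type, and
    // (for TUPLE and BAG only) the schema of the nested fields.
    static final class Field {
        final String alias;
        final Type type;
        final List<Field> schema; // null for simple types

        Field(String alias, Type type, List<Field> schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }

        // The kind of accessor the issue asks for.
        List<Field> getSchema() {
            return schema;
        }

        @Override
        public String toString() {
            String prefix = (alias == null) ? "" : alias + ": ";
            switch (type) {
                case TUPLE: return prefix + "(" + join(schema) + ")";
                case BAG:   return prefix + "{" + join(schema) + "}";
                default:    return prefix + type.name().toLowerCase(Locale.ROOT);
            }
        }
    }

    static String join(List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        for (Field f : fields) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(f);
        }
        return sb.toString();
    }

    // Builds bag { tuple ( row, row ) } where "row" is the tuple schema found
    // inside the first input field, which must be a bag of tuples.
    static Field pairsOutputSchema(List<Field> input, String outputAlias) {
        Field inputBag = input.get(0);
        if (inputBag.type != Type.BAG) {
            throw new IllegalArgumentException("first argument must be a bag");
        }
        Field row = inputBag.getSchema().get(0); // recurse into the bag
        if (row.type != Type.TUPLE) {
            throw new IllegalArgumentException("bag must contain a tuple");
        }
        Field first = new Field(null, Type.TUPLE, row.getSchema());
        Field second = new Field(null, Type.TUPLE, row.getSchema());
        Field pair = new Field(null, Type.TUPLE, Arrays.asList(first, second));
        return new Field(outputAlias, Type.BAG, Arrays.asList(pair));
    }

    public static void main(String[] args) {
        Field row = new Field(null, Type.TUPLE, Arrays.asList(
            new Field("seq", Type.INT, null),
            new Field("value", Type.CHARARRAY, null)));
        Field a = new Field("A", Type.BAG, Arrays.asList(row));
        System.out.println(pairsOutputSchema(Arrays.asList(a), "pairs"));
    }
}
```

Run as-is, `main` prints `pairs: {((seq: int, value: chararray), (seq: int, value: chararray))}`, which is the shape of the default output schema described above, with the field names carried over from the input tuple.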

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
