Please extend FieldSchema class with getSchema() member function for iterating 
over complex Schemas in Pig UDF outputSchema

                 Key: PIG-575
             Project: Pig
          Issue Type: Improvement
            Reporter: David Ciemiewicz

I have discovered that it is not possible to recurse through parts of the input 
Schema in the UDF outputSchema function.

I have a function that operates on an input bag of tuples and then creates 
sequential pairings of the rows.

A = foreach One generate { 
( 1, a ),
( 2, b )
}   as  bag { tuple ( seq: int, value: chararray ) };

The output of the PAIRS(A) should be:

( ( 1, a ), ( 2, b ) ),
( ( 2, b ), ( null, null ) )

The default output schema for the function should be:

bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, 
value: chararray ) ) ) }

The problem I have is that I'm not able to recurse into the internal Schema of 
the FieldSchema in my outputSchema function to get at the tuple within the 
input bag.

Here's my sample outputSchema for PAIRS:

    public Schema outputSchema(Schema input) {
        try {
        System.out.println("input: " + input.toString());

        Schema databagSchema = new Schema();
        Schema tupleSchema = new Schema();

        Schema inputDataBag = new Schema(input.getFields().get(0));
        System.out.println("inputDataBag: " + 

//  RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema
        Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0);  // 
Here's where I want to say  
        System.out.println("inputTuple: " + inputTuple.toString());

        databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
        System.out.println("databagSchema: " + databagSchema.toString());

        return new Schema(
            new Schema.FieldSchema(
                getSchemaName( this.getClass().getName().toLowerCase(), input),
        } catch (Exception e) {
                return null;

Here's the execution output from outputSchema:

input: {A: {seq: int,value: chararray},int,int}
inputDataBag: A: bag({seq: int,value: chararray})
inputTuple: A: bag({seq: int,value: chararray})    <= what I want to see is ( 
seq: int, value: chararray )
rowSchema: A: bag({seq: int,value: chararray})
rowSchema: A: bag({seq: int,value: chararray})

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

Reply via email to