I ran into a fairly strange issue with Pig the other day. I was trying
out the sample Swap UDF (attached to this email) from the Pig documentation:
http://wiki.apache.org/pig/UDFManual (Schema section)

I then tried to run the following script:

REGISTER
/home/arovner/Documents/workspace-sts-2.6.0.RELEASE/pig-bank/target/pig-bank-0.0.1-SNAPSHOT-jar-with-dependencies.jar;

A = LOAD '/wec/incoming' USING PigStorage() AS (timestamp:chararray,
ip:chararray, country:chararray, state:chararray, event:chararray,
url:chararray, agent:chararray, geo_country:chararray, geo_dma:chararray,
geo_region:chararray, geo_city:chararray, geo_zip:chararray,
browser:chararray, os:chararray, uuid:chararray, segment_id:chararray,
guid:chararray, action:chararray);

B = FOREACH A {
generate myudfs.Swap(geo_region, geo_zip).geo_zip;
}

STORE B INTO '/wec/output' USING PigStorage();


I would expect the output to contain only the "geo_zip" field, but
instead I see the following:

(10019,NY)
(10019,NY)
(10019,NY)

Why is Pig emitting the whole tuple instead of selecting the one field I
asked for?
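
To make the expectation concrete, here is a minimal plain-Java sketch of the behavior I had in mind. It does not use Pig at all: the class SwapProjectionSketch and its swap/project methods are hypothetical stand-ins for the UDF and for the ".geo_zip" dereference, and it assumes geo_zip ends up at position 0 of the swapped tuple, as the UDF's outputSchema suggests.

```java
import java.util.Arrays;
import java.util.List;

public class SwapProjectionSketch {
    // Stand-in for myudfs.Swap: returns a two-field tuple with the inputs reversed.
    public static List<String> swap(String first, String second) {
        return Arrays.asList(second, first);
    }

    // Stand-in for the ".geo_zip" dereference: keep only the field at one position.
    public static String project(List<String> tuple, int position) {
        return tuple.get(position);
    }

    public static void main(String[] args) {
        // Input order matches the script: (geo_region, geo_zip).
        List<String> swapped = swap("NY", "10019");

        // The whole swapped tuple, which is what I actually get back.
        System.out.println(swapped);

        // Only the first field of the swapped tuple, which is what I expected.
        System.out.println(project(swapped, 0));
    }
}
```

In other words, I expected the second line of output for each record, not the first.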

I am using Pig 0.8.0 from Cloudera's CDH3u0 package.

Thanks
Alex
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.data.DataType;

public class Swap extends EvalFunc<Tuple> {
    public Tuple exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2)
            return null;
        try{
            Tuple output = TupleFactory.getInstance().newTuple(2);
            output.set(0, input.get(1));
            output.set(1, input.get(0));
            return output;
        } catch(Exception e){
            System.err.println("Failed to process input; error - " + e.getMessage());
            return null;
        }
    }
    public Schema outputSchema(Schema input) {
        try{
            Schema tupleSchema = new Schema();
            tupleSchema.add(input.getField(1));
            tupleSchema.add(input.getField(0));
            return new Schema(new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    tupleSchema, DataType.TUPLE));
        } catch (Exception e) {
            return null;
        }
    }
}
