I ran into a fairly strange issue with Pig the other day. I was trying
out the sample Swap UDF (attached to this email) from the Pig documentation:
http://wiki.apache.org/pig/UDFManual (Schema section)
I then ran the following script:
REGISTER /home/arovner/Documents/workspace-sts-2.6.0.RELEASE/pig-bank/target/pig-bank-0.0.1-SNAPSHOT-jar-with-dependencies.jar;
A = LOAD '/wec/incoming' USING PigStorage() AS (timestamp:chararray,
    ip:chararray, country:chararray, state:chararray, event:chararray,
    url:chararray, agent:chararray, geo_country:chararray, geo_dma:chararray,
    geo_region:chararray, geo_city:chararray, geo_zip:chararray,
    browser:chararray, os:chararray, uuid:chararray, segment_id:chararray,
    guid:chararray, action:chararray);
B = FOREACH A {
    GENERATE myudfs.Swap(geo_region, geo_zip).geo_zip;
}
STORE B INTO '/wec/output' USING PigStorage();
I would expect the output to contain only the value of "geo_zip", but
instead I see the following:
(10019,NY)
(10019,NY)
(10019,NY)
Why is Pig not selecting the specified field, and instead outputting the
whole tuple returned by Swap?
I am using Pig 0.8.0 from Cloudera's CDH3u0 package.
Thanks
Alex
package myudfs;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Sample UDF from the Pig UDF manual: returns a two-field tuple containing
// the first two input fields in reverse order.
public class Swap extends EvalFunc<Tuple> {

    public Tuple exec(Tuple input) throws IOException {
        // Need at least two fields to swap.
        if (input == null || input.size() < 2)
            return null;
        try {
            Tuple output = TupleFactory.getInstance().newTuple(2);
            output.set(0, input.get(1));
            output.set(1, input.get(0));
            return output;
        } catch (Exception e) {
            System.err.println("Failed to process input; error - " + e.getMessage());
            return null;
        }
    }

    // Declares the result as a tuple whose fields are the input fields in
    // swapped order, so Swap(a, b) has the schema (b, a).
    public Schema outputSchema(Schema input) {
        try {
            Schema tupleSchema = new Schema();
            tupleSchema.add(input.getField(1));
            tupleSchema.add(input.getField(0));
            return new Schema(new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    tupleSchema, DataType.TUPLE));
        } catch (Exception e) {
            return null;
        }
    }
}
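As a sanity check on what the script should produce, here is a plain-Java sketch with no Pig dependency. `SwapModel` and its methods are hypothetical names, not Pig API. It models Swap's exec and outputSchema, and treats the `.geo_zip` dereference as a lookup by name in the UDF's declared output schema. It assumes an input row with geo_region = "NY" and geo_zip = "10019", matching the output shown earlier. Under this model, `.geo_zip` resolves to position 0 of the swapped tuple, so each output line should be just 10019 rather than (10019,NY).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of the Swap UDF and of projecting a named
// field from its result. No Pig classes are used.
public class SwapModel {

    // Mirrors Swap.exec: returns the first two input values in reverse order.
    public static List<Object> swap(List<Object> input) {
        if (input == null || input.size() < 2) {
            return null;
        }
        return Arrays.asList(input.get(1), input.get(0));
    }

    // Mirrors Swap.outputSchema: the field names, swapped the same way.
    public static List<String> swapSchema(List<String> inputNames) {
        return Arrays.asList(inputNames.get(1), inputNames.get(0));
    }

    // Models the ".name" dereference: find the name's position in the
    // output schema and return the value at that position.
    public static Object project(List<Object> tuple, List<String> schema, String name) {
        int pos = schema.indexOf(name);
        if (pos < 0) {
            throw new IllegalArgumentException("No field named " + name);
        }
        return tuple.get(pos);
    }

    public static void main(String[] args) {
        List<String> inNames = Arrays.asList("geo_region", "geo_zip");
        List<Object> inValues = Arrays.<Object>asList("NY", "10019");

        List<Object> swapped = swap(inValues);        // [10019, NY]
        List<String> outNames = swapSchema(inNames);  // [geo_zip, geo_region]

        System.out.println(swapped);                                // prints [10019, NY]
        System.out.println(project(swapped, outNames, "geo_zip"));  // prints 10019
    }
}
```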