You should get the warning in both cases for the same data. The warning means that the data doesn't match the specified schema. More precisely, if the schema has 5 fields but some records in the data set have only 4, then accessing the 5th field in your script generates this message. Pig doesn't stop the job; it just pads the missing fields with null and issues a warning.
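If it helps, here is a rough way to check the input file outside Pig for records that are shorter than the schema. It is only a sketch in Python, not anything Pig itself provides: the helper names `count_short_records` and `pad_to_schema`, the sample rows, and the width of 5 are all made up for illustration, and it assumes fields are separated by a plain comma with no quoting or escaping.

```python
def pad_to_schema(fields, width):
    """Illustrate the padding described above: missing trailing fields become None."""
    # If the record already has `width` or more fields, it is returned unchanged.
    return fields + [None] * (width - len(fields))


def count_short_records(lines, width, delimiter=","):
    """Count records that have fewer fields than the schema declares."""
    short = 0
    for line in lines:
        # Assumption: a plain split on the delimiter, no quoting or escaping.
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < width:
            short += 1
    return short


# Hypothetical sample: the second record has only 4 of the 5 declared fields.
sample = ["1,a,b,c,d\n", "2,a,b,c\n", "3,a,b,c,d\n"]
print(count_short_records(sample, width=5))
print(pad_to_schema("2,a,b,c".split(","), 5))
```

To run it against the real file, pass an open file handle instead of `sample` and set `width` to the number of fields in the AS clause. If the count matches the number in the warning, the short records are the cause.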
Thanks,
-Richard

-----Original Message-----
From: Carmelo Badalamenti [mailto:[email protected]]
Sent: Wednesday, June 09, 2010 2:51 AM
To: [email protected]
Subject: ACCESSING_NON_EXISTENT_FIELD problem

Hi all,

I'm Carmelo Badalamenti, and I'm working with Pig with great satisfaction :)
I have a problem, indeed...
I Load a file into pig script like this:

raw = LOAD 'filename' USING PigStorage(',') AS (fe:int,ts:chararray,time,uid,panel,set,type,[...cut...],invalid,f,g,h) ;

As you can see I specify only sometimes the type of var.
Doing this I have this warning:

[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 580 time(s).

Instead if I specify no types (for instance, like this:

raw = LOAD 'filename' USING PigStorage(',') AS (fe,ts,time,uid,panel,set,type,[...cut...],invalid,f,g,h) ;

) I don't notice any warning.

The problem is I need to specify at least the "chararray" cast because I wrote several user-defined-function that needs it...

How I can solve this problem?

Thanks in advance...

Carmelo

--
---------------------------------------------------------------------------------------------------
Carmelo Badalamenti aka RollsAppleTree
mail: [email protected]
---------------------------------------------------------------------------------------------------
