Hi,

I'm trying to read an Avro file I stored on HDFS with Pig, but I've hit a snag. I'm hoping some of you can shed some light on this so I can continue my adventure!


REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';

DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();

avro = load '/import/2012-01-04-deflate.avro' USING AvroStorage();

groups = group avro by trace.terminalid;
sc = foreach groups generate group as terminalid, COUNT(avro) as cnt;

store sc into '/import/test-out.avro' USING AvroStorage();



The schema of the Avro file:

{
    "type": "record",
    "name": "trace",
    "namespace": "asp",
    "fields": [
        { "name": "id", "type": "long" },
        { "name": "timestamp", "type": "long" },
        { "name": "terminalid", "type": "int" },
        { "name": "creationtime", "type": "long" },
        { "name": "tracetype", "type": "int" },
        { "name": "traceproperties", "type": {
                "type": "array",
                "items": {
                    "name": "traceproperty",
                    "type": "record",
                    "fields": [
                        { "name": "id", "type": "long" },
                        { "name": "value", "type": "string" },
                        { "name": "pkey", "type": "string" },
                        { "name": "traceid", "type": "long" }
                    ]
                }
            }
        }
    ]
}
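For reference, here is a small Python sketch that lists the top-level fields of the schema above. Python is used only because it is handy for poking at the JSON; it is not part of the Pig job, and `top_level_fields` is a hypothetical helper written for this check. It shows that `trace` is the name of the record itself (full name `asp.trace`), not one of its fields. My assumption, which I haven't confirmed, is that AvroStorage exposes the top-level record fields directly as the columns of the loaded relation. If so, `trace.terminalid` has nothing to resolve against.

```python
import json

# The "asp.trace" schema as posted above.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "timestamp", "type": "long"},
    {"name": "terminalid", "type": "int"},
    {"name": "creationtime", "type": "long"},
    {"name": "tracetype", "type": "int"},
    {"name": "traceproperties", "type": {
      "type": "array",
      "items": {
        "name": "traceproperty",
        "type": "record",
        "fields": [
          {"name": "id", "type": "long"},
          {"name": "value", "type": "string"},
          {"name": "pkey", "type": "string"},
          {"name": "traceid", "type": "long"}
        ]
      }
    }}
  ]
}
"""


def top_level_fields(schema):
    """Hypothetical helper: return the field names declared directly on a record schema."""
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema["fields"]]


schema = json.loads(SCHEMA_JSON)
full_name = schema["namespace"] + "." + schema["name"]
fields = top_level_fields(schema)

print(full_name)          # asp.trace
print(fields)             # the six top-level field names, terminalid among them
print("trace" in fields)  # False: "trace" names the record, not a field
```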


Running the script above gives me this error:

<file avro-test.pig, line 9, column 28> Invalid field reference. Referenced field [terminalid] does not exist in schema: .

So I guess I'm missing something about how to reference fields from the schema here?

Thanks in advance!

Kind regards,

Bart
