Hi,
I'm trying to read an Avro file I stored on HDFS, but I seem to be
hitting a snag. I'm hoping some of you can shed some light on this
and help me continue my adventure!

Here is my Pig script:
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/import/2012-01-04-deflate.avro' USING AvroStorage();
groups = group avro by trace.terminalid;
sc = foreach groups generate group as terminalid, COUNT(avro) as cnt;
store sc into '/import/test-out.avro' USING AvroStorage();

The schema of the Avro file:
{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    { "name": "id", "type": "long" },
    { "name": "timestamp", "type": "long" },
    { "name": "terminalid", "type": "int" },
    { "name": "creationtime", "type": "long" },
    { "name": "tracetype", "type": "int" },
    { "name": "traceproperties", "type": {
        "type": "array",
        "items": {
          "name": "traceproperty",
          "type": "record",
          "fields": [
            { "name": "id", "type": "long" },
            { "name": "value", "type": "string" },
            { "name": "pkey", "type": "string" },
            { "name": "traceid", "type": "long" }
          ]
        }
      }
    }
  ]
}

Running the script above gives me this error:
<file avro-test.pig, line 9, column 28> Invalid field reference. Referenced field [terminalid] does not exist in schema: .
So I guess I'm missing something about how Pig exposes the fields of
the Avro schema, or how I'm supposed to reference them?
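
To double-check which field names the schema above actually defines, here's a quick standard-library Python sketch that lists them as dotted paths. The helper name `field_paths` and the `[]` marker for array items are my own invention, not Pig or Avro syntax. Note that `trace` is the record's name rather than one of its fields, so it never shows up as a path prefix:

```python
import json

# The Avro schema of the file, copied verbatim as a JSON string.
SCHEMA = """
{
  "type": "record",
  "name": "trace",
  "namespace": "asp",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "timestamp", "type": "long"},
    {"name": "terminalid", "type": "int"},
    {"name": "creationtime", "type": "long"},
    {"name": "tracetype", "type": "int"},
    {"name": "traceproperties", "type": {
      "type": "array",
      "items": {
        "name": "traceproperty",
        "type": "record",
        "fields": [
          {"name": "id", "type": "long"},
          {"name": "value", "type": "string"},
          {"name": "pkey", "type": "string"},
          {"name": "traceid", "type": "long"}
        ]
      }
    }}
  ]
}
"""


def field_paths(schema, prefix=""):
    """Return a dotted path for every field, descending into records and arrays.

    Hypothetical helper: the record's own name (e.g. "trace") is never part
    of a path, and "[]" marks the items of an array field.
    """
    paths = []
    if isinstance(schema, dict):
        if schema.get("type") == "record":
            for field in schema["fields"]:
                path = prefix + field["name"]
                paths.append(path)
                # Recurse into complex field types; primitives like "long" are plain strings.
                paths.extend(field_paths(field["type"], path + "."))
        elif schema.get("type") == "array":
            # Item fields of an array are listed under "<field>[].".
            paths.extend(field_paths(schema["items"], prefix.rstrip(".") + "[]."))
    return paths


for p in field_paths(json.loads(SCHEMA)):
    print(p)
```

This prints `terminalid` as a top-level path and nothing starting with `trace.`, which is why I'm unsure whether `trace.terminalid` is the right way to refer to it from Pig.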
Thanks in advance!
Kind regards,
Bart