Following your pointer I worked it out. For the benefit of others:
// Load the schema from the classpath. Lombok's @Cleanup closes the
// resource when it goes out of scope.
@Cleanup
InputStream fileDependencySchemaIs = this.getClass()
        .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

GenericDatumWriter<GenericRecord> genericDatumWriter =
        new GenericDatumWriter<GenericRecord>(fileDependencySchema);

// DataFileWriter stores the schema in the file header.
@Cleanup
DataFileWriter<GenericRecord> dataFileWriter =
        new DataFileWriter<GenericRecord>(genericDatumWriter);
dataFileWriter.create(fileDependencySchema, new File(workFolder,
        FILE_DEPENDENCY_GRAPH_NAME));

// One record per file: its name plus the array of files it imports.
for (Map.Entry<String, Set<String>> entry : fileDependencies
        .entrySet()) {
    GenericRecord genericRecord = new GenericData.Record(
            fileDependencySchema);
    genericRecord.put("file", new Utf8(entry.getKey()));

    Set<String> imports = entry.getValue();
    GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
            imports.size(),
            Schema.createArray(Schema.create(Type.STRING)));
    for (String importFile : imports) {
        genericArray.add(new Utf8(importFile));
    }
    genericRecord.put("imports", genericArray);

    dataFileWriter.append(genericRecord);
}
dataFileWriter.flush();
All is now well. I can similarly read in:
@Cleanup
InputStream fileDependencySchemaIs = this.getClass()
        .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

GenericDatumReader<GenericRecord> genericDatumReader =
        new GenericDatumReader<GenericRecord>(fileDependencySchema);

File file = new File(workFolder, FILE_DEPENDENCY_GRAPH_NAME);
@Cleanup
DataFileReader<GenericRecord> dataFileReader =
        new DataFileReader<GenericRecord>(file, genericDatumReader);

// Reuse a single record instance across iterations.
GenericRecord genericRecord = new GenericData.Record(
        fileDependencySchema);
// Loop while records remain (note: no negation on hasNext()).
while (dataFileReader.hasNext()) {
    genericRecord = dataFileReader.next(genericRecord);

    String recordFile = ((Utf8) genericRecord.get("file"))
            .toString();

    // Convert the Avro array of Utf8 back into a Set<String>.
    GenericData.Array<?> recordImportObjects =
            (GenericData.Array<?>) genericRecord.get("imports");
    Set<String> imports = new HashSet<String>();
    for (Object recordImportObject : recordImportObjects) {
        imports.add(((Utf8) recordImportObject).toString());
    }

    fileDependencies.put(recordFile, imports);
}
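For anyone following along: the schema file itself is not shown above. A minimal sketch of a schema that would fit this code, assuming only what the code implies (a string field "file" and an array-of-strings field "imports"), might look like the following. The record name, namespace and doc text are made up for illustration and are not from my actual schema file.

```json
{
  "type": "record",
  "name": "FileDependency",
  "namespace": "example",
  "doc": "Hypothetical schema for illustration only.",
  "fields": [
    {"name": "file", "type": "string"},
    {"name": "imports", "type": {"type": "array", "items": "string"}}
  ]
}
```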
Thanks.
Kind regards,
Christopher
On 12/10/2010, at 5:04 PM, Harsh J wrote:
> You are simply writing encoded data with that code. You need to use
> o.a.a.file.DataFileWriter to write proper avro datafiles (by appending your
> datum to it), which stores schema in its headers among other features.