With that code you are only writing the raw encoded data. To write a proper Avro data file, use org.apache.avro.file.DataFileWriter and append each datum to it; it stores the schema in the file header, among other features.
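A rough sketch of what that could look like for your FileDependency records, in case it helps. It assumes an Avro release that has Schema.Parser (on the version in your code, Schema.parse(...) plays the same role) and Java 7+ for try-with-resources. The class name FileDependencyWriter, the inlined SCHEMA_JSON constant, and the readSchema helper are made up for illustration; treat it as a starting point rather than the definitive way to do it.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

// Hypothetical helper class, named here only for the example.
public class FileDependencyWriter {

    // The schema from the original message, inlined so the sketch is self-contained.
    static final String SCHEMA_JSON =
        "{\"name\": \"FileDependency\", \"type\": \"record\", \"fields\": ["
        + "{\"name\": \"file\", \"type\": \"string\"},"
        + "{\"name\": \"imports\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}";

    // Writes one record per map entry into an Avro data file.
    static void write(File out, Map<String, Set<String>> fileDependencies) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        // Reuse the array schema declared in the record instead of building a new one.
        Schema importsSchema = schema.getField("imports").schema();

        try (DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema))) {
            // create() writes the file header, which includes the schema.
            writer.create(schema, out);

            for (Map.Entry<String, Set<String>> entry : fileDependencies.entrySet()) {
                GenericData.Array<Utf8> imports =
                    new GenericData.Array<Utf8>(entry.getValue().size(), importsSchema);
                for (String importFile : entry.getValue()) {
                    imports.add(new Utf8(importFile));
                }

                GenericRecord record = new GenericData.Record(schema);
                record.put("file", new Utf8(entry.getKey()));
                record.put("imports", imports);
                writer.append(record);
            }
        }
    }

    // Reads the schema back from the file header; no schema is supplied by the caller.
    static Schema readSchema(File in) throws IOException {
        try (DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                in, new GenericDatumReader<GenericRecord>())) {
            return reader.getSchema();
        }
    }
}
```

Calling readSchema on the resulting file should hand back the FileDependency schema without you passing one in, which is the behaviour you were expecting from the plain encoder.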
On Oct 12, 2010 11:29 AM, "Christopher Hunt" <[email protected]> wrote:

Hi there,

I've just noticed that when I write out my binary data I don't appear to have a schema saved with it. I was under the impression that Avro saves schemas along with the data. Thanks for any clarification.

Here's my schema:

    {
      "name": "FileDependency",
      "type": "record",
      "fields": [
        {"name": "file", "type": "string"},
        {"name": "imports", "type": {"type": "array", "items": "string"}}
      ]
    }

The code to write out my data is as follows (I'd also appreciate any refinement suggestions, as I'm new to Avro):

    @Cleanup
    InputStream fileDependencySchemaIs = this.getClass()
            .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
    Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);
    GenericDatumWriter<GenericRecord> genericDatumWriter =
            new GenericDatumWriter<GenericRecord>(fileDependencySchema);

    @Cleanup
    OutputStream os = new FileOutputStream(
            new File(workFolder, FILE_DEPENDENCY_GRAPH_NAME));
    Encoder encoder = new BinaryEncoder(os);

    for (Map.Entry<String, Set<String>> entry : fileDependencies.entrySet()) {
        GenericRecord genericRecord = new GenericData.Record(fileDependencySchema);
        genericRecord.put("file", new Utf8(entry.getKey()));

        Set<String> imports = entry.getValue();
        GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
                imports.size(),
                Schema.createArray(Schema.create(Type.STRING)));
        for (String importFile : imports) {
            genericArray.add(new Utf8(importFile));
        }
        genericRecord.put("imports", genericArray);

        genericDatumWriter.write(genericRecord, encoder);
    }
    encoder.flush();

Thanks again.

Kind regards,
Christopher
