Following your pointer, I worked it out. For the benefit of others, here is the writing side:

  @Cleanup
  InputStream fileDependencySchemaIs = this.getClass()
      .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
  Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

  GenericDatumWriter<GenericRecord> genericDatumWriter = 
      new GenericDatumWriter<GenericRecord>(fileDependencySchema);
  @Cleanup
  DataFileWriter<GenericRecord> dataFileWriter = 
      new DataFileWriter<GenericRecord>(genericDatumWriter);
  dataFileWriter.create(fileDependencySchema, new File(workFolder,
      FILE_DEPENDENCY_GRAPH_NAME));

  for (Map.Entry<String, Set<String>> entry : fileDependencies
      .entrySet()) {

    GenericRecord genericRecord = new GenericData.Record(
        fileDependencySchema);

    genericRecord.put("file", new Utf8(entry.getKey()));

    Set<String> imports = entry.getValue();
    GenericArray<Utf8> genericArray = new GenericData.Array<Utf8>(
        imports.size(), 
        Schema.createArray(Schema.create(Type.STRING)));
    for (String importFile : imports) {
      genericArray.add(new Utf8(importFile));
    }
    genericRecord.put("imports", genericArray);

    dataFileWriter.append(genericRecord);
  }
  dataFileWriter.flush();

All is now well. I can similarly read the file back in:

  @Cleanup
  InputStream fileDependencySchemaIs = this.getClass()
      .getResourceAsStream(FILE_DEPENDENCY_GRAPH_SCHEMA_NAME);
  Schema fileDependencySchema = Schema.parse(fileDependencySchemaIs);

  GenericDatumReader<GenericRecord> genericDatumReader = 
      new GenericDatumReader<GenericRecord>(fileDependencySchema);

  File file = new File(workFolder, FILE_DEPENDENCY_GRAPH_NAME);

  @Cleanup
  DataFileReader<GenericRecord> dataFileReader = 
      new DataFileReader<GenericRecord>(file, genericDatumReader);

  GenericRecord genericRecord = new GenericData.Record(
      fileDependencySchema);
  while (dataFileReader.hasNext()) {
    genericRecord = dataFileReader.next(genericRecord);

    String recordFile = ((Utf8) genericRecord.get("file"))
        .toString();

    GenericData.Array<?> recordImportObjects = 
      (GenericData.Array<?>) genericRecord.get("imports");
    Set<String> imports = new HashSet<String>();
    for (Object recordImportObject : recordImportObjects) {
      imports.add(((Utf8) recordImportObject).toString());
    }
    fileDependencies.put(recordFile, imports);
  }
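
For anyone following along, both snippets assume a schema resource roughly like the sketch below. The field names "file" and "imports" are the ones used in the code above; the record name "FileDependency" is only a placeholder for illustration, and the real schema file may differ:

  {
    "type": "record",
    "name": "FileDependency",
    "fields": [
      {"name": "file", "type": "string"},
      {"name": "imports", "type": {"type": "array", "items": "string"}}
    ]
  }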

Thanks.

Kind regards,
Christopher

On 12/10/2010, at 5:04 PM, Harsh J wrote:

> You are simply writing encoded data with that code. You need to use 
> o.a.a.file.DataFileWriter to write proper avro datafiles (by appending your 
> datum to it), which stores schema in its headers among other features.
