Why do you need multiple instances of DataFileWriter? Are you writing to different files or paths? If not, you only need to instantiate the writer once, keep appending, and call close() when you are done.
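
For a rough sense of what one writer per record costs, here is a minimal sketch of the buffer arithmetic taken from the line the profiler pointed at (quoted below). It uses only the JDK. The class and method names are made up for illustration, and the 64000-byte sync interval is an assumption: I believe it is Avro's default (DataFileConstants.DEFAULT_SYNC_INTERVAL), but please check the version you are running.

```java
public class WriterBufferEstimate {
    // ASSUMPTION: default sync interval, believed to match Avro's
    // DataFileConstants.DEFAULT_SYNC_INTERVAL (4000 * 16). Verify for your version.
    static final int ASSUMED_DEFAULT_SYNC_INTERVAL = 4000 * 16;

    // Upper bound copied from the decompiled line in the original message.
    static final int MAX_BUFFER = 1073741822;

    // Same expression as the quoted line: initial capacity of the
    // block buffer that each DataFileWriter allocates.
    static int bufferCapacity(int syncInterval) {
        return Math.min((int) ((double) syncInterval * 1.25D), MAX_BUFFER);
    }

    // Total bytes allocated for buffers if every record gets its own writer.
    // This is allocation volume over the whole run, not memory held at once.
    static long totalAllocated(long writers, int syncInterval) {
        return writers * (long) bufferCapacity(syncInterval);
    }

    public static void main(String[] args) {
        long records = 5_000_000L; // "about 5M" chunks, from the original message
        int perWriter = bufferCapacity(ASSUMED_DEFAULT_SYNC_INTERVAL);
        System.out.println("buffer per writer: " + perWriter + " bytes");
        System.out.println("one writer per record: "
                + totalAllocated(records, ASSUMED_DEFAULT_SYNC_INTERVAL) + " bytes");
        System.out.println("single reused writer: "
                + totalAllocated(1L, ASSUMED_DEFAULT_SYNC_INTERVAL) + " bytes");
    }
}
```

Under that assumption, every new writer allocates a fresh buffer far larger than a small record, so the garbage collector has a lot to clean up even if nothing is leaked. If you really do need one container per record, calling setSyncInterval with a smaller value before create() should shrink that buffer, though I have not measured how much it helps in your case.
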

On May 18, 2017 4:12 AM, "Svetlana Shnitser" <[email protected]> wrote:

Hello,

We are attempting to use DataFileWriter to generate Avro content, write it to a byte[], and subsequently process it. While each chunk of Avro data is small, we are generating about 5M of them. Here's the code we are using:

    DatumWriter<HfReadData> writer = new SpecificDatumWriter<>(HfReadData.getClassSchema());
    DataFileWriter<HfReadData> dataFileWriter = new DataFileWriter<HfReadData>(writer);
    dataFileWriter.setCodec(CodecFactory.deflateCodec(9));
    dataFileWriter.create(HfReadData.getClassSchema(), byteStream);
    dataFileWriter.append(hfReadData);
    dataFileWriter.close();
    byte[] messageBytes = byteStream.toByteArray();
    byteStream.close();
    // further processing of messageBytes ...

Unfortunately, when run with 5M data points, we noticed a big spike in heap usage, and the profiler points at numerous instances of DataFileWriter.buffer from the line below:

    this.buffer = new DataFileWriter.NonCopyingByteArrayOutputStream(Math.min((int)((double)this.syncInterval * 1.25D), 1073741822));

This output stream doesn't seem to be closed on DataFileWriter.close().

Are we using DataFileWriter in a way that it was not intended to be used? Is there an assumption that there won't be numerous instances of DataFileWriter created, but that instead one instance can be used (with an appropriate syncInterval and flush() calls) to generate multiple chunks of Avro data?

Please advise! Thanks!

--
Svetlana Shnitser
