On Sun, Nov 20, 2016 at 3:11 PM, Terry Casstevens <[email protected]> wrote:
> I tried a combination of the two techniques above which wrote the Schema
> with DataFileWriter and then used BlockingBinaryEncoder to write the
> Datums. But upon reading the file I get "Invalid Sync!"

DataFileWriter maintains an internal buffer and syncs it to the file
periodically. This might be why you got the "Invalid Sync!" error. Have you
tried calling DataFileWriter.sync() before switching to writing with
BlockingBinaryEncoder?

> Seems like what I need is a way to pass DataFileWriter a
> BlockingBinaryEncoder for it to use. Because it is automatically
> using a BinaryEncoder. And the API has no way to pass it a different
> one.

I agree that the current API doesn't allow this. Could you create a JIRA to
track this requirement?

*Yibing Shi*
*Customer Operations Engineer*
<http://www.cloudera.com>

On Sun, Nov 20, 2016 at 3:11 PM, Terry Casstevens <[email protected]> wrote:

> Dear Avro Community,
>
> I'm having problems writing large Datums to an Avro file. Can someone
> please advise?
>
> Normally what is done..
> - Create DatumWriter
> - Create DataFileWriter(DatumWriter)
> - Open file with DataFileWriter.create(Schema, File)
> - When the file is opened, it writes the Schema to the file.
> - Then you can DataFileWriter.append(Datum) many times.
>
> Problem is DataFileWriter.append() doesn't handle very large Datum.
>
> And apparently the solution is to use a BlockingBinaryEncoder, which
> does solve the OutOfMemoryError.
> - Create DatumWriter
> - Create OutputStream -> File
> - Create EncoderFactory.blockingBinaryEncoder(OutputStream)
> - DatumWriter.write(Datum, BlockingBinaryEncoder)
>
> But that BlockingBinaryEncoder solution doesn't write the Schema to
> the beginning of the file.
> - Making it not work with DataFileReader.
> - Plus these Schemas are different, so needs to be there
>
> I tried a combination of the two techniques above which wrote the Schema
> with DataFileWriter and then used BlockingBinaryEncoder to write the
> Datums. But upon reading the file I get "Invalid Sync!"
>
> Seems like what I need is a way to pass DataFileWriter a
> BlockingBinaryEncoder for it to use. Because it is automatically
> using a BinaryEncoder. And the API has no way to pass it a different
> one.
>
>
> Thank you,
>
> Terry
>
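
A possible way to see why mixing the two approaches yields "Invalid Sync!" is
the minimal Java sketch below. It does not use the Avro library, and every
name in it (SyncFramingSketch, newSync, writeBlock, framedFile, mixedFile,
readOutcome) is hypothetical. It assumes a simplified version of the container
layout, where each block is an object count, a byte length, the payload, and a
16-byte sync marker. Fixed-width longs stand in for Avro's variable-length
encoding, and the zero-filled "raw" bytes are only a stand-in for unframed
encoder output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;

/** Simplified model of block framing in a container file; NOT Avro's real encoding. */
public class SyncFramingSketch {
    static final int SYNC_SIZE = 16;

    /** Hypothetical helper: a per-file 16-byte marker (seeded so runs are repeatable). */
    public static byte[] newSync(long seed) {
        byte[] sync = new byte[SYNC_SIZE];
        new Random(seed).nextBytes(sync);
        return sync;
    }

    /** Frame one block: object count, payload length, payload, then the sync marker. */
    static void writeBlock(DataOutputStream out, long count, byte[] payload, byte[] sync)
            throws IOException {
        out.writeLong(count);
        out.writeLong(payload.length);
        out.write(payload);
        out.write(sync);
    }

    /** A file in which every byte was written through the framing writer. */
    public static byte[] framedFile(byte[] sync) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeBlock(out, 2, new byte[] {1, 2, 3, 4}, sync);
            writeBlock(out, 1, new byte[] {5, 6}, sync);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One framed block followed by bytes written directly to the stream, with no framing. */
    public static byte[] mixedFile(byte[] sync) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeBlock(out, 2, new byte[] {1, 2, 3, 4}, sync);
            out.write(new byte[32]); // stand-in for raw encoder output; no count, size, or marker
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read block by block, checking the marker after each; return a summary or the error text. */
    public static String readOutcome(byte[] file, byte[] sync) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            long total = 0;
            while (in.available() > 0) {
                long count = in.readLong();
                long size = in.readLong();
                if (size < 0 || size > in.available()) {
                    throw new IOException("Invalid block size: " + size);
                }
                byte[] payload = new byte[(int) size];
                in.readFully(payload);
                byte[] seen = new byte[SYNC_SIZE];
                in.readFully(seen);
                if (!Arrays.equals(seen, sync)) {
                    throw new IOException("Invalid sync!");
                }
                total += count;
            }
            return "ok: " + total + " objects";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] sync = newSync(7L);
        System.out.println(readOutcome(framedFile(sync), sync)); // every block is framed
        System.out.println(readOutcome(mixedFile(sync), sync));  // unframed bytes break the reader
    }
}
```

In this model the reader interprets whatever follows a block as the next
block's header, so bytes written around the framing writer cannot end in the
expected marker. If the real file format behaves the same way, calling sync()
first would not by itself make the raw bytes readable, which is consistent
with the point above that the API would need a way to plug in a different
encoder.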
