Unsubscribe

On Mon, Oct 12, 2020 at 11:40 PM Doug Cutting <[email protected]> wrote:
> If you only intend to examine a subset of the fields, you can pass in a
> version of your schema with all but those fields removed as the 'reader'
> schema. Fields not in this minimized schema will be skipped without
> creating any structures.
>
> Alternatively, you can walk a schema calling the decoder API and process
> data without constructing a complete representation of it.
>
> As an example, see GenericDatumReader#skip().
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L564
>
> You can write a method with a similar structure, a big switch statement
> over Avro types with recursive calls, except yours might selectively
> process some fields. This permits SAX-like, event-based processing, if you
> remember that from XML parsing.
>
> Doug
>
>
> On Fri, Oct 9, 2020 at 3:51 PM Richard Ney <[email protected]>
> wrote:
>
>> I have the need to read in Avro messages from files that inflate to sizes
>> that are causing OOM errors due to the in memory representation of the
>> inflated document exceeding 1.5GB of Heap. Is there a way to stream the
>> file into the application, inflate it, and marshal the contents without
>> pulling the entire message into memory or am I restricted to chunking only
>> at the message level?
>>
>
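For anyone finding this thread later, here is a rough sketch of the "big switch statement with recursive calls" idea Doug describes. It is a toy model in plain JDK Java, not Avro code: `Node`, `Type`, `walk`, and `sumSample` are hypothetical names made up for illustration, and only long, string, and record types are handled. The wire format imitates Avro's zigzag varint longs and length-prefixed strings. In a real reader you would presumably switch over `Schema.getType()` and call the matching `org.apache.avro.io.Decoder` methods (for example `readLong()` and `skipString()`) instead of the hand-rolled helpers here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class SelectiveWalk {
    // Hypothetical mini-schema; a stand-in for org.apache.avro.Schema, not the real class.
    enum Type { LONG, STRING, RECORD }

    static final class Node {
        final Type type;
        final String name;
        final List<Node> fields;
        Node(Type type, String name, Node... fields) {
            this.type = type; this.name = name; this.fields = Arrays.asList(fields);
        }
    }

    // Decode one zigzag varint long (the same shape as Avro's long encoding).
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new EOFException();
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Advance past n bytes without copying them anywhere.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long k = in.skip(n);
            if (k <= 0) throw new EOFException();
            n -= k;
        }
    }

    // The walk: one switch over types, recursing into records. Only long
    // fields named `wanted` are used; strings are skipped, never materialized.
    static long walk(Node node, InputStream in, String wanted) throws IOException {
        switch (node.type) {
            case RECORD: {
                long sum = 0;
                for (Node f : node.fields) sum += walk(f, in, wanted);
                return sum;
            }
            case LONG: {
                long v = readLong(in); // must consume the value even if unwanted
                return node.name.equals(wanted) ? v : 0;
            }
            case STRING:
                skipFully(in, readLong(in)); // length prefix, then skip the bytes
                return 0;
            default:
                throw new IllegalStateException("unhandled type " + node.type);
        }
    }

    // Test-data helpers: encode values the same way walk() decodes them.
    static void writeLong(ByteArrayOutputStream out, long v) {
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7fL) != 0) { out.write((int) ((z & 0x7f) | 0x80)); z >>>= 7; }
        out.write((int) z);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Builds {id=7, blob=<1000 chars>, inner={count=-3, note="hi"}, count=5}
    // and sums every long field whose name equals `wanted`.
    static long sumSample(String wanted) {
        Node schema = new Node(Type.RECORD, "root",
            new Node(Type.LONG, "id"),
            new Node(Type.STRING, "blob"),
            new Node(Type.RECORD, "inner",
                new Node(Type.LONG, "count"), new Node(Type.STRING, "note")),
            new Node(Type.LONG, "count"));
        char[] big = new char[1000];
        Arrays.fill(big, 'x');
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 7); writeString(out, new String(big));
        writeLong(out, -3); writeString(out, "hi");
        writeLong(out, 5);
        try {
            return walk(schema, new ByteArrayInputStream(out.toByteArray()), wanted);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumSample("count"));
    }
}
```

The point of the shape is that memory use depends on what the walk chooses to keep, not on the size of the record: the 1000-character string is stepped over by length and never turned into an object. Arrays, maps, unions, and the other Avro types would each need their own case, as they have in GenericDatumReader#skip().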
