Unsubscribe

On Mon, Oct 12, 2020 at 11:40 PM Doug Cutting <[email protected]> wrote:

> If you only intend to examine a subset of the fields, you can pass in a
> version of your schema with all but those fields removed as the 'reader'
> schema.  Fields not in this minimized schema will be skipped without
> creating any structures.
>
> Alternatively, you can walk a schema calling the decoder API and process
> data without constructing a complete representation of it.
>
> As an example, see GenericDatumReader#skip():
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L564
>
> You can write a method with a similar structure (a big switch statement
> over Avro types with recursive calls), except that yours would process
> only the fields you care about and skip the rest.  This permits SAX-like,
> event-based processing, if you remember that from XML parsing.
>
> Doug
>
>
> On Fri, Oct 9, 2020 at 3:51 PM Richard Ney <[email protected]>
> wrote:
>
>> I need to read Avro messages from files that inflate to sizes that
>> cause OOM errors, because the in-memory representation of the inflated
>> document exceeds 1.5 GB of heap. Is there a way to stream the file into
>> the application, inflate it, and marshal the contents without pulling
>> the entire message into memory, or am I restricted to chunking only at
>> the message level?
>>
>
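
For readers of the archive, the "big switch statement with recursive calls" that Doug describes might look roughly like the sketch below. It is only an illustration under loud assumptions: it uses no Avro classes at all. Node, SelectiveWalker, wanted, events and demo() are hypothetical names, and the schema is a toy with just long, string, array and record types. The byte layout follows Avro's binary encoding for those types (zig-zag varint longs, length-prefixed UTF-8 strings, arrays written as counted blocks ending in a zero count). In real Avro code the org.apache.avro.io.Decoder methods (readLong, skipString, readArrayStart, arrayNext and so on) would take the place of the hand-rolled reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy schema node; a stand-in for org.apache.avro.Schema.
final class Node {
    enum Type { LONG, STRING, ARRAY, RECORD }
    final Type type;
    final String name;         // field name (array items reuse the array's name)
    final List<Node> children; // record fields, or the single item type of an array
    Node(Type type, String name, Node... children) {
        this.type = type; this.name = name; this.children = Arrays.asList(children);
    }
}

// Walks the schema and the byte stream together, emitting events only for wanted fields.
final class SelectiveWalker {
    private final DataInputStream in;
    private final Set<String> wanted;
    final List<String> events = new ArrayList<>();

    SelectiveWalker(InputStream in, Set<String> wanted) {
        this.in = new DataInputStream(in); this.wanted = wanted;
    }

    // Zig-zag varint, as in Avro's binary encoding of int and long.
    private long readLong() throws IOException {
        long raw = 0; int shift = 0; int b;
        do {
            b = in.readUnsignedByte();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    private void skipFully(int len) throws IOException {
        while (len > 0) {
            int skipped = in.skipBytes(len);
            if (skipped <= 0) throw new EOFException();
            len -= skipped;
        }
    }

    void walk(Node node, boolean keep) throws IOException {
        switch (node.type) {
            case LONG: {
                long v = readLong();
                if (keep) events.add(node.name + "=" + v);
                break;
            }
            case STRING: {
                int len = (int) readLong();
                if (keep) {
                    byte[] buf = new byte[len];
                    in.readFully(buf);
                    events.add(node.name + "=" + new String(buf, StandardCharsets.UTF_8));
                } else {
                    skipFully(len); // unwanted string: never materialized
                }
                break;
            }
            case ARRAY: {
                Node item = node.children.get(0);
                for (long n = readLong(); n != 0; n = readLong()) {
                    if (n < 0) { n = -n; readLong(); } // negative count: a byte size follows
                    for (long i = 0; i < n; i++) walk(item, keep);
                }
                break;
            }
            case RECORD:
                for (Node f : node.children) walk(f, keep && wanted.contains(f.name));
                break;
        }
    }

    // Test-data helpers: encode values the same way the walker decodes them.
    static void writeLong(ByteArrayOutputStream out, long v) {
        long z = (v << 1) ^ (v >> 63);
        while ((z & ~0x7FL) != 0) { out.write((int) ((z & 0x7F) | 0x80)); z >>>= 7; }
        out.write((int) z);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Encodes one record {id, payload, tags} and walks it, keeping only id and tags.
    static List<String> demo() {
        Node schema = new Node(Node.Type.RECORD, "doc",
            new Node(Node.Type.LONG, "id"),
            new Node(Node.Type.STRING, "payload"),
            new Node(Node.Type.ARRAY, "tags", new Node(Node.Type.STRING, "tags")));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, 42);
        writeString(out, new String(new char[1000]).replace('\0', 'x'));
        writeLong(out, 2); writeString(out, "a"); writeString(out, "b"); writeLong(out, 0);
        SelectiveWalker w = new SelectiveWalker(new ByteArrayInputStream(out.toByteArray()),
            new HashSet<>(Arrays.asList("id", "tags")));
        try { w.walk(schema, true); } catch (IOException e) { throw new UncheckedIOException(e); }
        return w.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [id=42, tags=a, tags=b]
    }
}
```

In this sketch the 1000-byte payload field is skipped on the stream rather than decoded, so memory use depends only on the fields that are kept. A field is processed only when it and every enclosing field are named in wanted; that rule is a simplification chosen for the example.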
