Here's my current flow. I have a Java program that uses an Avro schema file to generate POJOs. The code reads rows from a Postgres table and transfers them from the db into a list of the generated POJOs. The process is currently reading about 4.5M records from the db. Once the Avro POJOs are populated, it uses the Avro writer to output Parquet, which is ingested into our data lake.
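
A simplified sketch of what the code does (MyRecord stands in for the generated POJO; the jdbc url, table, and column names are made-up placeholders, not the real ones):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class CurrentFlow {
    public static void main(String[] args) throws Exception {
        // step 1: read the whole table into a list of generated pojos
        List<MyRecord> records = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "pass");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM my_table")) {
            while (rs.next()) {
                // every one of the 4.5M rows ends up on the heap at the same time
                records.add(MyRecord.newBuilder()
                        .setId(rs.getLong("id"))
                        .setName(rs.getString("name"))
                        .build());
            }
        }

        // step 2: only after the full list is built, write it out as parquet
        try (ParquetWriter<MyRecord> writer =
                     AvroParquetWriter.<MyRecord>builder(new Path("/tmp/my_table.parquet"))
                             .withSchema(MyRecord.getClassSchema())
                             .build()) {
            for (MyRecord r : records) {
                writer.write(r);
            }
        }
    }
}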

The problem is that as the table keeps growing, we get OOMs. I'll be looking at exactly where in the code the OOM is coming from, but continually increasing the heap isn't a feasible solution. What are some common patterns for handling this? I'm thinking of chunking the records: is it possible to process 500k records at a time and then concatenate the resulting Parquet files? I'm pretty new to this.
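
To make the question concrete, this is roughly the kind of chunking I have in mind (same placeholders as above, and I'm assuming a monotonically increasing numeric pk named id to page on):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class ChunkedExport {
    private static final int CHUNK_SIZE = 500_000;

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "pass")) {
            long lastId = 0L;   // assumes the pk only ever increases
            int fileIndex = 0;
            int rowsInChunk;
            do {
                rowsInChunk = 0;
                try (PreparedStatement stmt = conn.prepareStatement(
                             "SELECT id, name FROM my_table WHERE id > ? ORDER BY id LIMIT " + CHUNK_SIZE);
                     // a separate parquet file per chunk instead of one big file
                     ParquetWriter<MyRecord> writer =
                             AvroParquetWriter.<MyRecord>builder(new Path("/tmp/my_table-" + fileIndex + ".parquet"))
                                     .withSchema(MyRecord.getClassSchema())
                                     .build()) {
                    stmt.setLong(1, lastId);
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            lastId = rs.getLong("id");
                            // write each row as it comes so the 500k chunk never sits in a java list
                            writer.write(MyRecord.newBuilder()
                                    .setId(lastId)
                                    .setName(rs.getString("name"))
                                    .build());
                            rowsInChunk++;
                        }
                    }
                }
                fileIndex++;
            } while (rowsInChunk == CHUNK_SIZE);   // a short chunk means we're done
            // (if the row count is an exact multiple of CHUNK_SIZE this leaves one empty file -- ok for a sketch)
        }
    }
}

The idea is that only one chunk's worth of rows is ever in flight, and each chunk ends up in its own Parquet file. Does that look like a sane direction?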
