Doubts on Looping inside a beam transform. Processing sequentially using Apache Beam

Feba Fathima Tue, 15 Dec 2020 01:11:02 -0800

Hi,

   We are creating a Beam pipeline to do batch processing of data bundles.
The pipeline reads records using CassandraIO. We want to process the data
in 30-minute batches, then group/stitch each 30 minutes of data and write
it to another table. We have 300 bundles for each employee, and we need to
process at least 50 employees using limited resources (~2Gi). Currently,
however, heap usage is so high that we can process only 1 employee (with
~4Gi). If we give it more data, we get out-of-memory/heap errors.


Is there a way to process 1 employee at a time, like a loop, so that we can
process all employees sequentially within our ~2Gi?

We have also posted the same question on Stack Overflow but have not
received any help there so far.

https://stackoverflow.com/questions/65274909/looping-inside-a-beam-transform-process-sequentially-using-apache-beam

If anyone is familiar with this scenario, kindly guide us through it.

--
Thanks & Regards,
Feba Fathima
