Hi all,

I'm trying to process a few hundred Avro files on GCS. The files are decoded 
and two simple filters are applied. When I run this on Beam-Direct, all heap 
space fills up within a minute or two. I threw 58 GB at it before giving up.

To limit the number of files processed at once, I moved the actual processing 
into a pipeline executor. Alas, when running on Beam-Direct, it looks like the 
transforms are only initialised but never executed. This affects the Write to 
Log, JavaScript, HTTP Client and BigQuery Output transforms. Everything behaves 
as expected when I configure the pipeline executor to use the Local runner.

So, two questions: Is the pipeline executor transform incompatible with Beam? 
And, are there other approaches for limiting memory use in such a case?

cheers

Fabian
