Hi Giovanni,

One thing I overlooked when answering your Stack Overflow question is that your read_records method ignores the provided offset_range_tracker. That could well be the root of the issue: FileBasedSource is based on splittable DoFn [1], so your reading logic must cooperate with the offset tracker for the runner to be able to split and checkpoint reading of the source file.
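To illustrate what I mean by cooperating with the tracker, here is a rough sketch for a newline-delimited file. It is not your code and I have not run it against your pipeline. FakeOffsetTracker and read_lines are hypothetical names; the stand-in class only mimics the two methods of Beam's OffsetRangeTracker that the loop relies on (start_position and try_claim), so the snippet runs without Beam installed.

```python
import io


class FakeOffsetTracker:
    """Hypothetical stand-in for Beam's OffsetRangeTracker (illustration only)."""

    def __init__(self, start, stop):
        self._start, self._stop = start, stop

    def start_position(self):
        return self._start

    def try_claim(self, record_start):
        # The real tracker also accounts for dynamic splits requested by the
        # runner; this stand-in only checks the static end of the range.
        return record_start < self._stop


def read_lines(file_obj, range_tracker):
    """Yield only the lines whose start offset lies in the tracker's range."""
    start = range_tracker.start_position()
    if start > 0:
        # Step back one byte and discard up to the next newline. A line that
        # begins exactly at `start` is kept; a partially covered line is left
        # to the reader of the previous range.
        file_obj.seek(start - 1)
        file_obj.readline()
    else:
        file_obj.seek(0)
    while True:
        record_start = file_obj.tell()
        line = file_obj.readline()
        if not line:
            break  # end of file
        if not range_tracker.try_claim(record_start):
            break  # this record belongs to another range; stop without emitting
        yield line.rstrip(b"\n")


data = b"aa\nbbb\nc\n"
first = list(read_lines(io.BytesIO(data), FakeOffsetTracker(0, 4)))
second = list(read_lines(io.BytesIO(data), FakeOffsetTracker(4, len(data))))
```

With the two ranges above, every line is read exactly once, no matter where the split point falls. In a real source, read_records(self, file_name, range_tracker) would open the file via self.open_file(file_name) and run a loop like this with the tracker it was given.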

Regarding GroupIntoBatches, I believe that is the right solution if your intent is simply to batch the input in order to optimize some computation.

Hope this helps. Please feel free to reach out if you have any more questions.

Best,

 Jan

[1] https://beam.apache.org/documentation/programming-guide/#splittable-dofns

On 1/17/22 11:12, giovani ..... wrote:
Hello, could someone help me with this problem:
https://stackoverflow.com/questions/70644351/apache-beam-hanging-on-groupbykey-after-windowing-not-triggering
?

In short, I am having problems with the Python direct runner when
aggregating over a data-driven window: after applying a group-by, the
data is simply not output.

Maybe I am misunderstanding some Beam concepts, or it is
something related to the direct runner. Any help would be much
appreciated, thank you very much!
