Hi Giovanni,
one thing that I overlooked when answering your Stack Overflow question is
that your read_records method ignores the provided offset_range_tracker.
That could be the root of the issue: FileBasedSource is based on
splittable DoFn [1], so your logic must cooperate with the offset tracker
for the runner to be able to split and checkpoint reading of the source
file.
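To illustrate what "cooperating with the tracker" means, here is a minimal sketch, assuming newline-delimited records. SimpleOffsetTracker is a hypothetical stand-in that mimics only the start_position(), stop_position() and try_claim() methods of Beam's OffsetRangeTracker so the snippet runs without Beam, and read_records is written as a free function rather than as the method of your source. The two points that matter are that reading starts at the tracker's start offset, and that a record is emitted only after try_claim() succeeds on the offset where that record starts.

```python
import io


class SimpleOffsetTracker:
    """Hypothetical stand-in mimicking the start_position/stop_position/
    try_claim methods of Beam's OffsetRangeTracker for the range [start, stop)."""

    def __init__(self, start, stop):
        self._start = start
        self._stop = stop
        self._last_claimed = None

    def start_position(self):
        return self._start

    def stop_position(self):
        return self._stop

    def try_claim(self, position):
        # Claimed positions must strictly increase.
        if self._last_claimed is not None and position <= self._last_claimed:
            raise ValueError('claimed positions must increase')
        # A record starting at or past stop belongs to the next split.
        if position >= self._stop:
            return False
        self._last_claimed = position
        return True


def read_records(file_obj, offset_range_tracker):
    """Yield newline-delimited records whose first byte lies in the tracker's range."""
    start = offset_range_tracker.start_position()
    if start > 0:
        # A record that spans 'start' belongs to the previous split: back up
        # one byte and discard everything up to and including the next newline.
        file_obj.seek(start - 1)
        file_obj.readline()
    else:
        file_obj.seek(0)
    while True:
        record_start = file_obj.tell()
        line = file_obj.readline()
        if not line:
            break  # end of file
        # Claim the record's start offset before emitting it; a failed claim
        # means the rest of the range was split off, so stop reading.
        if not offset_range_tracker.try_claim(record_start):
            break
        yield line.rstrip(b'\n')
```

Inside a FileBasedSource subclass, read_records(self, file_name, offset_range_tracker) would roughly open the file with self.open_file(file_name) and run the same loop against the real tracker, assuming the returned file object supports seek(), tell() and readline(). Because each record is assigned to the split in which it starts, every line should be read by exactly one split wherever the split points fall.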
Regarding GroupIntoBatches, I believe it is the right solution if your
intent is simply to batch the input to optimize some computation.
Hope this helps. Please feel free to reach out if you have any more
questions.
Best,
Jan
[1]
https://beam.apache.org/documentation/programming-guide/#splittable-dofns
On 1/17/22 11:12, giovani ..... wrote:
Hello, could someone help me with this problem?
https://stackoverflow.com/questions/70644351/apache-beam-hanging-on-groupbykey-after-windowing-not-triggering
In short, I am having problems aggregating a data-driven window with the
Python direct runner: after a GroupByKey, no data is output.
Maybe I am misunderstanding some Beam concepts, or it is a known issue
with the direct runner. Any help will be much appreciated, thank you very
much!