You don’t need a windowing strategy or aggregation triggers to perform GbK-like 
transforms in a pipeline with a bounded source, but once you start using an 
unbounded source your PCollections become unbounded and you do need them. 
Otherwise, it is unknown at which point in time your GbK transforms will have 
received all the data to process (in theory, that will never happen, by the 
very definition of “unbounded”).
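
A minimal sketch of how that could look in the Java SDK (the element type 
KV<String, Long>, the one-minute fixed windows and the method name are just 
placeholders, not taken from your pipeline):

import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Window the unbounded collection and attach a trigger, so the runner knows
// when each window's data is complete enough for GroupByKey to emit a result.
static PCollection<KV<String, Iterable<Long>>> groupUnbounded(
    PCollection<KV<String, Long>> keyed) {
  return keyed
      .apply(Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1)))
          .triggering(AfterWatermark.pastEndOfWindow())
          .withAllowedLateness(Duration.ZERO)
          .discardingFiredPanes())
      .apply(GroupByKey.create());
}

Whether fixed windows, sessions or the GlobalWindow with a non-default trigger 
fits best depends on when you want the results for each file to be emitted.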

What is the issue with applying a windowing/triggering strategy in your case?

—
Alexey

> On 24 May 2021, at 10:25, Sozonoff Serge <se...@sozonoff.com> wrote:
> 
> Hi,
> 
> Referring to the explanation found at the following link under (Stream 
> processing triggered from an external source)
> 
> https://beam.apache.org/documentation/patterns/file-processing/
> 
> 
> While implementing this solution I am trying to figure out how to deal with 
> the fact that my pipeline, which was bounded, has now become unbounded. It 
> exposes me to windowing/triggering concerns which I did not have to deal with 
> before and which are, in essence, unnecessary since I am still fundamentally 
> dealing with bounded data. The only reason I have an unbounded source involved 
> is as a trigger and provider of the file to be processed.
> 
> Since my pipeline uses GroupByKey transforms I get the following error.
> 
> Exception in thread "main" java.lang.IllegalStateException: GroupByKey cannot 
> be applied to non-bounded PCollection in the GlobalWindow without a trigger. 
> Use a Window.into or Window.triggering transform prior to GroupByKey.
> 
> Do I really need to add windowing/triggering semantics to the PCollections 
> which are built from bounded data?
> 
> Thanks for any pointers.
> 
> Serge
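
For reference, a rough sketch of the “Stream processing triggered from an 
external source” pattern from the link above (the Pub/Sub subscription, the 
class name and the use of the GCP Pub/Sub IO module are assumptions on my 
side; any unbounded source of file names works the same way):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class TriggeredFileProcessing {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // File names arrive on an unbounded source, which makes every downstream
    // PCollection unbounded even though each file's contents are bounded.
    PCollection<String> lines = p
        .apply("ReadFileNames", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/new-files"))
        .apply(FileIO.matchAll())
        .apply(FileIO.readMatches())
        .apply(TextIO.readFiles());

    // ... keyed transforms, windowing and GroupByKey would follow here ...

    p.run();
  }
}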
