Two pieces to this:

1. Every element in a PCollection is always associated with a window, and
GroupByKey (hence CombinePerKey) operates per-key-and-window (w/ window
merging).
2. If an element is not explicitly a KV, then there is no key associated
with it.

I'm afraid I don't have any guesses at the problem based on what you've
shared. Can you say more?

Kenn

On Tue, Mar 5, 2019 at 10:29 AM Daniel Debrunner <[email protected]> wrote:

> The windowing section of the Beam programming model guide shows a
> window defined and used in the GropyByKey transform after a ParDo.
> (section 7.1.1).
>
> However I couldn't find any documentation on how long the window
> remains in scope for subsequent transforms.
>
> I have an application with this pipeline:
>
> PCollection<KV<A,B>> -> FixedWindow<KV<A,B>> -> GroupByKey ->
> PCollection<X> -> FixedWindow<X> -> Combine<X,R>.globally ->
> PCollection<R>
>
> The idea is that the first window is aggregating by key but in the
> second window I need to combine elements across all keys.
>
> With my initial app I was seeing some runtime errors in/after the
> combine where a KV<null,R> was being seen, even though at that point
> there should be no key for the PCollection<R>.
>
> In a simpler test I can apply  FixedWindow<X> -> Combine<X,R>.globally
> -> PCollection<R> to a PCollection without an upstream window and the
> combine correctly happens once.
> But then adding the keyed upstream window, the combine occurs once per
> key without any final combine across the keys.
>
> So it seems somehow the memory of the key exists even with the new
> window transform,
>
> I'm probably misunderstanding some detail of windowing, but I couldn't
> find any deeper documentation than the simple examples in the
> programming model guide.
>
> Can anyone point me in the correct direction?
>
> Thanks,
> Dan.
>

Reply via email to