Two pieces to this: 1. Every element in a PCollection is always associated with a window, and GroupByKey (hence CombinePerKey) operates per-key-and-window (w/ window merging). 2. If an element is not explicitly a KV, then there is no key associated with it.
I'm afraid I don't have any guesses at the problem based on what you've shared. Can you say more? Kenn On Tue, Mar 5, 2019 at 10:29 AM Daniel Debrunner <[email protected]> wrote: > The windowing section of the Beam programming model guide shows a > window defined and used in the GropyByKey transform after a ParDo. > (section 7.1.1). > > However I couldn't find any documentation on how long the window > remains in scope for subsequent transforms. > > I have an application with this pipeline: > > PCollection<KV<A,B>> -> FixedWindow<KV<A,B>> -> GroupByKey -> > PCollection<X> -> FixedWindow<X> -> Combine<X,R>.globally -> > PCollection<R> > > The idea is that the first window is aggregating by key but in the > second window I need to combine elements across all keys. > > With my initial app I was seeing some runtime errors in/after the > combine where a KV<null,R> was being seen, even though at that point > there should be no key for the PCollection<R>. > > In a simpler test I can apply FixedWindow<X> -> Combine<X,R>.globally > -> PCollection<R> to a PCollection without an upstream window and the > combine correctly happens once. > But then adding the keyed upstream window, the combine occurs once per > key without any final combine across the keys. > > So it seems somehow the memory of the key exists even with the new > window transform, > > I'm probably misunderstanding some detail of windowing, but I couldn't > find any deeper documentation than the simple examples in the > programming model guide. > > Can anyone point me in the correct direction? > > Thanks, > Dan. >
