Yes, this is a known issue. Here's a prior discussion: https://lists.apache.org/thread.html/e9518f5d5f4bcf7bab02de2cb9fe1bd5293d87aa12d46de1eac4600b@%3Cuser.beam.apache.org%3E
It is actually long-standing and the solution is known but hard.

On Wed, May 30, 2018 at 9:48 AM Carlos Alonso <[email protected]> wrote:

> Hi everyone!!
>
> Working with multimap based side inputs on the global window I'm
> experiencing something unexpected (at least to me) that I'd like to share
> with you to clarify.
>
> The way I understand multimaps is that when one emits two values for the
> same key for the same window (obvious thing here as I'm working on the
> Global one), the newly emitted values are appended to the Iterable
> collection that is the value for that particular key on the map.
>
> Testing it in this job (it is using scio, but side inputs are implemented
> with PCollectionViews):
> https://github.com/calonso/beam_experiments/blob/master/refreshingsideinput/src/main/scala/com/mrcalonso/RefreshingSideInput2.scala
>
> The steps to reproduce are:
> 1. Create one table on the target BQ
> 2. Run the job
> 3. Patch the table on BQ (add one field), this should generate a new
> TableSchema for the corresponding TableReference
> 4. An updated value of the fields number appear on the logs, but there is
> only one element within the iterable, as if it had been updated instead of
> appended!!
>
> Is that the expected behaviour? Is a bug? Am I missing something?
>
> Thanks!
>
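
To make the difference in the quoted report concrete, here is a minimal plain-Python model of the two behaviours it contrasts. It makes no Beam or scio calls and says nothing about how the runner implements side inputs. The function names, the table reference string and the field counts are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def append_semantics(emissions):
    """What the report expected: each value emitted for a key is
    appended to that key's iterable."""
    view = defaultdict(list)
    for key, value in emissions:
        view[key].append(value)
    return dict(view)


def replace_semantics(emissions):
    """What the report observed: a later emission for a key appears
    to replace the earlier one, leaving a single element."""
    view = {}
    for key, value in emissions:
        view[key] = [value]
    return view


# Hypothetical data: table reference -> number of fields in its schema,
# emitted once before and once after the table is patched.
emissions = [("project:dataset.table", 3), ("project:dataset.table", 4)]

expected = append_semantics(emissions)
observed = replace_semantics(emissions)
print(expected)
print(observed)
```

Under this model, `expected` holds both field counts for the key while `observed` holds only the most recent one, which matches step 4 of the report.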
