Thanks for the ideas, Luke. I checked out the JSON graphs per your recommendation (thanks for that; I was previously unaware of them), and the "output_info" was identical for both the running pipeline and the pipeline I was hoping to update it with. I ended up opting to just drain the running job and submit the updated pipeline as a new job. Thanks for the tips!
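In case it helps anyone else who finds this thread, here is roughly what I ran to produce the graphs for comparison. Treat it as a minimal sketch: the --dataflowJobFile flag and PipelineDotRenderer come straight from Luke's suggestion, while the file paths, class name, and pipeline contents are illustrative.

    import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class GraphDump {
      // Launch once per pipeline version, e.g. with:
      //   --runner=DataflowRunner --dataflowJobFile=/tmp/before.json
      // and again for the updated code with --dataflowJobFile=/tmp/after.json,
      // then diff the two JSON files to spot added/removed/renamed steps.
      public static void main(String[] args) {
        PipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline pipeline = Pipeline.create(options);
        // ... build the pipeline exactly as the real job does ...

        // The DOT rendering needs no job submission at all; pipe the output
        // through Graphviz (dot -Tpng) to visualize the shape of the graph.
        System.out.println(PipelineDotRenderer.toDotString(pipeline));

        // With --dataflowJobFile set, the DataflowRunner writes the JSON
        // graph to that path as part of job submission.
        pipeline.run();
      }
    }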
Thanks,
Evan

On Thu, Oct 21, 2021 at 7:02 PM Luke Cwik <[email protected]> wrote:

> I would suggest dumping the JSON representation (with the
> --dataflowJobFile=/path/to/output.json) of the pipeline before and after
> and looking to see what is being submitted to Dataflow. Dataflow's JSON
> graph representation is a bipartite graph where there are transform nodes
> with inputs and outputs and PCollection nodes with no inputs or outputs.
> The PCollection nodes typically end with the suffix ".out". This could
> help find steps that have been added/removed/renamed.
>
> The PipelineDotRenderer[1] might be of use as well.
>
> 1:
> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PipelineDotRenderer.java
>
> On Thu, Oct 21, 2021 at 11:54 AM Evan Galpin <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm looking for any help regarding updating streaming jobs which are
>> already running on Dataflow. Specifically, I'm seeking guidance for
>> situations where fusion is involved, and trying to decipher which old
>> steps should be mapped to which new steps.
>>
>> I have a case where I updated the steps which come after the step in
>> question, but when I attempt to update there is an error that "<old step>
>> no longer produces data to the steps <downstream step>". I believe that
>> <old step> is only changed as a result of fusion, and in reality it does
>> in fact produce data to <downstream step> (confirmed when deployed as a
>> new job for testing purposes).
>>
>> Is there a guide for how to deal with updates and fusion?
>>
>> Thanks,
>> Evan
