Thanks for the ideas, Luke. I checked out the JSON graphs per your
recommendation (I was previously unaware of that option), and the
"output_info" was identical for both the running pipeline and the updated
pipeline I was hoping to replace it with. I ended up opting to just drain
the running job and submit the updated pipeline as a new job. Thanks for
the tips!

Thanks,
Evan

On Thu, Oct 21, 2021 at 7:02 PM Luke Cwik <[email protected]> wrote:

> I would suggest dumping the JSON representation of the pipeline before and
> after (with --dataflowJobFile=/path/to/output.json) and looking at what is
> being submitted to Dataflow. Dataflow's JSON graph representation is a
> bipartite graph where there are transform nodes with inputs and outputs and
> PCollection nodes with no inputs or outputs. The PCollection nodes
> typically end with the suffix ".out". This could help find steps that have
> been added/removed/renamed.
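>
> For example, a minimal sketch of a submission that also dumps the job file
> (the class name and pipeline contents here are placeholders):
>
>   import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
>   import org.apache.beam.sdk.Pipeline;
>   import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
>   public class DumpJobFile {
>     public static void main(String[] args) {
>       // Passing --dataflowJobFile=/path/to/output.json on the command line
>       // makes the Dataflow runner write the translated JSON job
>       // specification to that path when the job is submitted.
>       DataflowPipelineOptions options =
>           PipelineOptionsFactory.fromArgs(args).withValidation()
>               .as(DataflowPipelineOptions.class);
>       Pipeline p = Pipeline.create(options);
>       // ... build the pipeline here ...
>       p.run();
>     }
>   }
>
> Running the old and new code with e.g. --dataflowJobFile=/tmp/before.json
> and --dataflowJobFile=/tmp/after.json gives you two files to diff.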
>
> The PipelineDotRenderer[1] might be of use as well.
>
> 1:
> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/renderer/PipelineDotRenderer.java
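>
> A minimal sketch of using it, assuming "pipeline" is your constructed
> Pipeline (the output path is a placeholder):
>
>   import java.nio.charset.StandardCharsets;
>   import java.nio.file.Files;
>   import java.nio.file.Paths;
>   import org.apache.beam.runners.core.construction.renderer.PipelineDotRenderer;
>
>   // Render the pipeline graph to Graphviz DOT and write it to a file;
>   // view it with e.g. `dot -Tpng pipeline.dot -o pipeline.png`.
>   String dot = PipelineDotRenderer.toDotString(pipeline);
>   Files.write(Paths.get("/tmp/pipeline.dot"),
>       dot.getBytes(StandardCharsets.UTF_8));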
>
> On Thu, Oct 21, 2021 at 11:54 AM Evan Galpin <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I'm looking for any help regarding updating streaming jobs which are
>> already running on Dataflow. Specifically, I'm seeking guidance for
>> situations where fusion is involved and I'm trying to decipher which old
>> steps should be mapped to which new steps.
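>>
>> For reference, the update attempt looks roughly like the sketch below
>> (step and job names are placeholders):
>>
>>   import java.util.Map;
>>   import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
>>   import org.apache.beam.sdk.options.PipelineOptionsFactory;
>>
>>   DataflowPipelineOptions options =
>>       PipelineOptionsFactory.as(DataflowPipelineOptions.class);
>>   options.setUpdate(true);                 // replace the running job in place
>>   options.setJobName("my-streaming-job");  // must match the running job's name
>>   // Explicitly map old step names to new ones for steps that were renamed:
>>   options.setTransformNameMapping(Map.of("OldStepName", "NewStepName"));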
>>
>> I have a case where I updated the steps which come after the step in
>> question, but when I attempt the update, there is an error that "<old step>
>> no longer produces data to the steps <downstream step>". I believe that
>> <old step> is only changed as a result of fusion, and in reality it does in
>> fact produce data to <downstream step> (confirmed when deployed as a new
>> job for testing purposes).
>>
>> Is there a guide for how to deal with updates and fusion?
>>
>> Thanks,
>> Evan
>>
>
