Hi,

A bit more info: I am trying to find out how the following construct is translated into Spark. I have the following pipeline:

A -> B -> C -> (D/E/F with multiple output tags)
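Here is a minimal sketch of the shape I mean, using the Beam Java SDK (the transform names A..F, the String element type, and the HeavyFn placeholder are just illustrations, not my real code):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class BranchingShape {

  // Placeholder standing in for the expensive B and C steps.
  static class HeavyFn extends DoFn<String, String> {
    @ProcessElement
    public void process(ProcessContext ctx) {
      ctx.output(ctx.element()); // imagine very heavy work here
    }
  }

  static final TupleTag<String> TAG_D = new TupleTag<String>() {};
  static final TupleTag<String> TAG_E = new TupleTag<String>() {};
  static final TupleTag<String> TAG_F = new TupleTag<String>() {};

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<String> a = p.apply("A", Create.of("x", "y"));
    PCollection<String> c =
        a.apply("B (heavy)", ParDo.of(new HeavyFn()))
         .apply("C (heavy)", ParDo.of(new HeavyFn()));

    // One ParDo emits to three tags; D, E, F then continue independently.
    PCollectionTuple branches = c.apply("Split",
        ParDo.of(new DoFn<String, String>() {
          @ProcessElement
          public void process(ProcessContext ctx) {
            ctx.output(ctx.element());        // main output -> TAG_D
            ctx.output(TAG_E, ctx.element()); // additional output
            ctx.output(TAG_F, ctx.element()); // additional output
          }
        }).withOutputTags(TAG_D, TupleTagList.of(TAG_E).and(TAG_F)));

    branches.get(TAG_D).apply("D", ParDo.of(new HeavyFn()));
    branches.get(TAG_E).apply("E", ParDo.of(new HeavyFn()));
    branches.get(TAG_F).apply("F", ParDo.of(new HeavyFn()));

    p.run().waitUntilFinish();
  }
}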
Then D, E, and F branch out and do things independently. B and C are very heavy steps in this case, and from looking at the jobs generated in Spark and the corresponding stage DAGs, it seems to me that this is being transformed in Spark into three independent pipelines:

A -> B -> C -> D
A -> B -> C -> E
A -> B -> C -> F

so the extremely heavy operations B and C appear to be repeated. Could this be the case? Am I missing something?

I am no expert at reading these generated DAGs, but that is the general feeling I get, which is why I wanted to know whether there is a way to see what is generated from Beam for Spark to run (the closest thing I have found so far is sketched after the quoted message below).

Best regards,
Augusto

On 2019/05/06 06:41:07, [email protected] <[email protected]> wrote:
> Hi,
>
> I would like to know if there is a way to inspect whatever pipeline was
> generated from Beam to be run in the Spark Runner.
>
> Best regards,
> Augusto
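For what it's worth, the closest I have found so far is dumping Beam's own transform hierarchy before running the pipeline, via Pipeline.traverseTopologically. As far as I can tell this only shows Beam's view of the graph, not what the Spark runner actually translates it into, and the visitor body below is just an illustrative sketch:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;

// "p" is the Pipeline object; call this before p.run().
p.traverseTopologically(new Pipeline.PipelineVisitor.Defaults() {
  private int depth = 0;

  @Override
  public CompositeBehavior enterCompositeTransform(TransformHierarchy.Node node) {
    System.out.println(indent(depth++) + "composite: " + node.getFullName());
    return CompositeBehavior.ENTER_TRANSFORM;
  }

  @Override
  public void leaveCompositeTransform(TransformHierarchy.Node node) {
    depth--;
  }

  @Override
  public void visitPrimitiveTransform(TransformHierarchy.Node node) {
    System.out.println(indent(depth) + "primitive: " + node.getFullName());
  }

  private String indent(int n) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) sb.append("  ");
    return sb.toString();
  }
});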
