Hi Fabian,

I just tried to reproduce. I can run a pipeline with a disabled hop to Beam BigQuery Output. If I don't disable or delete that hop, I get a situation that is very similar to what you describe.
The line below indicates that Hop tries to initialize the BigQuery Output transform:

2022/07/19 14:13:42 - General - Handled transform (BQ OUTPUT) : Beam BigQuery Output, gets data from Select values

Can you confirm you have this issue, even when the hop before your Beam BigQuery Output transform is disabled?

Regards,
Bart

On Tue, Jul 19, 2022 at 2:18 PM Fabian Peters <[email protected]> wrote:

> Hi Bart,
>
> I didn't try this before, because what I'm interested in is seeing the
> intermediate steps' output.
>
> However, with the Beam-Direct runner, the pipeline just hangs:
>
> 2022/07/19 14:13:34 - Hop - Pipeline opened.
> 2022/07/19 14:13:34 - Hop - Launching pipeline [sites]...
> 2022/07/19 14:13:34 - Hop - Started the pipeline execution.
> 2022/07/19 14:13:41 - General - Created Apache Beam pipeline with name 'sites'
> 2022/07/19 14:13:41 - General - Handled generic transform (TRANSFORM) : Get file names, gets data from 0 previous transform(s), targets=0, infos=0
> 2022/07/19 14:13:41 - General - Handled generic transform (TRANSFORM) : Last modified, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:13:41 - General - Handled generic transform (TRANSFORM) : Avro File Input, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:13:41 - General - Handled generic transform (TRANSFORM) : Avro to site, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:13:41 - General - Handled generic transform (TRANSFORM) : Filter rows, gets data from 1 previous transform(s), targets=1, infos=0
> 2022/07/19 14:13:41 - General - Transform Select values reading from previous transform targeting this one using : Filter rows - TARGET - Select values
> 2022/07/19 14:13:41 - General - Handled generic transform (TRANSFORM) : Select values, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:13:42 - General - Handled transform (BQ OUTPUT) : Beam BigQuery Output, gets data from Select values
> 2022/07/19 14:13:42 - sites - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Beam-Direct'
> … nothing more happens
>
> If I remove the "Beam BigQuery Output" transform:
>
> 2022/07/19 14:14:32 - Hop - Pipeline opened.
> 2022/07/19 14:14:32 - Hop - Launching pipeline [sites]...
> 2022/07/19 14:14:32 - Hop - Started the pipeline execution.
> 2022/07/19 14:14:40 - General - Created Apache Beam pipeline with name 'sites'
> 2022/07/19 14:14:40 - General - Handled generic transform (TRANSFORM) : Get file names, gets data from 0 previous transform(s), targets=0, infos=0
> 2022/07/19 14:14:40 - General - Handled generic transform (TRANSFORM) : Last modified, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:14:40 - General - Handled generic transform (TRANSFORM) : Avro File Input, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:14:40 - General - Handled generic transform (TRANSFORM) : Avro to site, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:14:40 - General - Handled generic transform (TRANSFORM) : Filter rows, gets data from 1 previous transform(s), targets=1, infos=0
> 2022/07/19 14:14:40 - General - Transform Select values reading from previous transform targeting this one using : Filter rows - TARGET - Select values
> 2022/07/19 14:14:40 - General - Handled generic transform (TRANSFORM) : Select values, gets data from 1 previous transform(s), targets=0, infos=0
> 2022/07/19 14:14:40 - sites - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Beam-Direct'
> 2022/07/19 14:14:44 - sites - Beam pipeline execution has finished.
>
> The pipeline ran successfully via the DataFlow runner.
>
> Fabian
>
> On 19.07.2022 at 13:11, Bart Maertens <[email protected]> wrote:
>
> Hi Fabian,
>
> Do you have this issue with the Beam Direct run configuration as well?
> The Beam BigQuery Output transform is Beam only, so this won't work with
> the native (local or remote) run configuration.
>
> If the issue exists with the direct runner, can you share any errors
> you get?
>
> Regards,
> Bart
>
> On Tue, Jul 19, 2022 at 12:12 PM Fabian Peters <[email protected]> wrote:
>
>> Hi all!
>>
>> I'm developing a number of pipelines that write data to BigQuery. This
>> works fine, alas, during development I find I have to entirely remove the
>> "Beam BigQuery Output" transform to be able to run the pipeline locally,
>> disabling or deleting the hop to it did not help. Is there a way to keep
>> the transform around while debugging the pipeline?
>>
>> cheers
>>
>> Fabian
