Can you try running the DirectRunner with the option
`--experiments=use_deprecated_read`?
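
If the pipeline is launched from a Java main method, one way to make sure the
flag is set is to add it to the argument array before the options are parsed.
The following is only a rough sketch in plain Java: the class and method names
(`DeprecatedReadArgs`, `withExperiment`) are hypothetical and not part of Beam,
and it assumes that multiple experiments are passed as a comma-separated list
in a single `--experiments=` argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Beam): makes sure an experiment flag is
// present in the command-line arguments handed to the pipeline.
class DeprecatedReadArgs {
    private static final String PREFIX = "--experiments=";

    /** Returns a copy of args in which the given experiment is enabled. */
    public static String[] withExperiment(String[] args, String experiment) {
        List<String> out = new ArrayList<>();
        boolean merged = false;
        for (String arg : args) {
            if (!merged && arg.startsWith(PREFIX)) {
                // Merge into the existing comma-separated experiments list.
                List<String> values = new ArrayList<>();
                for (String v : arg.substring(PREFIX.length()).split(",")) {
                    if (!v.isEmpty()) {
                        values.add(v);
                    }
                }
                if (!values.contains(experiment)) {
                    values.add(experiment);
                }
                out.add(PREFIX + String.join(",", values));
                merged = true;
            } else {
                out.add(arg);
            }
        }
        if (!merged) {
            // No experiments argument was given, so append a new one.
            out.add(PREFIX + experiment);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] effective = withExperiment(args, "use_deprecated_read");
        // In a real pipeline, `effective` would be passed to the options
        // parser in place of `args`; here it is only printed.
        System.out.println(String.join(" ", effective));
    }
}
```

Passing the flag directly on the command line has the same effect; the helper
only matters if you want the experiment enabled regardless of how the pipeline
is invoked.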

This looks like an instance of
https://issues.apache.org/jira/browse/BEAM-10670?focusedCommentId=17316858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17316858
which was also reported in
https://lists.apache.org/thread.html/re6b0941a8b4951293a0327ce9b25e607cafd6e45b69783f65290edee%40%3Cdev.beam.apache.org%3E

We should roll back the use of the SDF wrapper by default, given the
usability and performance issues reported.


On Sat, May 8, 2021 at 12:57 AM Evan Galpin <evan.gal...@gmail.com> wrote:

> Hi all,
>
> I’m experiencing very slow performance and startup delay when testing a
> pipeline locally. I’m reading data from a Google PubSub subscription as the
> data source, and before each pipeline execution I ensure that data is
> present in the subscription (readable from GCP console).
>
> I’m seeing startup delay on the order of minutes with DirectRunner (5-10
> min). Is that expected? I did find a Jira ticket[1] that at first seemed
> related, but I think it has more to do with BQ than DirectRunner.
>
> I’ve run the pipeline with a debugger connected and confirmed that it’s
> minutes before the first DoFn in my pipeline receives any data. Is there a
> way I can profile the direct runner to see what it’s churning on?
>
> Thanks,
> Evan
>
> [1]
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/BEAM-4548
>
