On Tue, Apr 23, 2019 at 11:58 AM Chengxuan Wang <wcxz...@gmail.com> wrote:

> Hi,
>
> I am using the Apache Beam Python SDK (apache-beam==2.11.0) to run a
> Dataflow job with BigQuerySource. From what I can see in the code,
> BigQueryReader should delete the temporary dataset after the query is done.
> https://github.com/apache/beam/blob/1ad61fd384bcd1edd11086a3cf9d7dddb154d934/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L900
>
> But I still see some temporary datasets in my GCP console. Could you help
> me look into it?
>

Did some of your pipelines fail while reading from BigQuery? If so, it's
possible that a pipeline failed before it reached the cleanup step, leaving
its temporary dataset behind.
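If that is what happened, the leftover datasets can be removed manually. A minimal sketch with the google-cloud-bigquery client is below; the `temp_dataset_` prefix is an assumption about how the Python SDK names these datasets, so verify it against the names you actually see in the console before deleting anything (`delete_leftover_temp_datasets` and `is_beam_temp_dataset` are hypothetical helper names, not Beam APIs):

```python
def is_beam_temp_dataset(dataset_id, prefix="temp_dataset_"):
    # Assumed naming convention for Beam's temporary BigQuery datasets --
    # check your project's actual dataset names before relying on this.
    return dataset_id.startswith(prefix)


def delete_leftover_temp_datasets(project, dry_run=True):
    # Requires: pip install google-cloud-bigquery
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    for ds in client.list_datasets():
        if is_beam_temp_dataset(ds.dataset_id):
            print(("DRY RUN: would delete " if dry_run else "Deleting ")
                  + ds.dataset_id)
            if not dry_run:
                # delete_contents also removes the temp tables inside the
                # dataset; not_found_ok avoids racing a concurrent cleanup.
                client.delete_dataset(ds.reference,
                                      delete_contents=True,
                                      not_found_ok=True)
```

Run it with `dry_run=True` first and confirm the printed list only contains datasets you expect to be Beam leftovers, not datasets a currently running job is still using.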


>
> Another thing: is it possible to set an expiration for the temporary
> dataset? Right now I see it is set to never.
>

The issue is that such an expiration would effectively become an upper bound
on the total execution time of the job: if the temporary dataset expires
mid-query, the read fails. It's possible to set the expiration to a very
large value (multiple days or weeks), but I'm not sure that would help much.
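For completeness, here is a sketch of how one could set a default table expiration on a dataset using the google-cloud-bigquery client (this is a workaround outside Beam itself, not a Beam option; `set_default_table_expiration` is a hypothetical helper name). The caveat above applies: the expiration must comfortably exceed your job's worst-case runtime.

```python
# 7 days expressed in milliseconds, the unit BigQuery expects.
WEEK_MS = 7 * 24 * 60 * 60 * 1000


def set_default_table_expiration(project, dataset_id, expiration_ms=WEEK_MS):
    # Requires: pip install google-cloud-bigquery
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    dataset = client.get_dataset(f"{project}.{dataset_id}")
    dataset.default_table_expiration_ms = expiration_ms
    # Update only the expiration field; everything else is left unchanged.
    client.update_dataset(dataset, ["default_table_expiration_ms"])
```

Tables created in the dataset after this change inherit the default expiration; existing tables keep whatever expiration they already had.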


>
> Thanks,
> Chengxuan
>
