On Wed, Apr 24, 2019 at 5:38 PM Chengxuan Wang <wcxz...@gmail.com> wrote:
> On Tue, Apr 23, 2019 at 11:58 AM Chengxuan Wang <wcxz...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the Apache Beam Python SDK (apache-beam==2.11.0) to run a
>> Dataflow job with BigQuerySource. I checked the code, and BigQueryReader
>> should delete the temporary dataset after the query is done.
>> https://github.com/apache/beam/blob/1ad61fd384bcd1edd11086a3cf9d7dddb154d934/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L900
>>
>> But I still see some temporary datasets in my GCP console. Could you help
>> me look into it?
>>
>
> Did some of your pipelines fail while reading from BigQuery? If so,
> it's possible that the pipeline failed before running the cleanup step.
>
> The cleanup is in __exit__ and we run it in a `with` statement, so this
> should clean up the dataset even if the pipeline fails, right?
>

Which runner are you using? Please note that what you are looking at is the
implementation for the DirectRunner. The implementation for the
DataflowRunner is in the Dataflow service (given that BQ is a native source).

>
> On Wed, Apr 24, 2019 at 5:24 PM Chamikara Jayalath <chamik...@google.com> wrote:
>
>>
>> On Tue, Apr 23, 2019 at 11:58 AM Chengxuan Wang <wcxz...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using the Apache Beam Python SDK (apache-beam==2.11.0) to run a
>>> Dataflow job with BigQuerySource. I checked the code, and
>>> BigQueryReader should delete the temporary dataset after the query is
>>> done.
>>> https://github.com/apache/beam/blob/1ad61fd384bcd1edd11086a3cf9d7dddb154d934/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L900
>>>
>>> But I still see some temporary datasets in my GCP console. Could you
>>> help me look into it?
>>>
>>
>> Did some of your pipelines fail while reading from BigQuery? If so, it's
>> possible that the pipeline failed before running the cleanup step.
>>
>>>
>>> Another thing: is it possible to set an expiration for the temporary
>>> dataset? Right now I see it is set to never.
>>>
>>
>> The issue is that this would end up being an upper bound on the total
>> execution time of the job. It's possible to set this to a very large
>> value (multiple days or weeks), but I'm not sure this will help.
>>
>>>
>>> Thanks,
>>> Chengxuan
>>>
>>
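
To make the `with` / __exit__ question above concrete, here is a minimal sketch of the general Python behaviour. TempDatasetReader and read_rows are hypothetical stand-ins written for this illustration, not Beam's BigQueryReader, and no BigQuery calls are made. It only shows that __exit__ runs when the body of the `with` block raises. It says nothing about a worker process that is killed outright, or about the DataflowRunner, where the read is implemented in the Dataflow service rather than in this SDK code path.

```python
class TempDatasetReader:
    """Hypothetical stand-in for a reader that owns a temporary dataset."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        # Stands in for creating the temporary dataset.
        self.log.append("create temp dataset")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when the with-body raises.
        self.log.append("delete temp dataset")
        return False  # do not swallow the exception


def read_rows(log, fail=False):
    with TempDatasetReader(log):
        log.append("read rows")
        if fail:
            raise RuntimeError("simulated read failure")


# Normal run: cleanup happens after the read.
ok_log = []
read_rows(ok_log)

# Failing run: cleanup still happens, then the error propagates.
failed_log = []
try:
    read_rows(failed_log, fail=True)
except RuntimeError:
    failed_log.append("error propagated")
```

In both runs the "delete temp dataset" step is recorded, so an ordinary Python exception inside the `with` block would not by itself explain leftover datasets in this kind of code path.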