On Wed, Apr 24, 2019 at 5:38 PM Chengxuan Wang <wcxz...@gmail.com> wrote:

> On Tue, Apr 23, 2019 at 11:58 AM Chengxuan Wang <wcxz...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the Apache Beam Python SDK (apache-beam==2.11.0) to run a
>> Dataflow job with BigQuerySource. I checked the code, and BigQueryReader
>> should delete the temporary dataset after the query is done:
>> https://github.com/apache/beam/blob/1ad61fd384bcd1edd11086a3cf9d7dddb154d934/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L900
>>
>> But I still see some temporary datasets in my GCP console. Could you help
>> me look into it?
>>
>
>     Did some of your pipelines fail while reading from BigQuery? If so,
> it's possible that the pipeline failed before running the cleanup step.
>
> The cleanup is in __exit__ and we run it with a `with` statement, so this
> should clean up the dataset even if the pipeline fails, right?
>

Which runner are you using? Please note that what you are looking at is the
implementation for the DirectRunner. The implementation for the DataflowRunner
is in the Dataflow service (given that BQ is a native source).
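The `with`/__exit__ point above can be sketched in plain Python. The class name below is illustrative, not Beam's actual reader: __exit__ does run even when the body raises, but only inside the process executing the `with` block, which is why cleanup code in the SDK can't run when the read is executed by the Dataflow service (or when the worker process is killed outright).

```python
# Illustrative sketch (not Beam's API): __exit__-based cleanup runs on an
# in-process failure, but never if the process itself dies.
class TempDatasetReader:
    def __init__(self):
        self.cleaned_up = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Called on normal exit AND when an exception propagates out of the
        # `with` body -- but only within this process.
        self.cleaned_up = True
        return False  # don't swallow the exception

reader = TempDatasetReader()
try:
    with reader:
        raise RuntimeError("simulated read failure")
except RuntimeError:
    pass

print(reader.cleaned_up)  # True: cleanup ran despite the failure
```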


>
>
> Chamikara Jayalath <chamik...@google.com> 于2019年4月24日周三 下午5:24写道:
>
>>
>>
>> On Tue, Apr 23, 2019 at 11:58 AM Chengxuan Wang <wcxz...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using the Apache Beam Python SDK (apache-beam==2.11.0) to run a
>>> Dataflow job with BigQuerySource. I checked the code, and BigQueryReader
>>> should delete the temporary dataset after the query is done:
>>> https://github.com/apache/beam/blob/1ad61fd384bcd1edd11086a3cf9d7dddb154d934/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L900
>>>
>>> But I still see some temporary datasets in my GCP console. Could you help
>>> me look into it?
>>>
>>
>> Did some of your pipelines fail while reading from BigQuery? If so, it's
>> possible that the pipeline failed before running the cleanup step.
>>
>>
>>>
>>> Another thing: is it possible to set an expiration for the temporary
>>> dataset? Right now I see it is set to never expire.
>>>
>>
>> The issue is that this would end up being an upper bound on the total
>> execution time of the job. It's possible to set it to a very large value
>> (multiple days or weeks), but I'm not sure that would help.
>>
>>
>>>
>>> Thanks,
>>> Chengxuan
>>>
>>
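On the expiration question quoted above: as a workaround outside Beam itself, one could set a default table expiration on the leftover temporary dataset with the google-cloud-bigquery client library. This is only a sketch; the dataset id and project below are placeholders, and `days_to_ms` is a helper I made up.

```python
# Workaround sketch (not a Beam API): give tables in a dataset a default TTL
# via the google-cloud-bigquery client library.

MS_PER_DAY = 24 * 60 * 60 * 1000

def days_to_ms(days):
    """Convert days to the milliseconds BigQuery expects for expirations."""
    return days * MS_PER_DAY

def set_default_expiration(dataset_id, days, project=None):
    """Set a default table expiration on `dataset_id` (needs credentials)."""
    from google.cloud import bigquery  # lazy import: requires GCP setup

    client = bigquery.Client(project=project)
    dataset = client.get_dataset(dataset_id)
    dataset.default_table_expiration_ms = days_to_ms(days)
    # Only the listed field is sent in the update request.
    return client.update_dataset(dataset, ["default_table_expiration_ms"])

# Example (requires credentials; names are placeholders):
# set_default_expiration("my-project.beam_temp_dataset", days=7)
```

This only caps how long stray temporary tables linger; as noted above, any expiration effectively bounds the job's total execution time, so it would need to be generous.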
