Hi Andrew,
This was fixed in https://github.com/apache/beam/pull/5360 and the fix will be
available in Beam 2.5.
Even with 2.4, the temporary datasets have a TTL of 24 hours and
self-destruct after that.
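
In the meantime, leftover datasets can be cleaned up by hand. Here is a minimal sketch of the selection logic, assuming the temporary datasets carry a `temp_dataset_` prefix (an assumption; check it against the dataset names you actually see in your project before deleting anything). The deletion call itself is left as a comment so the sketch stays side-effect free:

```python
from datetime import datetime, timedelta, timezone

# Prefix assumed for BigQueryIO query temp datasets -- verify against the
# names in your project before deleting anything.
TEMP_PREFIX = "temp_dataset_"

def stale_temp_datasets(datasets, now=None, ttl_hours=24):
    """Return dataset IDs that look like Beam temp datasets older than the TTL.

    `datasets` is an iterable of (dataset_id, created_at) pairs, where
    created_at is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=ttl_hours)
    return [
        dataset_id
        for dataset_id, created_at in datasets
        if dataset_id.startswith(TEMP_PREFIX) and created_at < cutoff
    ]

# With the google-cloud-bigquery client, the actual cleanup would be roughly:
#   for dataset_id in stale_temp_datasets(...):
#       client.delete_dataset(dataset_id, delete_contents=True)
```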

On Thu, May 31, 2018 at 2:44 AM Andrew Jones <[email protected]>
wrote:

> Hi,
>
> We've recently enabled two Beam batch jobs in production, running daily,
> and have noticed a whole load of datasets being left behind in BigQuery
> (see attached). These jobs both read and write from BigQuery, and we're
> using Beam 2.4.0. The jobs are running as templates (with
> `withTemplateCompatibility()` when reading).
>
> A similar issue has been reported at
> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609.
>
> The code to remove datasets does seem to be there, but I'm not seeing the
> logs in my job, so presumably it's not being called?
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L151
>
> Nothing else obvious in the logs.
>
> Any ideas or suggestions on how to track this issue down?
>
> Thanks,
> Andrew
>
