Hi Andrew,

This was fixed in https://github.com/apache/beam/pull/5360 and will be available in 2.5. Even with 2.4, the temporary datasets have a TTL of 24 hours and self-destruct after that.
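In the meantime, if you want to clean up what's already accumulated on 2.4, the selection logic is simple enough to script: keep only datasets whose IDs match the temporary-dataset naming pattern and whose creation time is more than 24 hours old. A minimal sketch of that filter is below; the `temp_dataset_` prefix is an assumption for illustration, so verify the actual names in your project before deleting anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class TempDatasetCleanup {
  // Assumed prefix for illustration only; confirm against the dataset
  // names you actually see in BigQuery before deleting anything.
  static final String TEMP_PREFIX = "temp_dataset_";
  static final long TTL_MILLIS = TimeUnit.HOURS.toMillis(24);

  /**
   * Given dataset IDs mapped to their creation times (epoch millis),
   * returns the IDs that match the temp prefix and are older than 24h.
   */
  static List<String> staleDatasets(Map<String, Long> created, long nowMillis) {
    List<String> stale = new ArrayList<>();
    for (Map.Entry<String, Long> e : created.entrySet()) {
      boolean isTemp = e.getKey().startsWith(TEMP_PREFIX);
      boolean expired = nowMillis - e.getValue() > TTL_MILLIS;
      if (isTemp && expired) {
        stale.add(e.getKey());
      }
    }
    return stale;
  }
}
```

You would feed this from a dataset listing (e.g. the BigQuery API or `bq ls`) and then delete the returned IDs, after eyeballing the list once.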
On Thu, May 31, 2018 at 2:44 AM Andrew Jones <[email protected]> wrote:

> Hi,
>
> We've recently enabled two Beam batch jobs in production, running daily,
> and have noticed a whole load of datasets being left behind in BigQuery
> (see attached). These jobs both read and write from BigQuery, and we're
> using Beam 2.4.0. The jobs are running as templates (with
> `withTemplateCompatibility()` when reading).
>
> A similar issue has been reported at
> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609.
>
> The code to remove datasets does seem to be there, but I'm not seeing the
> logs in my job, so presumably it's not being called?
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L151
>
> Nothing else obvious in the logs.
>
> Any ideas or suggestions on how to track this issue down?
>
> Thanks,
> Andrew
