[jira] [Commented] (BEAM-6514) Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104994#comment-17104994 ]

Pablo Estrada commented on BEAM-6514:
-------------------------------------

This seems to have been noticed by others here: https://stackoverflow.com/questions/61658242/dataprep-is-leaving-datasets-tables-behind-in-bigquery

> Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery
>
> Key: BEAM-6514
> URL: https://issues.apache.org/jira/browse/BEAM-6514
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Reporter: Rumeshkrishnan Mohan
> Assignee: Chamikara Madhusanka Jayalath
> Priority: Major
>
> Dataflow is leaving Datasets/Tables behind in BigQuery when the pipeline is
> cancelled or when it fails. I cancelled a job (or it failed at run time),
> and it left behind a dataset and table in BigQuery.
> # The `cleanupTempResource` method cleans up the temporary tables and
> dataset after a batch job succeeds.
> # If the job fails midway or is cancelled explicitly, the temporary dataset
> and tables remain. I do see a table expiration period of 1 day set in the
> `getTableToExtract` function in BigQueryQuerySource.java.
> # I understand that the temp tables and dataset may be kept on failure for
> debugging.
> # Can we have an optional pipeline or job parameter that cleans up the
> temporary dataset and tables on cancellation or failure?

--
This message was sent by Atlassian Jira (v8.3.4#803005)
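Item 2 of the quoted description notes that the SDK sets a 1-day expiration on the temporary table, so BigQuery itself eventually garbage-collects leftovers even when `cleanupTempResource` never runs. A minimal sketch of that fallback logic, assuming the 1-day TTL the reporter observed (the exact value may differ by SDK version):

```java
import java.util.concurrent.TimeUnit;

public class TempTableExpiry {
    // Assumed 1-day TTL, matching the expiration the reporter saw in
    // BigQueryQuerySource.java; verify against your SDK version.
    static final long TTL_MILLIS = TimeUnit.DAYS.toMillis(1);

    // Expiration timestamp that would be stamped on the temp table at creation.
    static long expirationFor(long creationMillis) {
        return creationMillis + TTL_MILLIS;
    }

    // A table left behind by a failed or cancelled job is removed by
    // BigQuery once the current time passes its expiration.
    static boolean isExpired(long creationMillis, long nowMillis) {
        return nowMillis >= expirationFor(creationMillis);
    }
}
```

Note that this expiration covers only the temp tables; the enclosing temporary dataset has no such TTL, which is why empty datasets accumulate after failures.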
[jira] [Commented] (BEAM-6514) Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891772#comment-16891772 ]

Santhsoh commented on BEAM-6514:
--------------------------------

I am facing the same issue with Google Cloud Dataflow SDK for Java 2.5.0. Is there any workaround to remove the temp datasets in Dataflow?
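One possible workaround is an external cleanup job that lists the project's datasets and deletes those that look like Beam temporaries. A minimal sketch of the selection step, assuming a hypothetical `temp_dataset_` naming prefix (Beam's actual temp dataset names vary by SDK version, so verify the prefix in your project before deleting anything):

```java
import java.util.List;
import java.util.stream.Collectors;

public class TempDatasetFilter {
    // Hypothetical prefix for Beam-created temporary datasets; check the
    // actual names in your BigQuery project before relying on this.
    static final String TEMP_PREFIX = "temp_dataset_";

    // Select dataset IDs that look like leftover Beam temporary datasets,
    // so a scheduled cleanup job can delete them.
    static List<String> findTempDatasets(List<String> datasetIds) {
        return datasetIds.stream()
                .filter(id -> id.startsWith(TEMP_PREFIX))
                .collect(Collectors.toList());
    }
}
```

In a real cleanup job, the IDs would come from listing datasets with the google-cloud-bigquery client, and each match would be deleted with `BigQuery.delete(datasetId, DatasetDeleteOption.deleteContents())`; guard the deletion with an age check so in-flight jobs are not affected.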
[jira] [Commented] (BEAM-6514) Dataflow Batch Job Failure is leaving Datasets/Tables behind in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752966#comment-16752966 ]

Rumeshkrishnan commented on BEAM-6514:
--------------------------------------

For more details: https://github.com/GoogleCloudPlatform/DataflowJavaSDK/issues/609