[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132277#comment-17132277 ] Beam JIRA Bot commented on BEAM-3772:
-
This issue is assigned but has not received an update in 30 days, so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign it so someone else may work on it. In 7 days the issue will be automatically unassigned.

> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded
> PCollection and FILE_LOADS
>
> Key: BEAM-3772
> URL: https://issues.apache.org/jira/browse/BEAM-3772
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.2.0, 2.3.0
> Environment: Dataflow streaming pipeline
> Reporter: Benjamin BENOIST
> Assignee: Reuven Lax
> Priority: P2
> Labels: stale-assigned
> Attachments: bigquery-fail.png, bigquery-success.png, image-2019-09-17-12-01-42-764.png
>
> My workflow: KAFKA -> Dataflow streaming -> BigQuery
> Given that low latency isn't important in my case, I use FILE_LOADS to
> reduce the costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_,
> which is a table with the current hour as a suffix.
> This _BigQueryIO.Write_ is configured like this:
> {code:java}
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withMethod(Method.FILE_LOADS)
> .withTriggeringFrequency(triggeringFrequency)
> .withNumFileShards(100)
> {code}
> The first table is successfully created and is written to. But then the
> following tables are never created and I get these exceptions:
> {code:java}
> (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job
> with id prefix
> 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023,
> reached max retries: 3, last failed load job: {
>   "configuration" : {
>     "load" : {
>       "createDisposition" : "CREATE_NEVER",
>       "destinationTable" : {
>         "datasetId" : "dev_mydataset",
>         "projectId" : "myproject-id",
>         "tableId" : "mytable_20180302_16"
>       },
> {code}
> The _CreateDisposition_ used is _CREATE_NEVER_, contrary to the
> _CREATE_IF_NEEDED_ that was specified.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
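As a side note on the failing table id above: the hourly suffix in "mytable_20180302_16" can be computed deterministically from the element's event time. A minimal, hypothetical sketch in plain Java (names are illustrative, not taken from the reporter's code):

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class HourlyTableSuffix {
    // Formats an event time as the "yyyyMMdd_HH" suffix seen in the
    // failing load job's tableId ("mytable_20180302_16").
    static final DateTimeFormatter HOUR_SUFFIX =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH");

    static String tableIdFor(String baseName, ZonedDateTime eventTime) {
        // Normalize to UTC so the same instant always yields the same table id.
        return baseName + "_" + eventTime.withZoneSameInstant(ZoneOffset.UTC).format(HOUR_SUFFIX);
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.of(2018, 3, 2, 16, 30, 0, 0, ZoneOffset.UTC);
        System.out.println(tableIdFor("mytable", t)); // mytable_20180302_16
    }
}
```

A destination function like this, used inside a DynamicDestinations, produces a new table id each hour, which is exactly the pattern that triggers the bug once the second hourly table is due.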
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005257#comment-17005257 ] Piotr Wikieł commented on BEAM-3772:
Exactly the same problem on 2.16.0. And I don't think it is a problem with time partitioning: I tested it on both partitioned and non-partitioned tables and the results were the same.
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932718#comment-16932718 ] Reuven Lax commented on BEAM-3772:
--
It's possible that there's a bug with time partitioning. If time-partitioned tables interact with create disposition differently, some assumptions might be breaking in the code.
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932700#comment-16932700 ] Jose Puertos commented on BEAM-3772:
Yes, using withTimePartitioning. I have changed my code so as not to use shards and to just write to one BigQuery table with multiple partitions.
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932698#comment-16932698 ] Reuven Lax commented on BEAM-3772:
--
Are you calling withTimePartitioning on BigQueryIO? Also, are you using clustering?
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931600#comment-16931600 ] Jose Puertos commented on BEAM-3772:
Having the same issue here with 2.12.0 and 2.15.0. Looking into the BigQuery jobs, it seems that the load jobs trying to upload partitions after the first day use CREATE_NEVER, even though the code has:
{code:java}
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
{code}
!image-2019-09-17-12-01-42-764.png|width=487,height=116!
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931573#comment-16931573 ] Luke Cwik commented on BEAM-3772:
-
Another user reported this as well: https://stackoverflow.com/questions/57969706/bigqueryio-only-first-day-table-can-be-created-despite-having-createdispositi
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931152#comment-16931152 ] Dawid Bodych commented on BEAM-3772:
I have a similar problem on SDK 2.15.0. The first table (per worker, it seems) gets created, but the next ones don't.
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929839#comment-16929839 ] Reuven Lax commented on BEAM-3772:
--
As mentioned on the PR, triggers will be per key after a reshuffle, so it should be impossible for a table to skip a trigger. We have fixed some bugs in this code, though, that could have potentially caused this error, and this Jira mentions Beam 2.9. Can you try with a more recent Beam version? Also, what is the code for your DynamicDestinations - is it deterministic?
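To illustrate the determinism question above: a DynamicDestinations is deterministic when the same element always maps to the same destination, no matter when or how often the mapping is re-evaluated. A hypothetical plain-Java sketch (illustrative helper names, not the Beam API):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DestinationFns {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

    // Deterministic: the destination depends only on the element's own
    // event timestamp, so re-evaluating on retry yields the same table.
    static String deterministic(long eventMillis) {
        return "mytable_" + FMT.format(Instant.ofEpochMilli(eventMillis));
    }

    // Non-deterministic: the destination depends on processing time, so
    // the same element can map to different tables across retries.
    static String nonDeterministic(long ignoredEventMillis) {
        return "mytable_" + FMT.format(Instant.now());
    }
}
```

A non-deterministic mapping of the second kind can confuse the sink's per-destination bookkeeping, since retries and reshuffles assume each element resolves to a stable destination.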
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755958#comment-16755958 ] Marco Veluscek commented on BEAM-3772:
--
I have upgraded my code to Apache Beam 2.9 and Scio 0.7.0 and tried again, but the same issue remains. The code I posted above did not change much with the upgrade. In the Dataflow job details on the Google Cloud Console I can see the following:
|*userAgent*|Apache_Beam_SDK_for_Java/2.9.0|
|*scioVersion*|0.7.0|
|*scalaVersion*|2.12.8|
Looking at the BigQuery job history, I can see a successful load job that creates the first table. As a matter of fact, the first table gets created and data is inserted. The second table does not get created; the job history shows several failed load jobs.
The first successful job looks as follows:
!bigquery-success.png!
The failed job looks as follows:
!bigquery-fail.png!
I should be clearer: I have tried with both the DirectRunner and the DataflowRunner. The problem only occurs with the DataflowRunner. Is this the right place to ask for help with the DataflowRunner? If not, could you point me to the right place? Thank you.
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742214#comment-16742214 ] Marco Veluscek commented on BEAM-3772:
--
Ok, thank you, I'll try again with 2.9 to confirm this. Thank you for your help.
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740698#comment-16740698 ] Reuven Lax commented on BEAM-3772:
--
There was a similar bug in the past, but I believe it was fixed; it would be surprising if this is happening in 2.9. The text of the error message you posted does not seem to match the current code in Beam 2.9. Can you verify that you are actually using Beam 2.9?
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739264#comment-16739264 ] Marco Veluscek commented on BEAM-3772:
--
I am experiencing the same issue with Beam 2.6.0 and 2.9.0. I am also using Spotify Scio 0.6.1. Do you have any news about this issue?
I have written some sample code with Scio to test this problem. Is this code correct? If I run the following code with the DirectRunner, it works: I am able to create all the tables I need in BigQuery.
{code:java}
import com.google.api.services.bigquery.model.{TableFieldSchema, TableReference, TableRow, TableSchema}
import com.spotify.scio.ContextAndArgs
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition
import org.apache.beam.sdk.io.gcp.bigquery.{BigQueryIO, DynamicDestinations, TableDestination}
import org.apache.beam.sdk.values.ValueInSingleWindow
import org.joda.time.Duration

import scala.collection.JavaConverters._

object PubSubToBigQuery {

  case class Row(id: Long, data: String, data2: String, destination: String)

  /**
   * Run the example with
   *   run \
   *     --project= \
   *     --runner=DataflowRunner \
   *     --stagingLocation=gs:///staging \
   *     --tempLocation=gs:///temp \
   *     --region=europe-west1
   *
   * Message Example 1: 1,string_data1,string_data2,table_name1
   * Message Example 2: 2,string_data1,string_data2,table_name2 --> the creation of this table fails
   */
  def main(cargs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cargs)
    sc
      .pubsubSubscription[String]("projects//subscriptions/")
      .map { s =>
        val fields = s.split(",")
        Row(
          id = fields(0).toLong,
          data = fields(1),
          data2 = fields(2),
          destination = fields(3)
        )
      }
      .saveAsCustomOutput(
        "Custom bigquery IO",
        BigQueryIO.write[Row]()
          .to(new DynamicDestinations[Row, String]() {
            override def getDestination(element: ValueInSingleWindow[Row]): String =
              element.getValue.destination

            override def getTable(destination: String): TableDestination =
              new TableDestination(
                new TableReference()
                  .setProjectId("")
                  .setDatasetId("")
                  .setTableId(destination),
                "Table for destination " + destination)

            override def getSchema(destination: String): TableSchema = {
              val nestedFields =
                List(new TableFieldSchema().setName("nested_data").setType("STRING")).asJava
              new TableSchema().setFields(
                List(
                  new TableFieldSchema().setName("id").setType("INTEGER"),
                  new TableFieldSchema()
                    .setName("data")
                    .setType("RECORD")
                    .setMode("REPEATED")
                    .setFields(nestedFields)
                ).asJava)
            }
          })
          .withFormatFunction { elem =>
            val nestedTableRow = List(
              new TableRow().set("nested_data", elem.data),
              new TableRow().set("nested_data", elem.data2)
            ).asJava
            new TableRow()
              .set("id", elem.id)
              .set("data", nestedTableRow)
          }
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(WriteDisposition.WRITE_APPEND)
          .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
          .withNumFileShards(1)
          .withTriggeringFrequency(Duration.standardSeconds(1))
      )
    sc.close()
  }
}
{code}