[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2020-06-10 Thread Beam JIRA Bot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132277#comment-17132277
 ] 

Beam JIRA Bot commented on BEAM-3772:
-

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded 
> PCollection and FILE_LOADS
> 
>
> Key: BEAM-3772
> URL: https://issues.apache.org/jira/browse/BEAM-3772
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.2.0, 2.3.0
> Environment: Dataflow streaming pipeline
>Reporter: Benjamin BENOIST
>Assignee: Reuven Lax
>Priority: P2
>  Labels: stale-assigned
> Attachments: bigquery-fail.png, bigquery-success.png, 
> image-2019-09-17-12-01-42-764.png
>
>
> My workflow: KAFKA -> Dataflow streaming -> BigQuery
> Given that low latency isn't important in my case, I use FILE_LOADS to 
> reduce costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_, 
> which is a table with the current hour as a suffix.
> This _BigQueryIO.Write_ is configured like this:
> {code:java}
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withMethod(Method.FILE_LOADS)
> .withTriggeringFrequency(triggeringFrequency)
> .withNumFileShards(100)
> {code}
> The first table is successfully created and is written to. But then the 
> following tables are never created and I get these exceptions:
> {code:java}
> (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job 
> with id prefix 
> 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023,
>  reached max retries: 3, last failed load job: {
>   "configuration" : {
> "load" : {
>   "createDisposition" : "CREATE_NEVER",
>   "destinationTable" : {
> "datasetId" : "dev_mydataset",
> "projectId" : "myproject-id",
> "tableId" : "mytable_20180302_16"
>   },
> {code}
> The _CreateDisposition_ used is _CREATE_NEVER_, contrary to the 
> _CREATE_IF_NEEDED_ that was specified.
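
For context, a hedged sketch of what such a _DynamicDestinations_ might look like 
(an hourly-suffixed table derived from the element timestamp); the project, dataset, 
table prefix and schema below are placeholders, not the reporter's actual code:
{code:java}
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

// Illustrative only: routes each element to a table suffixed with its event hour.
class HourlyDestinations extends DynamicDestinations<TableRow, String> {
  private static final DateTimeFormatter HOUR_FORMAT =
      DateTimeFormat.forPattern("yyyyMMdd_HH").withZoneUTC();

  @Override
  public String getDestination(ValueInSingleWindow<TableRow> element) {
    // Depends only on the element timestamp, so it is deterministic across retries.
    return "myproject-id:dev_mydataset.mytable_" + HOUR_FORMAT.print(element.getTimestamp());
  }

  @Override
  public TableDestination getTable(String destination) {
    return new TableDestination(destination, "Hourly table " + destination);
  }

  @Override
  public TableSchema getSchema(String destination) {
    return new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("payload").setType("STRING")));
  }
}
{code}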





[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-12-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005257#comment-17005257
 ] 

Piotr Wikieł commented on BEAM-3772:


Exactly the same problem on 2.16.0.

And I don't think it is a problem with time partitioning. I tested it on both 
partitioned and non-partitioned tables and the results were the same.



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-18 Thread Reuven Lax (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932718#comment-16932718
 ] 

Reuven Lax commented on BEAM-3772:
--

It's possible that there's a bug with time partitioning. If time-partitioned 
tables interact with create disposition differently, some assumptions might be 
breaking in the code. 



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-18 Thread Jose Puertos (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932700#comment-16932700
 ] 

Jose Puertos commented on BEAM-3772:


Yes, I am using withTimePartitioning. I have changed my code so as not to use 
shards and instead write to one BigQuery table with multiple partitions.
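
As an aside, a minimal sketch of that workaround (loading everything into a single 
daily-partitioned table instead of per-destination sharded tables) could look like 
the following; the table spec, schema and partition column are assumptions, not the 
actual pipeline:
{code:java}
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.bigquery.model.TimePartitioning;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class SinglePartitionedTableWrite {
  // Writes all rows to one table partitioned by an "event_ts" column (placeholder
  // name), so no per-hour table creation is needed.
  static void write(PCollection<TableRow> rows) {
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("payload").setType("STRING"),
        new TableFieldSchema().setName("event_ts").setType("TIMESTAMP")));

    rows.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("myproject-id:dev_mydataset.mytable")
            .withSchema(schema)
            .withTimePartitioning(new TimePartitioning().setType("DAY").setField("event_ts"))
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withTriggeringFrequency(Duration.standardMinutes(10))
            // File shards are still required for FILE_LOADS on an unbounded input.
            .withNumFileShards(100));
  }
}
{code}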



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-18 Thread Reuven Lax (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932698#comment-16932698
 ] 

Reuven Lax commented on BEAM-3772:
--

Are you calling withTimePartitioning on BigQueryIO? Also, are you using 
clustering?
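
For reference, a hypothetical configuration showing where those two settings go on 
_BigQueryIO.Write_ (the column names are made up, not from the reported pipeline):
{code:java}
import com.google.api.services.bigquery.model.Clustering;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TimePartitioning;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

class PartitionAndClusterConfig {
  // Adds daily time partitioning and clustering to an existing write transform.
  static BigQueryIO.Write<TableRow> configure(BigQueryIO.Write<TableRow> write) {
    return write
        .withTimePartitioning(new TimePartitioning().setType("DAY").setField("event_ts"))
        .withClustering(new Clustering().setFields(Arrays.asList("customer_id", "event_type")));
  }
}
{code}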



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-17 Thread Jose Puertos (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931600#comment-16931600
 ] 

Jose Puertos commented on BEAM-3772:


I'm having the same issue with 2.12.0 and 2.15.0. Looking at the BigQuery jobs, it 
seems the load jobs that try to upload partitions after the first day use 
CREATE_NEVER, even though the code specifies WRITE_APPEND and CREATE_IF_NEEDED:

 

.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)

 

!image-2019-09-17-12-01-42-764.png|width=487,height=116!



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-17 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931573#comment-16931573
 ] 

Luke Cwik commented on BEAM-3772:
-

Another user reported this as well: 
https://stackoverflow.com/questions/57969706/bigqueryio-only-first-day-table-can-be-created-despite-having-createdispositi



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-17 Thread Dawid Bodych (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931152#comment-16931152
 ] 

Dawid Bodych commented on BEAM-3772:


I have a similar problem on SDK 2.15.0. The first table (per worker, it seems) gets 
created, but the next ones don't.



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-09-14 Thread Reuven Lax (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929839#comment-16929839
 ] 

Reuven Lax commented on BEAM-3772:
--

As mentioned on the PR, triggers will be per key after a reshuffle, so it should be 
impossible for a table to skip a trigger. We have fixed some bugs in this code, 
though, that could potentially have caused this error, and this Jira mentions Beam 
2.9. Can you try with a more recent Beam version? Also, what is the code for your 
DynamicDestinations - is it deterministic?
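
To illustrate the determinism question: a getDestination that depends on anything 
non-reproducible (e.g. wall-clock time) can pick a different table when a bundle is 
retried, whereas one derived only from the element or its window is stable. A 
hypothetical contrast (table names are placeholders):
{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.Instant;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

class DestinationExamples {
  private static final DateTimeFormatter HOUR =
      DateTimeFormat.forPattern("yyyyMMdd_HH").withZoneUTC();

  // Non-deterministic: wall-clock time, so retries may target a different table.
  static String processingTimeDestination(ValueInSingleWindow<TableRow> element) {
    return "dev_mydataset.mytable_" + HOUR.print(Instant.now());
  }

  // Deterministic: derived only from the element's event timestamp.
  static String eventTimeDestination(ValueInSingleWindow<TableRow> element) {
    return "dev_mydataset.mytable_" + HOUR.print(element.getTimestamp());
  }
}
{code}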

 



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-01-30 Thread Marco Veluscek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755958#comment-16755958
 ] 

Marco Veluscek commented on BEAM-3772:
--

I have upgraded my code to Apache Beam 2.9 and Scio 0.7.0 and tried again, but the 
same issue remains.

 

The code I posted above did not change much with the upgrade.

In the Dataflow Job details on Google Cloud Console I can see the following 
details:
|*userAgent*|Apache_Beam_SDK_for_Java/2.9.0|
|*scioVersion*|0.7.0|
|*scalaVersion*|2.12.8|

Looking at the BigQuery job history, I can see a successful load job that creates 
the first table. As a matter of fact, the first table gets created and data is 
inserted.

The second table does not get created. The job history shows several failed 
load jobs.

The first successful job looks as follows:

!bigquery-success.png!

A failed job looks as follows:

!bigquery-fail.png!

I should be clearer: I have tried with both the DirectRunner and the 
DataflowRunner. The problem only occurs with the DataflowRunner. Is this the 
right place to ask for help with the DataflowRunner? If not, could you point me 
to the right place?

Thank you.



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-01-14 Thread Marco Veluscek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742214#comment-16742214
 ] 

Marco Veluscek commented on BEAM-3772:
--

Ok, thank you, I'll try again with 2.9 to confirm this.

Thank you for your help.



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-01-11 Thread Reuven Lax (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740698#comment-16740698
 ] 

Reuven Lax commented on BEAM-3772:
--

There was a similar bug in the past, but I believe it was fixed. Surprising
if this is happening in 2.9.

The text of the error message you posted does not seem to match the current
code in Beam 2.9. Can you verify that you are actually using Beam 2.9?
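
One quick way to confirm the SDK version actually on the classpath is to log it 
from the Java SDK's ReleaseInfo utility (a sketch, assuming the default setup):
{code:java}
import org.apache.beam.sdk.util.ReleaseInfo;

public class PrintBeamVersion {
  public static void main(String[] args) {
    // Prints the version baked into the beam-sdks-java-core jar that is actually loaded.
    System.out.println("Beam SDK version: " + ReleaseInfo.getReleaseInfo().getVersion());
  }
}
{code}
The Dataflow job's userAgent field in the Cloud Console should report the same version.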



[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS

2019-01-10 Thread Marco Veluscek (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739264#comment-16739264
 ] 

Marco Veluscek commented on BEAM-3772:
--

I am experiencing the same issue with Beam 2.6.0 and 2.9.0.

I am also using Spotify Scio 0.6.1.

Do you have any news about this issue?

I have written some sample code with Scio to test this problem. Is this code 
correct?

If I run the following code with the DirectRunner, it works: I am able to create 
all the tables I need in BigQuery.
{code:java}
import com.google.api.services.bigquery.model.{TableFieldSchema, 
TableReference, TableRow, TableSchema}
import com.spotify.scio.ContextAndArgs
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition
import org.apache.beam.sdk.io.gcp.bigquery.{BigQueryIO, DynamicDestinations, 
TableDestination}
import org.apache.beam.sdk.values.ValueInSingleWindow
import org.joda.time.Duration

import scala.collection.JavaConverters._

object PubSubToBigQuery {

  case class Row(id: Long, data: String, data2: String, destination: String)

  /**
  * Run the example with
  * run \
  *   --project= \
  *   --runner=DataflowRunner \
  *   --stagingLocation=gs:///staging \
  *   --tempLocation=gs:///temp \
  *   --region=europe-west1
  *
  * Message Example 1: 1,string_data1,string_data2,table_name1
  * Message Example 2: 2,string_data1,string_data2,table_name2 --> the creation
  *   of this table fails
  */
  def main(cargs: Array[String]): Unit = {

val (sc, args) = ContextAndArgs(cargs)

sc
  .pubsubSubscription[String]("projects//subscriptions/")
  .map(s => {
val fields = s.split(",")
Row(
  id = fields(0).toLong,
  data = fields(1),
  data2 = fields(2),
  destination = fields(3)
)
  })
  .saveAsCustomOutput(
"Custom bigquery IO",
BigQueryIO.write[Row]()
  .to(new DynamicDestinations[Row, String]() {

override def getDestination(element: ValueInSingleWindow[Row]): 
String =
  element.getValue.destination

override def getTable(destination: String): TableDestination =
  new TableDestination(
new TableReference()
  .setProjectId("")
  .setDatasetId("")
  .setTableId(destination),
"Table for destination " + destination)

override def getSchema(destination: String): TableSchema = {
  val nestedFields =
    List(new TableFieldSchema().setName("nested_data").setType("STRING")).asJava
  new TableSchema().setFields(
  List(
new TableFieldSchema().setName("id").setType("INTEGER"),
new TableFieldSchema()
  .setName("data")
  .setType("RECORD")
  .setMode("REPEATED")
  .setFields(nestedFields)
  ).asJava)
}
  })
  .withFormatFunction(
elem => {
  val nestedTableRow = List(
new TableRow().set("nested_data", elem.data),
new TableRow().set("nested_data", elem.data2)
  ).asJava
  new TableRow()
.set("id", elem.id)
.set("data", nestedTableRow)
}
  )
  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
  .withWriteDisposition(WriteDisposition.WRITE_APPEND)
  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
  .withNumFileShards(1)
  .withTriggeringFrequency(Duration.standardSeconds(1))
  )

sc.close()
  }
}
{code}
