[jira] [Commented] (BEAM-8222) Consider making insertId optional in BigQuery.insertAll
[ https://issues.apache.org/jira/browse/BEAM-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17122644#comment-17122644 ]

Beam JIRA Bot commented on BEAM-8222:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days, so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.

> Consider making insertId optional in BigQuery.insertAll
> -------------------------------------------------------
>
>                 Key: BEAM-8222
>                 URL: https://issues.apache.org/jira/browse/BEAM-8222
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Boyuan Zhang
>            Priority: P2
>              Labels: stale-P2
>
> The current implementation of
> StreamingWriteFn (https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java#L102)
> sets insertId from the input element, which is tagged with a unique id by
> https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TagWithUniqueIds.java#L53.
> Users report that leaving insertId empty speeds up writing dramatically.
> Can we add a bqOption, such as nonInsertId, and emit an empty id based
> on this option?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (BEAM-8222) Consider making insertId optional in BigQuery.insertAll
[ https://issues.apache.org/jira/browse/BEAM-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930918#comment-16930918 ]

Nahuel Lofeudo commented on BEAM-8222:
--------------------------------------

What would someone need to do in order to use BigQuery's Streaming API V2 through Dataflow?
[jira] [Commented] (BEAM-8222) Consider making insertId optional in BigQuery.insertAll
[ https://issues.apache.org/jira/browse/BEAM-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930697#comment-16930697 ]

Chamikara Jayalath commented on BEAM-8222:
------------------------------------------

Based on some offline comments from [~reuvenlax], this might be undesirable and may cause user confusion. AFAIK, Dataflow and other Beam runners that support BigQueryIO.Sink are tolerant of failures and may retry work items, so handling duplicates is required for the safety of the inserted data. Without insertId, things might speed up in the short term for runs without failures, but this mode of execution is not safe in the long run.
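To make the retry concern above concrete, here is a minimal sketch in plain Java. It assumes a deliberately simplified model: `FakeTable` is a hypothetical in-memory stand-in that drops a row when it has already seen the same non-null insertId. It is not a BigQuery or Beam API, and the real service's deduplication has its own limits. The names `RetryDedupSketch`, `FakeTable`, and `writeWithOneRetry` are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetryDedupSketch {

    /** Hypothetical in-memory stand-in for a table; not a BigQuery API. */
    static final class FakeTable {
        private final Set<String> seenInsertIds = new HashSet<>();
        private final List<String> rows = new ArrayList<>();

        /** Appends the payload unless a non-null insertId was already seen. */
        void insert(String insertId, String payload) {
            if (insertId != null && !seenInsertIds.add(insertId)) {
                return; // duplicate of an earlier attempt, dropped
            }
            rows.add(payload);
        }

        int rowCount() {
            return rows.size();
        }
    }

    /** Writes the same bundle twice, as a runner might after retrying a work item. */
    public static int writeWithOneRetry(List<String> bundle, boolean useInsertIds) {
        FakeTable table = new FakeTable();
        for (int attempt = 0; attempt < 2; attempt++) {
            for (int i = 0; i < bundle.size(); i++) {
                // The id must be identical across attempts for dedup to apply.
                String insertId = useInsertIds ? "bundle-0-row-" + i : null;
                table.insert(insertId, bundle.get(i));
            }
        }
        return table.rowCount();
    }

    public static void main(String[] args) {
        List<String> bundle = Arrays.asList("a", "b", "c");
        System.out.println("with insertId:    " + writeWithOneRetry(bundle, true));  // 3
        System.out.println("without insertId: " + writeWithOneRetry(bundle, false)); // 6
    }
}
```

In this model the retried bundle lands once when ids are attached and twice when they are omitted, which is the trade-off between speed and safety described in the comment.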
[jira] [Commented] (BEAM-8222) Consider making insertId optional in BigQuery.insertAll
[ https://issues.apache.org/jira/browse/BEAM-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930341#comment-16930341 ]

Ismaël Mejía commented on BEAM-8222:
------------------------------------

Any comments on this, [~chamikara]?
[jira] [Commented] (BEAM-8222) Consider making insertId optional in BigQuery.insertAll
[ https://issues.apache.org/jira/browse/BEAM-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928946#comment-16928946 ]

Nahuel Lofeudo commented on BEAM-8222:
--------------------------------------

The request is to not populate the insertId field when calling insertAll(), in order to use BigQuery's Streaming API V2 as described here:

[https://cloud.google.com/bigquery/quotas#streaming_inserts]
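As a rough illustration of the request, the sketch below builds rows for a streaming insert and leaves the insertId unset when a flag is on. Everything here is an assumption made for illustration: `Row` is a hypothetical holder rather than the BigQuery client's row type, `skipInsertId` is an invented option modeled on the nonInsertId idea in the issue (not an existing Beam flag), and the random UUID merely stands in for whatever unique id the pipeline would otherwise attach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class OptionalInsertIdSketch {

    /** Hypothetical row holder; stands in for a streaming-insert request row. */
    public static final class Row {
        public final String insertId; // null means "send no insertId"
        public final String json;

        Row(String insertId, String json) {
            this.insertId = insertId;
            this.json = json;
        }
    }

    /**
     * Builds request rows. skipInsertId is a hypothetical option; when it is
     * true, every row is emitted without an insertId.
     */
    public static List<Row> buildRows(List<String> payloads, boolean skipInsertId) {
        List<Row> rows = new ArrayList<>();
        for (String payload : payloads) {
            // Assumption: a random UUID stands in for the pipeline's unique id.
            String insertId = skipInsertId ? null : UUID.randomUUID().toString();
            rows.add(new Row(insertId, payload));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> payloads = Arrays.asList("{\"n\":1}", "{\"n\":2}");
        for (Row row : buildRows(payloads, true)) {
            System.out.println("insertId=" + row.insertId + " json=" + row.json);
        }
    }
}
```

Keeping the decision in one place (the row builder) means the rest of the write path would not need to know whether ids are attached.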