[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394296#comment-16394296 ]

Justin Tumale commented on BEAM-2817:
-------------------------------------

This has been completed and merged. I think it will be a part of the 2.4.0 release.

> Bigquery queries should allow options to run in batch mode or not
> -----------------------------------------------------------------
>
>                 Key: BEAM-2817
>                 URL: https://issues.apache.org/jira/browse/BEAM-2817
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.0.0
>            Reporter: Lara Schmidt
>            Assignee: Justin Tumale
>            Priority: Major
>              Labels: newbie, starter
>             Fix For: 2.4.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When bigquery read does a query it sets the mode to batch. A batch query can
> be very slow to schedule as it batches it with other queries. However it
> doesn't use batch quota which is better for some cases. However, in some
> cases a fast query is better (especially in timed tests). It would be a good
> idea to have a configuration to the bigquery source to set this per-read.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328319#comment-16328319 ]

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

justintumale closed pull request #4226: [BEAM-2817] BigQuery queries are allowed to run in either BATCH or INTERACTIVE mode
URL: https://github.com/apache/beam/pull/4226

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Bigquery queries should allow options to run in batch mode or not
> -----------------------------------------------------------------
>
>                 Key: BEAM-2817
>                 URL: https://issues.apache.org/jira/browse/BEAM-2817
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.0.0
>            Reporter: Lara Schmidt
>            Assignee: Justin Tumale
>            Priority: Major
>              Labels: newbie, starter
>
> When bigquery read does a query it sets the mode to batch. A batch query can
> be very slow to schedule as it batches it with other queries. However it
> doesn't use batch quota which is better for some cases. However, in some
> cases a fast query is better (especially in timed tests). It would be a good
> idea to have a configuration to the bigquery source to set this per-read.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281482#comment-16281482 ]

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

justintumale closed pull request #4118: [BEAM-2817] BigQuery queries are allowed to run in either BATCH or INTERACTIVE mode
URL: https://github.com/apache/beam/pull/4118

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Bigquery queries should allow options to run in batch mode or not
> -----------------------------------------------------------------
>
>                 Key: BEAM-2817
>                 URL: https://issues.apache.org/jira/browse/BEAM-2817
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.0.0
>            Reporter: Lara Schmidt
>            Assignee: Justin Tumale
>              Labels: newbie, starter
>
> When bigquery read does a query it sets the mode to batch. A batch query can
> be very slow to schedule as it batches it with other queries. However it
> doesn't use batch quota which is better for some cases. However, in some
> cases a fast query is better (especially in timed tests). It would be a good
> idea to have a configuration to the bigquery source to set this per-read.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281468#comment-16281468 ]

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

justintumale opened a new pull request #4226: [BEAM-2817] BigQuery queries are allowed to run in either BATCH or INTERACTIVE mode
URL: https://github.com/apache/beam/pull/4226

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
 - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249081#comment-16249081 ]

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

GitHub user justintumale reopened a pull request:

    https://github.com/apache/beam/pull/4118

    [BEAM-2817] BigQuery queries are allowed to run in either BATCH or INTERACTIVE mode

    The Read.Builder for BigQueryIO takes in the priority as one of its
    parameters and passes it through from the createSource method. This
    will allow for queries to be run in either batch mode or interactive
    mode.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justintumale/beam master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/4118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4118

commit 64f1db399c1cf424b5bd385bc5543908c095619a
Author: Justin Tumale
Date:   2017-09-08T22:14:06Z

    BEAM-407: Removed OffsetRangeTracker bugs

commit 464ea9626c1b97a29d6f0432c06ed4b6dc41e94f
Author: Justin Tumale
Date:   2017-09-08T22:20:36Z

    BEAM-407: fixes inconsistent synchronization in OffsetRangeTracker.copy

commit 11ddb768131ac3091bdf1fd7827e465229e313af
Author: justintumale
Date:   2017-09-11T14:35:30Z

    Merge remote-tracking branch 'apache/master'

commit d9b467410dc19e1844a78074d81c6e7c0745483e
Author: justintumale
Date:   2017-09-12T02:41:03Z

    fixes inconsistent synchronization issue

commit 4a061070c271367d8072cb44a49b7953e9e1cdb2
Author: justintumale
Date:   2017-09-19T05:14:46Z

    Merge branch 'master' of git://github.com/apache/beam

    updating forked repo

commit d9bbb1cc6039422f22e8347ca177cdb74ff0bba9
Author: justintumale
Date:   2017-09-19T08:09:13Z

    [BEAM-407] implemented changes based on code review

commit a0f173b6fc875dc5730be03c5275aa1a4ec4e825
Author: justintumale
Date:   2017-10-13T07:38:36Z

    fixing merged conflict

commit 9bcbd2315b2bf68fefe6197c44b72825bcd87429
Author: justintumale
Date:   2017-10-17T04:39:10Z

    Merge remote-tracking branch 'upstream/master'

commit 11ababd190b5b1b462389e03e1eaf19d66e2ff92
Author: justintumale
Date:   2017-10-23T00:56:53Z

    Merge branch 'master' of git://github.com/apache/beam

commit aff4bd0d042d60a308d51aee90da854abed9a6bd
Author: justintumale
Date:   2017-11-02T07:16:52Z

    Merge branch 'master' of git://github.com/apache/beam

commit 10d3d598b956997d3cc10ea0c7ab5b31fc62348b
Author: justintumale
Date:   2017-11-11T18:08:44Z

    update

commit 3e294044f4d7df626fcc6954d1dceaea76580156
Author: Arnaud Fournier
Date:   2017-07-20T14:57:38Z

    [BEAM-2728] Extension for sketch-based statistics : HyperLogLog

commit 042508976e59c28950c9c9a11a00443623411719
Author: James Xu
Date:   2017-09-13T12:36:37Z

    [BEAM-2528] create table

commit fd9e5c3558481ce1bd953440c1f249ad5892f166
Author: Mairbek Khadikov
Date:   2017-09-29T23:43:05Z

    Added a preprocessing step to the Cloud Spanner sink.

    The general intuition we follow here: if mutations are presorted by the
    primary key before batching, it is more likely that mutations in the
    batch will end up in the same partition. It minimizes the number of
    participants in the distributed transaction on the Cloud Spanner side
    and leads to a better throughput.

    Mutations are encoded before running other steps to avoid paying the
    serialization price. Primary keys are encoded using OrderedCode library,
    and ApproximateQuantiles transform is used to sample keys. Once primary
    keys are sampled, for each mutation we assign the index of the closest
    primary key as a key and group by that key. Range deletes are submitted
    separat
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249077#comment-16249077 ]

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

Github user justintumale closed the pull request at:

    https://github.com/apache/beam/pull/4118
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249075#comment-16249075 ]

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

GitHub user justintumale opened a pull request:

    https://github.com/apache/beam/pull/4118

    [BEAM-2817] BigQuery queries are allowed to run in either BATCH or INTERACTIVE mode

    The Read.Builder for BigQueryIO takes in the priority as one of its
    parameters and passes it through from the createSource method. This
    will allow for queries to be run in either batch mode or interactive
    mode.
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227292#comment-16227292 ]

Chamikara Jayalath commented on BEAM-2817:
------------------------------------------

I don't think there's a need to use BigQueryOptions for this. It should be fine to take this parameter as one of the options of the Read.Builder and pass it through from createSource as you mentioned.
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226259#comment-16226259 ]

Justin Tumale commented on BEAM-2817:
-------------------------------------

Hi [~chamikara],

I was wondering about what you think would be the best way to pass in this configuration.

In the BigQueryIO class, there is a createSource method (line 573) which calls BigQueryQuerySource.create(...) which I can pass in the priority to. When executeQuery (line 133) is called from this BigQueryQuerySource object, the configuration will be based on the priority passed upon create.

Another way I was thinking was to have the configuration set from the BigQueryOptions.

Please advise when you get a chance.

Thanks,
-Justin

cc [~laraschmidt]
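The pass-through discussed in this comment (a per-read priority handed from the read builder through createSource to the query source, and consulted when the query executes) can be sketched roughly as follows. This is only an illustration under stated assumptions: every class and method name here (`QueryPrioritySketch`, `Read`, `QuerySource`, `withPriority`, `fromQuery`) is a hypothetical stand-in and not Beam's actual code, and `executeQuery` returns a description string instead of contacting BigQuery. The BATCH default is assumed from the issue text, which says reads currently set the mode to batch.

```java
import java.util.Objects;

/** Hypothetical stand-ins for illustration only; these are NOT Beam's classes. */
final class QueryPrioritySketch {

  /** The two modes named in the pull request title. */
  enum Priority { BATCH, INTERACTIVE }

  /** Stand-in for the query source: it remembers the priority given at creation. */
  static final class QuerySource {
    private final String query;
    private final Priority priority;

    private QuerySource(String query, Priority priority) {
      this.query = Objects.requireNonNull(query, "query");
      this.priority = Objects.requireNonNull(priority, "priority");
    }

    static QuerySource create(String query, Priority priority) {
      return new QuerySource(query, priority);
    }

    /** Stand-in for executeQuery: reports the settings rather than running a job. */
    String executeQuery() {
      return "priority=" + priority + " query=" + query;
    }
  }

  /** Stand-in for the read builder: the priority is a per-read option. */
  static final class Read {
    private final String query;
    private final Priority priority;

    private Read(String query, Priority priority) {
      this.query = query;
      this.priority = priority;
    }

    static Read fromQuery(String query) {
      // Assumed default: BATCH, the behaviour the issue describes.
      return new Read(query, Priority.BATCH);
    }

    /** Returns a copy of this read with the given priority. */
    Read withPriority(Priority newPriority) {
      return new Read(query, Objects.requireNonNull(newPriority, "newPriority"));
    }

    /** Passes the per-read priority through to the source. */
    QuerySource createSource() {
      return QuerySource.create(query, priority);
    }
  }

  public static void main(String[] args) {
    System.out.println(Read.fromQuery("SELECT 1").createSource().executeQuery());
    System.out.println(
        Read.fromQuery("SELECT 1")
            .withPriority(Priority.INTERACTIVE)
            .createSource()
            .executeQuery());
  }
}
```

Keeping the option on the read rather than in pipeline-wide options lets two reads in the same pipeline choose different modes, which matches the per-read configuration the issue asks for.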
[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not
[ https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144591#comment-16144591 ]

Chamikara Jayalath commented on BEAM-2817:
------------------------------------------

Looks like this will be a good starter/newbie issue.

Relevant links:
https://cloud.google.com/bigquery/docs/running-queries#bigquery-query-batch-api
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L167

> Bigquery queries should allow options to run in batch mode or not
> -----------------------------------------------------------------
>
>                 Key: BEAM-2817
>                 URL: https://issues.apache.org/jira/browse/BEAM-2817
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.0.0
>            Reporter: Lara Schmidt
>            Assignee: Chamikara Jayalath
>
> When bigquery read does a query it sets the mode to batch. A batch query can
> be very slow to schedule as it batches it with other queries. However it
> doesn't use batch quota which is better for some cases. However, in some
> cases a fast query is better (especially in timed tests). It would be a good
> idea to have a configuration to the bigquery source to set this per-read.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
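For context on the first link: in the BigQuery jobs API, the mode is chosen through the `priority` field of the query job configuration, whose values are `BATCH` and `INTERACTIVE`. The sketch below builds that part of a request body as plain nested maps so the shape is visible without any client library. The class and method names (`QueryJobConfigSketch`, `queryJobConfiguration`) are hypothetical, and nothing here is taken from the Beam source linked above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; it only assembles maps and sends nothing. */
final class QueryJobConfigSketch {

  /** Builds the "configuration" object of a query job with the chosen priority. */
  static Map<String, Object> queryJobConfiguration(String sql, boolean batch) {
    Map<String, Object> query = new LinkedHashMap<>();
    query.put("query", sql);
    // The jobs API field that selects batch or interactive scheduling.
    query.put("priority", batch ? "BATCH" : "INTERACTIVE");

    Map<String, Object> configuration = new LinkedHashMap<>();
    configuration.put("query", query);
    return configuration;
  }

  public static void main(String[] args) {
    System.out.println(queryJobConfiguration("SELECT 1", true));
    System.out.println(queryJobConfiguration("SELECT 1", false));
  }
}
```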