[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2018-03-10 Thread Justin Tumale (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394296#comment-16394296
 ] 

Justin Tumale commented on BEAM-2817:
-------------------------------------

This has been completed and merged. I think it will be a part of the 2.4.0 
release.

> Bigquery queries should allow options to run in batch mode or not
> -----------------------------------------------------------------
>
> Key: BEAM-2817
> URL: https://issues.apache.org/jira/browse/BEAM-2817
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.0.0
> Reporter: Lara Schmidt
> Assignee: Justin Tumale
> Priority: Major
> Labels: newbie, starter
> Fix For: 2.4.0
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When a BigQuery read runs a query, it sets the query mode to batch. A batch 
> query can be very slow to schedule because it is batched with other queries. 
> However, it doesn't use batch quota, which is better for some cases. In other 
> cases a fast query is better (especially in timed tests). It would be a good 
> idea to add a configuration option to the BigQuery source to set this per read.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2018-01-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328319#comment-16328319
 ] 

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

justintumale closed pull request #4226: [BEAM-2817] BigQuery queries are 
allowed to run in either BATCH or INTERACTIVE mode
URL: https://github.com/apache/beam/pull/4226
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:




 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281482#comment-16281482
 ] 

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

justintumale closed pull request #4118: [BEAM-2817] BigQuery queries are 
allowed to run in either BATCH or INTERACTIVE mode
URL: https://github.com/apache/beam/pull/4118
 
 
   




[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281468#comment-16281468
 ] 

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

justintumale opened a new pull request #4226: [BEAM-2817] BigQuery queries are 
allowed to run in either BATCH or INTERACTIVE mode
URL: https://github.com/apache/beam/pull/4226
 
 


[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249081#comment-16249081
 ] 

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

GitHub user justintumale reopened a pull request:

https://github.com/apache/beam/pull/4118

[BEAM-2817] BigQuery queries are allowed to run in either BATCH or 
INTERACTIVE mode

The Read.Builder for BigQueryIO takes the priority as one of its 
parameters and passes it through from the createSource method. This allows 
queries to be run in either batch mode or interactive mode.
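
As a rough illustration of the pass-through described above, here is a minimal, 
self-contained Java sketch. It does not use the Beam classes: QueryPriority, 
QuerySource, QueryRead, fromQuery and withQueryPriority are hypothetical names 
chosen for this example, and the BATCH default is an assumption taken from the 
issue description, which says reads currently run in batch mode.

```java
import java.util.Objects;

/** Hypothetical stand-in for the two modes named in the pull request title. */
enum QueryPriority { BATCH, INTERACTIVE }

/** Hypothetical stand-in for the source object that createSource returns. */
final class QuerySource {
  final String query;
  final QueryPriority priority;

  QuerySource(String query, QueryPriority priority) {
    this.query = Objects.requireNonNull(query, "query");
    this.priority = Objects.requireNonNull(priority, "priority");
  }
}

/** Hypothetical immutable read configuration; each with* call returns a copy. */
final class QueryRead {
  private final String query;
  private final QueryPriority priority;

  private QueryRead(String query, QueryPriority priority) {
    this.query = query;
    this.priority = priority;
  }

  /** Assumed default: BATCH, the behavior the issue reports for existing reads. */
  static QueryRead fromQuery(String query) {
    return new QueryRead(query, QueryPriority.BATCH);
  }

  /** Overrides the priority for this read only. */
  QueryRead withQueryPriority(QueryPriority priority) {
    return new QueryRead(query, priority);
  }

  /** Passes the configured priority through to the source. */
  QuerySource createSource() {
    return new QuerySource(query, priority);
  }
}

final class QueryReadDemo {
  public static void main(String[] args) {
    QueryRead slow = QueryRead.fromQuery("SELECT 1");
    QueryRead fast = slow.withQueryPriority(QueryPriority.INTERACTIVE);
    System.out.println(slow.createSource().priority); // prints BATCH
    System.out.println(fast.createSource().priority); // prints INTERACTIVE
  }
}
```

Because the setting lives on the read itself, two reads in the same pipeline 
can use different priorities, which is what the issue asks for ("set this 
per-read").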


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/justintumale/beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4118


commit 64f1db399c1cf424b5bd385bc5543908c095619a
Author: Justin Tumale 
Date:   2017-09-08T22:14:06Z

BEAM-407: Removed OffsetRangeTracker bugs

commit 464ea9626c1b97a29d6f0432c06ed4b6dc41e94f
Author: Justin Tumale 
Date:   2017-09-08T22:20:36Z

BEAM-407: fixes inconsistent synchronization in OffsetRangeTracker.copy

commit 11ddb768131ac3091bdf1fd7827e465229e313af
Author: justintumale 
Date:   2017-09-11T14:35:30Z

Merge remote-tracking branch 'apache/master'

commit d9b467410dc19e1844a78074d81c6e7c0745483e
Author: justintumale 
Date:   2017-09-12T02:41:03Z

fixes inconsistent synchronization issue

commit 4a061070c271367d8072cb44a49b7953e9e1cdb2
Author: justintumale 
Date:   2017-09-19T05:14:46Z

Merge branch 'master' of git://github.com/apache/beam

updating forked repo

commit d9bbb1cc6039422f22e8347ca177cdb74ff0bba9
Author: justintumale 
Date:   2017-09-19T08:09:13Z

[BEAM-407] implemented changes based on code review

commit a0f173b6fc875dc5730be03c5275aa1a4ec4e825
Author: justintumale 
Date:   2017-10-13T07:38:36Z

fixing merged conflict

commit 9bcbd2315b2bf68fefe6197c44b72825bcd87429
Author: justintumale 
Date:   2017-10-17T04:39:10Z

Merge remote-tracking branch 'upstream/master'

commit 11ababd190b5b1b462389e03e1eaf19d66e2ff92
Author: justintumale 
Date:   2017-10-23T00:56:53Z

Merge branch 'master' of git://github.com/apache/beam

commit aff4bd0d042d60a308d51aee90da854abed9a6bd
Author: justintumale 
Date:   2017-11-02T07:16:52Z

Merge branch 'master' of git://github.com/apache/beam

commit 10d3d598b956997d3cc10ea0c7ab5b31fc62348b
Author: justintumale 
Date:   2017-11-11T18:08:44Z

update

commit 3e294044f4d7df626fcc6954d1dceaea76580156
Author: Arnaud Fournier 
Date:   2017-07-20T14:57:38Z

[BEAM-2728] Extension for sketch-based statistics : HyperLogLog

commit 042508976e59c28950c9c9a11a00443623411719
Author: James Xu 
Date:   2017-09-13T12:36:37Z

[BEAM-2528] create table

commit fd9e5c3558481ce1bd953440c1f249ad5892f166
Author: Mairbek Khadikov 
Date:   2017-09-29T23:43:05Z

Added a preprocessing step to the Cloud Spanner sink.

The general intuition we follow here: if mutations are presorted by the 
primary key before batching, it is more likely that mutations in the batch will 
end up in the same partition. It minimizes the number of participants in the 
distributed transaction on the Cloud Spanner side and leads to a better 
throughput.

Mutations are encoded before running other steps to avoid paying the 
serialization price. Primary keys are encoded using OrderedCode library, and 
ApproximateQuantiles transform is used to sample keys.

Once primary keys are sampled, for each mutation we assign the index of the 
closest primary key as a key and group by that key. Range deletes are submitted 
separat

[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249077#comment-16249077
 ] 

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

Github user justintumale closed the pull request at:

https://github.com/apache/beam/pull/4118




[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-11-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249075#comment-16249075
 ] 

ASF GitHub Bot commented on BEAM-2817:
--------------------------------------

GitHub user justintumale opened a pull request:

https://github.com/apache/beam/pull/4118

[BEAM-2817] BigQuery queries are allowed to run in either BATCH or 
INTERACTIVE mode

The Read.Builder for BigQueryIO takes the priority as one of its 
parameters and passes it through from the createSource method. This allows 
queries to be run in either batch mode or interactive mode.



[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-10-31 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227292#comment-16227292
 ] 

Chamikara Jayalath commented on BEAM-2817:
------------------------------------------

I don't think there's a need to use BigQueryOptions for this. It should be fine 
to take this parameter as one of the options of the Read.Builder and pass it 
through from createSource as you mentioned.



[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-10-30 Thread Justin Tumale (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226259#comment-16226259
 ] 

Justin Tumale commented on BEAM-2817:
-------------------------------------

Hi [~chamikara], what do you think is the best way to pass in this 
configuration? In the BigQueryIO class, the createSource method (line 573) 
calls BigQueryQuerySource.create(...), to which I can pass the priority. When 
executeQuery (line 133) is then called on that BigQueryQuerySource object, the 
query configuration would be based on the priority passed to create. Another 
option I considered is to set the configuration from BigQueryOptions. Please 
advise when you get a chance.
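
For illustration only, the first option could look roughly like the Java 
sketch below. It is self-contained and does not use the Beam or BigQuery client 
classes: JobPriority, SketchQuerySource and buildQueryConfig are hypothetical 
names, and the Map stands in for the real job configuration object. The keys 
"query" and "priority" mirror the fields of a BigQuery query job configuration, 
whose priority takes the values BATCH or INTERACTIVE.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical enum for the two query priorities. */
enum JobPriority { BATCH, INTERACTIVE }

/** Hypothetical stand-in for the query source discussed in the comment. */
final class SketchQuerySource {
  private final String query;
  private final JobPriority priority;

  private SketchQuerySource(String query, JobPriority priority) {
    this.query = Objects.requireNonNull(query, "query");
    this.priority = Objects.requireNonNull(priority, "priority");
  }

  /** The priority is fixed when the source is created. */
  static SketchQuerySource create(String query, JobPriority priority) {
    return new SketchQuerySource(query, priority);
  }

  /** Builds the configuration an executeQuery-style method would submit. */
  Map<String, String> buildQueryConfig() {
    Map<String, String> config = new LinkedHashMap<>();
    config.put("query", query);
    config.put("priority", priority.name());
    return config;
  }
}

final class QuerySourceDemo {
  public static void main(String[] args) {
    SketchQuerySource source =
        SketchQuerySource.create("SELECT 1", JobPriority.INTERACTIVE);
    // prints {query=SELECT 1, priority=INTERACTIVE}
    System.out.println(source.buildQueryConfig());
  }
}
```

Storing the priority on the source keeps the choice local to one read, whereas 
a value read from pipeline-wide options would apply to every query in the 
pipeline.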

Thanks,
-Justin

cc [~laraschmidt]



[jira] [Commented] (BEAM-2817) Bigquery queries should allow options to run in batch mode or not

2017-08-28 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144591#comment-16144591
 ] 

Chamikara Jayalath commented on BEAM-2817:
------------------------------------------

Looks like this will be a good starter/newbie issue.

Relevant links:

https://cloud.google.com/bigquery/docs/running-queries#bigquery-query-batch-api
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L167

> Bigquery queries should allow options to run in batch mode or not
> -----------------------------------------------------------------
>
> Key: BEAM-2817
> URL: https://issues.apache.org/jira/browse/BEAM-2817
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-gcp
> Affects Versions: 2.0.0
> Reporter: Lara Schmidt
> Assignee: Chamikara Jayalath