[GitHub] beam pull request #4147: [BEAM-3209] Clarify documentation on support for re...

2017-11-17 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/4147

[BEAM-3209] Clarify documentation on support for reading from/writing to 
time partitioned BQ tables.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam clarify_time_partitioned_documentation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4147


commit 39dbcda6fd45240aa4d7c1c04438896a9a114b2c
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-11-17T23:29:57Z

Clarify documentation on support for reading from/writing to time 
partitioned BQ tables.




---


[GitHub] beam pull request #4067: Updates Python datastore wordcount example to take ...

2017-10-31 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/4067

Updates Python datastore wordcount example to take a dataset parameter.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_datastore_example

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4067.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4067


commit 0d565d6d2a8e8c85089b2e8ea75eb768fa07d2df
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-11-01T01:37:29Z

Updates Python datastore wordcount example to take a dataset parameter.




---


[GitHub] beam pull request #4064: [BEAM-1630] Adds support for processing Splittable ...

2017-10-31 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/4064

[BEAM-1630] Adds support for processing Splittable DoFns using DirectRunner.

Updates DoFn invocation logic to allow invoking SDF methods.
Adds SDF machinery that will be common to DirectRunner and other runners.
Adds DirectRunner specific transform overrides, evaluators, and other logic 
for processing Splittable DoFns.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam sdf_direct_runner_3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4064


commit 7549b5c2ebe2ae47af9066eaf97364a27e828ab5
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-10-31T08:16:43Z

Adds support for processing Splittable DoFns using DirectRunner.

Updates DoFnInvocation logic to allow invoking SDF methods.
Adds SDF machinery that will be common to DirectRunner and other runners.
Adds DirectRunner specific transform overrides, evaluators, and other logic 
for processing Splittable DoFns.




---


[GitHub] beam pull request #4025: [BEAM-3088] Improves size estimation of BigQueryTab...

2017-10-21 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/4025

[BEAM-3088] Improves size estimation of BigQueryTableSource.

Updates BigQueryTableSource to consider data in streaming buffer when 
determining estimated size.
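
As a rough illustration of the idea (not the code in this PR), the estimate can be pictured as the stored size plus whatever is still sitting in the streaming buffer. The sketch below assumes the table metadata is available as a plain dict shaped like the BigQuery REST table resource, with `numBytes` and `streamingBuffer.estimatedBytes`; the helper name `estimated_size_bytes` is hypothetical.

```python
def estimated_size_bytes(table: dict) -> int:
    """Estimate table size as stored bytes plus streaming-buffer bytes.

    `table` is assumed to look like a BigQuery table resource; both fields
    are treated as optional and default to zero when absent.
    """
    stored = int(table.get("numBytes", 0))
    # The streaming buffer entry is only present while rows are buffered.
    buffer_info = table.get("streamingBuffer") or {}
    buffered = int(buffer_info.get("estimatedBytes", 0))
    return stored + buffered


# A table with data only in the streaming buffer no longer looks empty.
only_buffered = {"numBytes": "0", "streamingBuffer": {"estimatedBytes": "2048"}}
print(estimated_size_bytes(only_buffered))
```

Without the buffer term, a table that has only received streaming inserts would be estimated at zero bytes.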



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam bq_size_estimation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4025


commit 501b43800e95a8722315c43c7379725407d04f7c
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-10-22T02:20:07Z

Updates BigQueryTableSource to consider data in streaming buffer when 
determining estimated size.




---


[GitHub] beam pull request #3998: [BEAM-3029] Sets user agent in BigTableIO.Read.getB...

2017-10-18 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/3998


---


[GitHub] beam pull request #4007: [BEAM-3065] Avoids generating proto files for Windo...

2017-10-17 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/4007

[BEAM-3065] Avoids generating proto files for Windows if grpcio-tools is 
not installed.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam avoid_proto_generation_windows

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4007.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4007


commit 0ca6d3025c0479d7e6bd3a70bca84f651e717167
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-10-18T01:46:40Z

Avoids generating proto files for Windows if grpcio-tools is not installed.




---


[GitHub] beam pull request #3998: [BEAM-3029] Sets user agent in BigTableIO.Read.getB...

2017-10-16 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3998

[BEAM-3029] Sets user agent in BigTableIO.Read.getBigTableService().

Cherry-picking this commit to 2.2.0 release branch.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam bigtable_read_it_fix_cerrypick

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3998


commit 25cab6be8d763a03d2a37f0647698cba79df6ac5
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-10-16T07:50:03Z

Sets user agent in BigTableIO.Read.getBigTableService().




---


[GitHub] beam pull request #3996: [BEAM-3029] Sets userAgent option in BigTableReadIT

2017-10-16 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3996

[BEAM-3029] Sets userAgent option in BigTableReadIT



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam bigtable-it

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3996


commit 4457928a3d7e82426ff6019642d4e846131201b4
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-10-16T07:50:03Z

Sets userAgent option in BigTableReadIT




---


[GitHub] beam pull request #3962: [Beam-3028] Fixes a bug in DatastoreIO query splitt...

2017-10-08 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3962

[Beam-3028] Fixes a bug in DatastoreIO query splitting.

We were returning the original query instead of the sub-queries, resulting 
in data duplication when reading.
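
To make the failure mode concrete, here is a toy model of it, assuming a query can be represented as a half-open integer key range. The functions `split_query` and `read` are hypothetical stand-ins, not DatastoreIO code.

```python
def split_query(start: int, end: int, num_splits: int):
    """Split the half-open key range [start, end) into contiguous sub-ranges."""
    bounds = [start + (end - start) * i // num_splits for i in range(num_splits + 1)]
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def read(ranges):
    """Read every key covered by each range, in order."""
    return [key for lo, hi in ranges for key in range(lo, hi)]


sub_queries = split_query(0, 10, 3)

# Correct behaviour: each sub-query covers a disjoint slice of the keys.
correct = read(sub_queries)

# The kind of bug described above: the original query is returned once per split.
buggy = read([(0, 10)] * len(sub_queries))

print(len(correct), len(buggy))
```

Each key appears once in `correct`, but once per split in `buggy`, which is the duplication described above.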



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam query_splitting

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3962


commit 636b56964b750fba025c42e260219b60b085a868
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-10-09T00:02:43Z

Fixes a bug in query splitting.

We were returning the original query instead of the sub-queries, resulting 
in data duplication when reading.




---


[GitHub] beam pull request #3892: [BEAM-2985] Updates WriteToBigQuery PTransform to g...

2017-09-22 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3892

[BEAM-2985] Updates WriteToBigQuery PTransform to get project id from 
GoogleCloudOptions when using DirectRunner.

WriteToBigQuery PTransform behaves differently for DirectRunner and 
DataflowRunner when it comes to determining the project that the output table 
belongs to. If a project is not specified, DataflowRunner defaults to 
GoogleCloudOptions.project while DirectRunner does not. This PR fixes this 
inconsistency by defaulting to GoogleCloudOptions.project for DirectRunner as 
well.
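
The intended fallback can be sketched as follows. This is only an illustration of the rule described above, assuming the table's project and the pipeline-level project are both plain optional strings; `resolve_table_project` is a hypothetical helper, not a function in the SDK.

```python
from typing import Optional


def resolve_table_project(table_project: Optional[str],
                          pipeline_project: Optional[str]) -> str:
    """Pick the project for an output table.

    An explicit project on the table wins; otherwise fall back to the
    pipeline-level project (standing in for GoogleCloudOptions.project).
    """
    if table_project:
        return table_project
    if pipeline_project:
        return pipeline_project
    raise ValueError("No project given for the table or the pipeline options.")


# With the same rule on every runner, this resolves identically everywhere.
print(resolve_table_project(None, "my-pipeline-project"))
```

Applying one resolution rule regardless of runner is what removes the inconsistency.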



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam bq_direct_runner_write

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3892.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3892


commit f99db7932cab90dda2741d22b291e7f1eaad7336
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-09-23T00:59:50Z

Updates WriteToBigQuery PTransform to get project id from 
GoogleCloudOptions when using DirectRunner.

WriteToBigQuery PTransform behaves differently for DirectRunner and 
DataflowRunner when it comes to determining the project that the output table 
belongs to. If a project is not specified, DataflowRunner defaults to 
GoogleCloudOptions.project while DirectRunner does not. This PR fixes this 
inconsistency by defaulting to GoogleCloudOptions.project for DirectRunner as 
well.




---


[GitHub] beam pull request #3882: [BEAM-1630] Adds API for defining Splittable DoFns ...

2017-09-21 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3882

[BEAM-1630] Adds API for defining Splittable DoFns using Python SDK.

See https://s.apache.org/splittable-do-fn-python-sdk for the design.

This PR and the above doc were updated to reflect the following recent 
updates to Splittable DoFn.
* Support for ProcessContinuations
* Support for dynamically updating output watermark irrespective of the 
output element production.

This will be followed by a PR that adds support for reading Splittable 
DoFns using DirectRunner.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam sdf_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3882.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3882


commit 2fd11b1c0e212a1b267dbafd69c96e26fef4d319
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-09-22T00:43:11Z

Adds API for defining Splittable DoFns.

See https://s.apache.org/splittable-do-fn-python-sdk for the design.

This PR and the above doc were updated to reflect the following recent 
updates to Splittable DoFn.
* Support for ProcessContinuations
* Support for dynamically updating output watermark irrespective of the 
output element production.

This will be followed by a PR that adds support for reading Splittable 
DoFns using DirectRunner.




---


[GitHub] beam pull request #3820: [BEAM-2545] Updates bigtable.version to 1.0.0-pre3.

2017-09-08 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3820

[BEAM-2545] Updates bigtable.version to 1.0.0-pre3.

Performs a slight update to BigtableServiceImpl to comply with the new 
version.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_bigtable_dependency

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3820


commit 27d95db2a22738d16177157b69f87deff58477db
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-09-08T07:11:11Z

Updates bigtable.version to 1.0.0-pre3.

Performs a slight update to BigtableServiceImpl to comply with the new 
version.




---


[GitHub] beam pull request #3731: Fixes a pydocs validation failure due to a recent c...

2017-08-17 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3731

Fixes a pydocs validation failure due to a recent commit.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam datastore_docs_failure

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3731.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3731


commit 8dc6e1666f3f113fe5ee854f4c7060e0fbd614e1
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-08-18T01:21:44Z

Fixes a pydocs validation failure due to a recent commit.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] beam pull request #3715: [BEAM-2711] Updates ByteKeyRangeTracker so that get...

2017-08-10 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3715

[BEAM-2711] Updates ByteKeyRangeTracker so that getFractionConsumed() does 
not fail for completed trackers

After this update:
* getFractionConsumed() returns 1.0 after markDone() is invoked.
* getFractionConsumed() returns 1.0 after tryReturnRecordAt() is invoked 
for a position that is larger than or equal to the end key.

This is similar to how getFractionConsumed() method of OffsetRangeTracker 
is implemented.
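
The two rules above can be pictured with a small stand-alone tracker. This is a simplified sketch written in Python rather than the Java ByteKeyRangeTracker itself: the class `SketchKeyRangeTracker`, its method names, and the 4-byte prefix interpolation are all assumptions made for illustration, and it assumes a non-empty range whose start and end differ within that prefix.

```python
def _key_to_int(key: bytes, width: int = 4) -> int:
    # Pad or truncate to a fixed width so keys can be compared numerically.
    return int.from_bytes(key[:width].ljust(width, b"\x00"), "big")


class SketchKeyRangeTracker:
    """Toy tracker over the half-open byte-key range [start, end)."""

    def __init__(self, start: bytes, end: bytes):
        self._start = start
        self._end = end
        self._last = None
        self._done = False

    def try_return_record_at(self, key: bytes) -> bool:
        if key >= self._end:
            # Reaching or passing the end key completes the range.
            self._done = True
            return False
        self._last = key
        return True

    def mark_done(self) -> None:
        self._done = True

    def fraction_consumed(self) -> float:
        if self._done:
            # A completed tracker reports full progress instead of failing.
            return 1.0
        if self._last is None:
            return 0.0
        lo, hi = _key_to_int(self._start), _key_to_int(self._end)
        return (_key_to_int(self._last) - lo) / (hi - lo)


tracker = SketchKeyRangeTracker(b"a", b"e")
tracker.try_return_record_at(b"c")
print(tracker.fraction_consumed())
tracker.try_return_record_at(b"e")
print(tracker.fraction_consumed())
```

Checking the done flag before any interpolation is what keeps the progress query well defined once the end key has been reached.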


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam key_range_progress

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3715


commit ba08ec3bfb1eead06772945ab888d910ffe7d436
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-08-11T00:35:37Z

Updates ByteKeyRangeTracker so that getFractionConsumed() does not fail for 
completed trackers.

After this update:
* getFractionConsumed() returns 1.0 after markDone() is invoked.
* getFractionConsumed() returns 1.0 after tryReturnRecordAt() is invoked 
for a position that is larger than or equal to the end key.

This is similar to how getFractionConsumed() method of OffsetRangeTracker 
is implemented.




---


[GitHub] beam pull request #3701: Updates BEAM_CONTAINER_VERSION to 2.2.0.

2017-08-08 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3701

Updates BEAM_CONTAINER_VERSION to 2.2.0.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_container_version_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3701.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3701


commit cc7b4da2f88c0e5fdfc27c0588d0cc66a489a928
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-08-08T06:47:57Z

Updates BEAM_CONTAINER_VERSION to 2.2.0.




---


[GitHub] beam pull request #3681: [BEAM-2708] Adds support for reading concatenated b...

2017-08-03 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3681

[BEAM-2708] Adds support for reading concatenated bzip2 files

Cherry-picking into 2.1.0 release branch.

Corresponding fix for Java SDK was already cherry picked into 2.1.0 branch. 
I think it's good to get the Python SDK fix in as well so that SDKs are 
consistent.

Adds support for reading concatenated bzip2 files

Adds tests for concatenated gzip and bzip2 files.

Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' 
since it's actually hitting 'DummyReadTransform' and not testing this feature.
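
For context on what "concatenated" means here: a file produced by appending several bzip2 streams back to back cannot be read with a single decompressor object, because it stops at the end of the first stream. A minimal sketch using only the Python standard library (not the SDK's file-reading code; the helper name `decompress_concatenated_bz2` is hypothetical) looks like this:

```python
import bz2


def decompress_concatenated_bz2(data: bytes) -> bytes:
    """Decompress one or more bzip2 streams stored back to back."""
    chunks = []
    while data:
        # Each stream needs a fresh decompressor; one object handles one stream.
        decompressor = bz2.BZ2Decompressor()
        chunks.append(decompressor.decompress(data))
        if not decompressor.eof:
            raise ValueError("Truncated bzip2 stream.")
        # Bytes after the end of this stream belong to the next one.
        data = decompressor.unused_data
    return b"".join(chunks)


joined = bz2.compress(b"first\n") + bz2.compress(b"second\n")
print(decompress_concatenated_bz2(joined))
```

The loop restarts decompression on `unused_data`, which is the part a single-stream reader would silently drop.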


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam bzip2_python_cherrypick

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3681.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3681


commit d6516c69e61f2061005d01a9e36ee1e4137a1478
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-08-03T05:49:33Z

Adds support for reading concatenated bzip2 files.

Adds tests for concatenated gzip and bzip2 files.

Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' 
since it's actually hitting 'DummyReadTransform' and not testing this feature.




---


[GitHub] beam pull request #3678: [BEAM-2708] Adds support for reading concatenated b...

2017-08-03 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3678

[BEAM-2708] Adds support for reading concatenated bzip2 files

Adds tests for concatenated gzip and bzip2 files.

Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' 
since it's actually hitting 'DummyReadTransform' and not testing this feature.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam pbzip2_test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3678


commit 40e1fbf1856190418d0c6c25c746037d4c109083
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-08-03T05:49:33Z

Adds support for reading concatenated bzip2 files.

Adds tests for concatenated gzip and bzip2 files.

Removes test 'test_model_textio_gzip_concatenated' in 'snippets_test.py' 
since it's actually hitting 'DummyReadTransform' and not testing this feature.






[GitHub] beam pull request #3668: [BEAM-2141] Updates jenkins job for JDBCIOIT

2017-07-31 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3668

[BEAM-2141] Updates jenkins job for JDBCIOIT

This is a slightly updated version of Stephen Sisk's 
https://github.com/apache/beam/pull/3604.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam enable_jdbc_it

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3668


commit d61ca8f8845c4290e3e88dd9da2bd94605ab141b
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-07-31T18:50:46Z

Updates jenkins job for JDBCIOIT.

This is a slightly updated version of Stephen Sisk's 
https://github.com/apache/beam/pull/3604.






[GitHub] beam pull request #3661: [BEAM-2643] Adds two new Read PTransforms that can ...

2017-07-28 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3661

[BEAM-2643] Adds two new Read PTransforms that can be used to read a 
massive number of files

textio.ReadAllFromText is for reading a PCollection of text files/file 
patterns.
avroio.ReadAllFromAvro is for reading a PCollection of Avro files/file 
patterns.

Most of the logic was generalized to a new PTransform 
filebasedsource.ReadAllFiles so that other file-based sources can be easily 
adapted to follow the same pattern.
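
As a rough picture of the data flow only: the input is a collection of file patterns, each pattern is expanded to matching files, and each file is read into records. The plain-Python sketch below illustrates that shape with a generator; it is not Beam code, the function name `read_all_lines` is hypothetical, and a real pipeline would distribute this work rather than loop sequentially.

```python
import glob


def read_all_lines(file_patterns):
    """Yield every line of every file matched by any of the given patterns."""
    for pattern in file_patterns:
        # Expand each pattern independently, as one element of the input.
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    yield line.rstrip("\n")


# Usage (paths are placeholders):
#   for line in read_all_lines(["/tmp/logs/a-*.txt", "/tmp/logs/b-*.txt"]):
#       print(line)
```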



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam fileio_read_all

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3661


commit 5174e48a3e3ac495d452f438be916b3046ed1cf4
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-07-29T02:39:02Z

Adds two new Read PTransforms that can be used to read a massive number of 
files.

textio.ReadAllFromText is for reading a PCollection of text files/file 
patterns.
avroio.ReadAllFromAvro is for reading a PCollection of Avro files/file 
patterns.






[GitHub] beam pull request #3414: [BEAM-2494] Remove GroupedShuffleRangeTracker which...

2017-06-21 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3414

[BEAM-2494] Remove GroupedShuffleRangeTracker which is unused in the SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam 
remove_grouped_shuffle_range_tracker

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3414


commit fbe89781bbf32421cbafe19313e6fbe070115dc2
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-06-21T17:37:11Z

Remove GroupedShuffleRangeTracker which is unused in the SDK






[GitHub] beam pull request #3333: [BEAM-1630] Adds ability to dynamically replace PTr...

2017-06-08 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/

[BEAM-1630] Adds ability to dynamically replace PTransforms during runtime.

Adds two new interfaces, PTransformMatcher and PTransformOverride.

Currently only supports replacements where input and output types are an 
exact match (we have to address complexities due to type hints before 
supporting replacements with different types).

This can be used to dynamically update a populated pipeline at runtime. 
Each runner can configure its own overrides.

This will be used by SplittableDoFn where matching ParDo transforms will be 
dynamically replaced by SplittableParDo.
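
The matcher/override idea can be sketched in plain Python as below. All class and function names here (`ParDoSketch`, `SplittableMatcher`, `replace_all`, and so on) are stand-ins invented for illustration and are not the interfaces added by this PR; the sketch only shows a matcher selecting transforms and an override supplying a replacement with the same input and output types.

```python
class ParDoSketch:
    """Stand-in for a transform that applies a function to each value."""

    def __init__(self, fn, splittable=False):
        self.fn = fn
        self.splittable = splittable

    def expand(self, values):
        return [self.fn(v) for v in values]


class SplittableParDoSketch(ParDoSketch):
    """Stand-in replacement; same input and output types as ParDoSketch."""


class SplittableMatcher:
    """Decides whether a given transform should be replaced."""

    def match(self, transform):
        return type(transform) is ParDoSketch and transform.splittable


class SplittableOverride:
    """Pairs a matcher with the replacement it produces."""

    def get_matcher(self):
        return SplittableMatcher()

    def get_replacement(self, transform):
        return SplittableParDoSketch(transform.fn, splittable=True)


def replace_all(transforms, overrides):
    """Return a new list where the first matching override replaces each transform."""
    replaced = []
    for transform in transforms:
        for override in overrides:
            if override.get_matcher().match(transform):
                transform = override.get_replacement(transform)
                break
        replaced.append(transform)
    return replaced


pipeline = [ParDoSketch(str.upper), ParDoSketch(str.strip, splittable=True)]
updated = replace_all(pipeline, [SplittableOverride()])
print([type(t).__name__ for t in updated])
```

Because the replacement keeps the same input and output types, downstream steps do not need to change, which matches the restriction described above.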


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam 
sdf_direct_runner_ptransform_override

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #








[GitHub] beam-site pull request #253: [BEAM-3240] Improves development and testing in...

2017-05-25 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam-site/pull/253

[BEAM-3240] Improves development and testing instructions related to Python 
SDK

Updates contribution guide to include development and testing instructions 
for Python SDK.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam-site contrib_guide_python

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #253


commit b9dd624d27afdcf5ce48f5def52f094fcd797acd
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-05-26T00:12:50Z

Updates contribution guide to include development and testing instructions 
for Python SDK.






[GitHub] beam pull request #3089: [BEAM-1340] Adds __all__ tags to classes in package...

2017-05-11 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3089

[BEAM-1340] Adds __all__ tags to classes in package apache_beam/io.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_public_api_all

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3089


commit fe534068f58d2c96c3fbc2c94441b77c2e3e28a9
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-05-11T18:46:46Z

Adds __all__ tags to classes in package apache_beam/io.






[GitHub] beam pull request #3074: [BEAM-1340] Adds __all__ tags to modules in package...

2017-05-11 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/3074




[GitHub] beam pull request #3074: [BEAM-1340] Adds __all__ tags to classes in package...

2017-05-10 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3074

[BEAM-1340] Adds __all__ tags to classes in package apache_beam/io



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_public_api_all

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3074.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3074


commit b5ff3ba87869aab31eb502d039c853c46e7ff818
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-05-11T05:33:35Z

Adds __all__ tags to classes in package apache_beam/io.






[GitHub] beam pull request #3041: [BEAM-2241] Renames some python classes and functio...

2017-05-10 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/3041




[GitHub] beam pull request #3041: [BEAM-2241] Renames some python classes and functio...

2017-05-10 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3041

[BEAM-2241] Renames some python classes and functions that were 
unnecessarily public

Adds a note to the documentation of classes that are public but should only 
be used internally by the SDK (non-user-facing classes).

Marks some of the modules as experimental.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_public_api_branch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3041.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3041


commit f4b80ea47fc3a3d4c1ba901e646c47981483eabd
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-05-10T08:44:56Z

Renames some python classes and functions that were unnecessarily public.

Adds a note to documentation of classes that are public but should be only 
used internally by the SDK (non-user facing classes).

Marks some of the modules as experimental.






[GitHub] beam pull request #3036: [BEAM-2241] Renames some python classes and functio...

2017-05-09 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/3036

[BEAM-2241] Renames some python classes and functions that were 
unnecessarily public

Adds a note to the documentation of classes that are public but should only 
be used internally by the SDK (non-user-facing classes).

Marks some of the modules as experimental.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_public_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3036.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3036


commit e6f90ec7b8bd59bd6809edc1aa95e2e894dd2b84
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-05-10T02:56:14Z

Renames some python classes and functions that were unnecessarily public.

Adds a note to documentation of classes that are public but should be only 
used internally by the SDK (non-user facing classes).

Marks some of the modules as experimental.






[GitHub] beam pull request #2770: [BEAM-539] Fixes several issues of FileSink

2017-04-28 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/2770

[BEAM-539] Fixes several issues of FileSink

(1) Updates FileSink to fail for file name prefixes that only contain a 
single component (for example GCS buckets).

For example, currently FileSink fails for gs://aaa while passing for 
gs://aaa/. This change makes FileSink fail for both cases (and makes the 
behavior consistent with Java).

(2) Updates the name of the temporary directory created by FileSink

Currently, for a filename prefix 'gs://aaa/bbb', the temp path would be of 
the form gs://aaa/bbb-temp-... .
This is error-prone since a user pattern 'gs://aaa/bbb*' would match temp 
files. This change makes the temp path format 'gs://aaa/beam-temp-bbb-...' 
instead.

To achieve the above, this PR adds a method 'split()' to the FileSystem 
interface that is analogous to Python's 'os.path.split()' (and which has the 
opposite effect of the current method FileSystem.join()).
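
The two points above can be illustrated with a small standalone sketch. The helpers `split_gcs_path` and `temp_dir_for_prefix` are hypothetical and are not the SDK's code; in particular, the random suffix is an assumption, since the description elides what follows 'beam-temp-bbb-'.

```python
import fnmatch
import uuid
from typing import Tuple


def split_gcs_path(path: str) -> Tuple[str, str]:
    """Hypothetical split() for gs:// paths, in the spirit of os.path.split()."""
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError("expected a gs:// path: %r" % path)
    rest = path[len(scheme):]
    head, sep, tail = rest.rpartition("/")
    if not sep:
        # Only a bucket name such as gs://aaa: there is no last component.
        return scheme + rest, ""
    return scheme + head, tail


def temp_dir_for_prefix(file_path_prefix: str) -> str:
    """Build a temp path of the form <dir>/beam-temp-<last component>-<suffix>."""
    base, last = split_gcs_path(file_path_prefix)
    if not last:
        # Covers both gs://aaa and gs://aaa/ (single-component prefixes).
        raise ValueError("prefix must include a file name: %r" % file_path_prefix)
    # Assumption: a random hex suffix; the original elides the real suffix.
    return "%s/beam-temp-%s-%s" % (base, last, uuid.uuid4().hex)


temp_path = temp_dir_for_prefix("gs://aaa/bbb")
# The user pattern no longer matches the temp location ...
print(fnmatch.fnmatchcase(temp_path, "gs://aaa/bbb*"))
# ... whereas it did match the old 'bbb-temp-...' form.
print(fnmatch.fnmatchcase("gs://aaa/bbb-temp-123", "gs://aaa/bbb*"))
```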

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam gcs_root_location_file_sink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2770.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2770


commit b66ae881a5adcdc5be8ee67a6d4ad842a2ea0147
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-04-28T21:38:35Z

Fixes several issues of FileSink.

(1) Updates FileSink to fail for file name prefixes that only contain a 
single component (for example GCS buckets).

For example, currently FileSink fails for gs://aaa while passing for 
gs://aaa/. This change makes FileSink fail for both cases (and makes the 
behaviour consistent with Java).

(2) Updates the name of the temporary directory created by FileSink

Currently, for a filename prefix 'gs://aaa/bbb', the temp path would be of 
the form gs://aaa/bbb-temp-... .
This is error-prone since a user pattern 'gs://aaa/bbb*' would match temp 
files. This change makes the temp path format 'gs://aaa/beam-temp-bbb-...' 
instead.






[GitHub] beam pull request #2536: [BEAM-1179] Renames assertions of source_test_utils

2017-04-13 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/2536

[BEAM-1179] Renames assertions of source_test_utils

Renames assertions of source_test_utils from camelcase to 
underscore-separated.
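
For illustration, the naming change follows the usual camelCase to snake_case mapping, which can be sketched with a small regular expression. The helper `camel_to_snake` and the sample name are illustrative only; the actual renames were presumably done by hand in the PR.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


print(camel_to_snake("assertSourcesEqualReferenceSource"))
```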

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam rename_sourcetestutil_asserts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2536


commit 82ba164b6f0ca69abbc707163232fa5b5791dc9a
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-04-14T01:57:04Z

Update assertions of source_test_utils from camelcase to 
underscore-separated.






[GitHub] beam pull request #2519: [BEAM-1925] Updates DoFn invocation logic to be mor...

2017-04-12 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/2519

[BEAM-1925] Updates DoFn invocation logic to be more extensible.

Adds the following abstractions:

DoFnSignature: describes the signature of a given DoFn object.
DoFnInvoker: defines a particular way for invoking DoFn methods.

I believe existing tests cover the updated code paths.
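
One way to picture the split between the two abstractions is the sketch below: one object describes what a DoFn-like object provides, and another decides how to call it. The classes `SignatureSketch` and `SimpleInvokerSketch` are invented for illustration and do not reflect the actual DoFnSignature or DoFnInvoker code in this PR.

```python
import inspect


class SignatureSketch:
    """Describes a DoFn-like object: which lifecycle methods it defines."""

    LIFECYCLE_METHODS = ("start_bundle", "process", "finish_bundle")

    def __init__(self, do_fn):
        self.do_fn = do_fn
        self.methods = {
            name: getattr(do_fn, name)
            for name in self.LIFECYCLE_METHODS
            if callable(getattr(do_fn, name, None))
        }
        if "process" not in self.methods:
            raise ValueError("a DoFn-like object must define process()")
        # Parameter names of process(); the bound `self` is not included.
        self.process_params = list(
            inspect.signature(self.methods["process"]).parameters)


class SimpleInvokerSketch:
    """One particular way of invoking the methods a signature describes."""

    def __init__(self, signature):
        self.signature = signature

    def invoke_bundle(self, elements):
        methods = self.signature.methods
        outputs = []
        if "start_bundle" in methods:
            methods["start_bundle"]()
        for element in elements:
            # process() may return None or an iterable of outputs.
            outputs.extend(methods["process"](element) or [])
        if "finish_bundle" in methods:
            methods["finish_bundle"]()
        return outputs


class SplitWords:
    def process(self, element):
        return element.split()


invoker = SimpleInvokerSketch(SignatureSketch(SplitWords()))
print(invoker.invoke_bundle(["a b", "c"]))
```

Keeping the description separate from the calling strategy means a different invoker can be swapped in without re-inspecting the DoFn.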

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam sdf_direct_runner2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2519


commit ea542113b8936cba2295e61471218a5c01be9a58
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-04-07T20:41:28Z

Updates DoFn invocation logic to be more extensible.

Adds the following abstractions:

DoFnSignature: describes the signature of a given DoFn object.
DoFnInvoker: defines a particular way for invoking DoFn methods.






[GitHub] beam pull request #2289: [BEAM-1782] Updates BigQuery read transform to corr...

2017-04-03 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/2289




[GitHub] beam pull request #2289: [BEAM-1782] Updates BigQuery read transform to corr...

2017-03-22 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/2289

[BEAM-1782] Updates BigQuery read transform to correctly process empty 
repeated fields.

This fixes DirectRunner. DataflowRunner already processes these fields 
correctly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam bq_empty_repeated

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2289


commit f3da5eb0f70f51c3e0b4b304b55d56cba7cd3f99
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-03-22T20:17:26Z

Updates BigQuery read transform to correctly process empty repeated fields.






[GitHub] beam-site pull request #186: Add chamikara as a committer

2017-03-17 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam-site/pull/186

Add chamikara as a committer



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam-site website_add_to_team

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam-site/pull/186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #186


commit a78aeafebc24996bf9a14fedc6b242a2db51eac6
Author: chamik...@google.com <chamik...@google.com>
Date:   2017-03-18T00:28:21Z

Add chamikara as a committer






[GitHub] beam pull request #1978: [BEAM-1463] Updates BigQuery read transform to hand...

2017-02-10 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/1978

[BEAM-1463] Updates BigQuery read transform to handle 'null' fields 
properly for DirectRunner

Updates BigQuery read transform so that DirectRunner handles 'null' fields 
properly.

Before this change, for DirectRunner, a record (dictionary) returned by 
the BigQuery read transform did not contain keys for fields that are 'null'. 
For DataflowRunner, these fields are available with value 'None'.
I believe retaining these fields with value 'None' is the proper behavior 
here.

This change makes these two runners consistent when it comes to handling 
BigQuery 'null' values.
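
The difference between the two behaviors can be shown with a small sketch. The helper `row_to_record` and the field names are hypothetical and are not the SDK's conversion code; the point is only that every schema field keeps a key, with `None` standing in for a BigQuery 'null'.

```python
def row_to_record(field_names, row):
    """Keep a key for every schema field; absent (null) values become None."""
    return {name: row.get(name) for name in field_names}


schema_fields = ["name", "age"]
# A row where 'age' is null and therefore missing from the raw data.
raw_row = {"name": "a"}

print(row_to_record(schema_fields, raw_row))
```

With the key always present, user code can read `record['age']` on either runner without guarding against a `KeyError`.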



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam 
reading_null_fields_directrunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1978.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1978


commit f6a94610b674760075c3c0af66b7b03da154f2bc
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-02-10T22:19:53Z

Updates BigQuery read transform so that DirectRunner handles 'null' fields 
properly.

Before this change, for DirectRunner, a record (dictionary) returned by 
the BigQuery read transform will not contain keys for fields that are 'null'. 
For DataflowRunner, these fields will be available with value 'None'.
I believe retaining these fields with value 'None' is the proper behavior 
here.
This change makes these two runners consistent when it comes to handling 
BigQuery 'null' values.






[GitHub] beam pull request #1932: [BEAM-1406] Removes deprecated fileio.TextFileSink

2017-02-06 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/1932

[BEAM-1406] Removes deprecated fileio.TextFileSink

Users should be using textio.WriteToText() transform instead of
fileio.TextFileSink.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam remove_textfilesink_fileio

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1932


commit 2fed64bf5f3e7eda2a3a372556851cdbffeb1a1a
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-02-07T00:01:11Z

Removes deprecated fileio.TextFileSink.

Users should be using textio.WriteToText() transform instead of
fileio.TextFileSink.






[GitHub] beam pull request #1916: [BEAM-1388] Updates default values used by retry de...

2017-02-03 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/1916

[BEAM-1388] Updates default values used by retry decorator.

Updates the following defaults so that the total wait time by default is 
more practical.

num_retries from 16 to 7.
max_delay_secs from 4 hours to 1 hour.

With this update, at the maximum number of retries, the system will wait for 
a total of 635 sec, with the wait before the last retry being 320 sec.
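
The 635 sec and 320 sec figures can be reproduced with a short calculation. The initial delay of 5 sec and the doubling factor used below are assumptions (they are not stated in this description) chosen because they yield exactly those numbers; any random fuzz applied to delays is ignored.

```python
def retry_delays(initial_delay_secs, factor, num_retries, max_delay_secs):
    """Return the wait before each retry under capped exponential backoff."""
    delays = []
    delay = initial_delay_secs
    for _ in range(num_retries):
        delays.append(min(delay, max_delay_secs))
        delay *= factor
    return delays


# Assumed: initial delay 5 sec, factor 2. From the PR: 7 retries, 1 hour cap.
delays = retry_delays(5, 2, 7, 60 * 60)
print(delays)       # [5, 10, 20, 40, 80, 160, 320]
print(sum(delays))  # 635
```

Under the same assumptions, the 1 hour cap is never reached with 7 retries, since the largest single wait is 320 sec.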


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/beam update_retry_defaults

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1916


commit 640f5c61a25c100df0eca79b1a4417b81dbb9a83
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-02-04T01:32:49Z

Updates default values used by retry decorator.






[GitHub] beam pull request #1866: [BEAM-1338] Moves ThreadPool creation to a util fun...

2017-01-30 Thread chamikaramj
Github user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/1866




[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...

2017-01-24 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/1820




[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...

2017-01-23 Thread chamikaramj
GitHub user chamikaramj reopened a pull request:

https://github.com/apache/beam/pull/1820

[BEAM-1299] Removes Dataflow native text source and sink from Beam Python 
SDK.

Users should use the Beam text source and sink available in module 
'textio.py' instead.

Also removes the Dataflow native file source/sink, which is used only by the 
native text source/sink.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam remove_native_text_source_sink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1820


commit ab6b7026da410a3084da58de873c6b8b809dd1fb
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-01-23T21:23:45Z

Removes Dataflow native text source and sink from Beam SDK.

Users should be using Beam text source and sink available in module 
'textio.py' instead of this.

Also removes Dataflow native file source/sink that is only used by native 
text source/sink.






[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...

2017-01-23 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/1820




[GitHub] beam pull request #1820: [BEAM-1299] Removes Dataflow native text source and...

2017-01-23 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/1820

[BEAM-1299] Removes Dataflow native text source and sink from Beam Python 
SDK.

Users should use the Beam text source and sink available in module 
'textio.py' instead.

Also removes the Dataflow native file source/sink, which is used only by the 
native text source/sink.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam remove_native_text_source_sink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1820.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1820


commit ab6b7026da410a3084da58de873c6b8b809dd1fb
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-01-23T21:23:45Z

Removes Dataflow native text source and sink from Beam SDK.

Users should be using Beam text source and sink available in module 
'textio.py' instead of this.

Also removes Dataflow native file source/sink that is only used by native 
text source/sink.






[GitHub] beam pull request #1818: [BEAM-1298] Increments major used by Dataflow runne...

2017-01-23 Thread chamikaramj
GitHub user chamikaramj opened a pull request:

https://github.com/apache/beam/pull/1818

[BEAM-1298] Increments major version used by Dataflow runner to 5

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam increment_major_version_5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1818


commit 1811458b0c33fba0dde909fc655452ad8a37c9f9
Author: Chamikara Jayalath <chamik...@google.com>
Date:   2017-01-23T18:25:28Z

Increments major version used by Dataflow runner to 5






[GitHub] beam pull request #1728: [BEAM-1239] Updates Python SDK examples to use Beam...

2017-01-03 Thread chamikaramj
GitHub user chamikaramj closed the pull request at:

https://github.com/apache/beam/pull/1728

