Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/783
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/783
Run findbugs in the test-compile phase
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue
number, if there is one.
- [x] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
This should help some problems fail faster.
R: @dhalperi
CC: @swegner @jasonkuster
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam findbugs-earlier
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/783.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #783
commit 14c6d99e087b2e1606422821341136a5d5e8ec23
Author: Kenneth Knowles
Date: 2016-08-04T04:31:17Z
Run findbugs in the test-compile phase
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407076#comment-15407076
]
ASF GitHub Bot commented on BEAM-498:
-
GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/782
[BEAM-498] Port easy bits of the SDK to new DoFn
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue
number, if there is one.
- [x] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
R: @bjchambers
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam transforms-new-DoFn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/782.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #782
commit 3fd414f9dd2b2aeb41ab7a24b635adb234ec5056
Author: Kenneth Knowles
Date: 2016-08-04T02:55:21Z
Port join library to new DoFn
commit ab22cd9933bd13571c03e7dd66e4d01a6e38d7f3
Author: Kenneth Knowles
Date: 2016-08-04T02:56:33Z
Port mentions of OldDoFn in PipelineOptions
commit 866d2c7dda01e04ff040b2ed655e6390c6b56ef4
Author: Kenneth Knowles
Date: 2016-08-04T03:15:12Z
Port easy Java SDK tests to new DoFn
commit 2504240de6115139addf051c354fae4b3c49b67c
Author: Kenneth Knowles
Date: 2016-08-04T03:15:58Z
Port PAssert to new DoFn
commit d16cc7f7dc14eaab1564b42177980ada149f7f99
Author: Kenneth Knowles
Date: 2016-08-04T03:22:26Z
Port easy I/O transforms to new DoFn
commit 3f949838df812012175a086c287e99c25bca894e
Author: Kenneth Knowles
Date: 2016-08-04T03:27:28Z
Port easy transforms to new DoFn
> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/782
[BEAM-498] Port easy bits of the SDK to new DoFn
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue
number, if there is one.
- [x] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
R: @bjchambers
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam transforms-new-DoFn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/782.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #782
commit 3fd414f9dd2b2aeb41ab7a24b635adb234ec5056
Author: Kenneth Knowles
Date: 2016-08-04T02:55:21Z
Port join library to new DoFn
commit ab22cd9933bd13571c03e7dd66e4d01a6e38d7f3
Author: Kenneth Knowles
Date: 2016-08-04T02:56:33Z
Port mentions of OldDoFn in PipelineOptions
commit 866d2c7dda01e04ff040b2ed655e6390c6b56ef4
Author: Kenneth Knowles
Date: 2016-08-04T03:15:12Z
Port easy Java SDK tests to new DoFn
commit 2504240de6115139addf051c354fae4b3c49b67c
Author: Kenneth Knowles
Date: 2016-08-04T03:15:58Z
Port PAssert to new DoFn
commit d16cc7f7dc14eaab1564b42177980ada149f7f99
Author: Kenneth Knowles
Date: 2016-08-04T03:22:26Z
Port easy I/O transforms to new DoFn
commit 3f949838df812012175a086c287e99c25bca894e
Author: Kenneth Knowles
Date: 2016-08-04T03:27:28Z
Port easy transforms to new DoFn
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407011#comment-15407011
]
ASF GitHub Bot commented on BEAM-498:
-
GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/781
[BEAM-498] Port examples to new DoFn
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue
number, if there is one.
- [x] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
R: @bjchambers pretty nice to be showcasing the `BoundedWindow` parameter
in a couple of them.
Any other reviewer can feel free to LGTM if they are happy with it. It is
pretty mechanical.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam examples-new-DoFn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/781.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #781
commit c31986f5ee31c34538a73057638a45efd796caf5
Author: Kenneth Knowles
Date: 2016-08-04T01:54:22Z
Port example tests to new DoFn
commit 8e6c0eaf66d7e5e6f2319aa13efe627b9fe3002a
Author: Kenneth Knowles
Date: 2016-08-04T02:01:16Z
Port TfIdf example to new DoFn
commit c3c42aafc1987d06c55e76e669f8e2c0b7b59b2c
Author: Kenneth Knowles
Date: 2016-08-04T02:03:11Z
Port TopWikipediaSessions example to new DoFn
commit 380f26afc73620abecbd38118095b96f1a91f030
Author: Kenneth Knowles
Date: 2016-08-04T02:04:50Z
Port GameState Java 8 example to new DoFn
commit 1584666cb2c5ef3d8c9edfefe4d8cd7b366ae133
Author: Kenneth Knowles
Date: 2016-08-04T02:06:26Z
Port the UserScore example to new DoFn
commit 86e90246d582327330af7f3212b6ed2c6a4f6af7
Author: Kenneth Knowles
Date: 2016-08-04T02:07:56Z
Port StreamingWordExtract example to new DoFn
commit 30da6afd8d06f63f0a0a7ebafc51e8d30217763c
Author: Kenneth Knowles
Date: 2016-08-04T02:08:19Z
fixup! UserScore
commit 18ee240879a0edc738355746f13b5c6b967babf7
Author: Kenneth Knowles
Date: 2016-08-04T02:09:39Z
Port TrafficMaxLaneFlow to new DoFn
commit 589337562cf59c511dbc49030a88110cbfcd5a3a
Author: Kenneth Knowles
Date: 2016-08-04T02:10:43Z
Port TrafficeRoutes example to new DoFn
commit 616411a1263604182f4cab7e899de3df22fc734d
Author: Kenneth Knowles
Date: 2016-08-04T02:12:08Z
Port DatastoreWordCount example to new DoFn
commit 4878f0b274c656f4d9951f471b7ef346fca58d1f
Author: Kenneth Knowles
Date: 2016-08-04T02:13:19Z
Port BigQueryTornadoes example to new DoFn
commit 480926d9591bfa0dace4af6d6883650bae61bb99
Author: Kenneth Knowles
Date: 2016-08-04T02:13:58Z
Port MaxPerKeyExamples to new DoFn
commit 607ed16fdc38ad19fc711844e0c55da6306d0882
Author: Kenneth Knowles
Date: 2016-08-04T02:14:37Z
Port CombinePerKeyExamples to new DoFn
commit e2262521eb4e84a258bfff03edab1440e91fd9f3
Author: Kenneth Knowles
Date: 2016-08-04T02:15:56Z
Port TriggerExample to new DoFn
commit 8b376606d9a956a2be9b70508010a19d34584d81
Author: Kenneth Knowles
Date: 2016-08-04T02:17:26Z
Port JoinExamples to new DoFn
commit 00e19ae9e690e35e73a0f8aff2c1a371d80c
Author: Kenneth Knowles
Date: 2016-08-04T02:18:07Z
Port FilterExamples to new DoFn
commit 16b9ca531970b6b32c91229df80926fa0d99714c
Author: Kenneth Knowles
Date: 2016-08-04T02:18:38Z
fixup! TriggerExample
commit ba47f11fb0d1aa99141ab12a4a3665a52d1e016e
Author: Kenneth Knowles
Date: 2016-08-04T02:19:38Z
Fix mention of DoFn in WordCountTest
> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/781
[BEAM-498] Port examples to new DoFn
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue
number, if there is one.
- [x] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
R: @bjchambers pretty nice to be showcasing the `BoundedWindow` parameter
in a couple of them.
Any other reviewer can feel free to LGTM if they are happy with it. It is
pretty mechanical.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam examples-new-DoFn
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/781.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #781
commit c31986f5ee31c34538a73057638a45efd796caf5
Author: Kenneth Knowles
Date: 2016-08-04T01:54:22Z
Port example tests to new DoFn
commit 8e6c0eaf66d7e5e6f2319aa13efe627b9fe3002a
Author: Kenneth Knowles
Date: 2016-08-04T02:01:16Z
Port TfIdf example to new DoFn
commit c3c42aafc1987d06c55e76e669f8e2c0b7b59b2c
Author: Kenneth Knowles
Date: 2016-08-04T02:03:11Z
Port TopWikipediaSessions example to new DoFn
commit 380f26afc73620abecbd38118095b96f1a91f030
Author: Kenneth Knowles
Date: 2016-08-04T02:04:50Z
Port GameState Java 8 example to new DoFn
commit 1584666cb2c5ef3d8c9edfefe4d8cd7b366ae133
Author: Kenneth Knowles
Date: 2016-08-04T02:06:26Z
Port the UserScore example to new DoFn
commit 86e90246d582327330af7f3212b6ed2c6a4f6af7
Author: Kenneth Knowles
Date: 2016-08-04T02:07:56Z
Port StreamingWordExtract example to new DoFn
commit 30da6afd8d06f63f0a0a7ebafc51e8d30217763c
Author: Kenneth Knowles
Date: 2016-08-04T02:08:19Z
fixup! UserScore
commit 18ee240879a0edc738355746f13b5c6b967babf7
Author: Kenneth Knowles
Date: 2016-08-04T02:09:39Z
Port TrafficMaxLaneFlow to new DoFn
commit 589337562cf59c511dbc49030a88110cbfcd5a3a
Author: Kenneth Knowles
Date: 2016-08-04T02:10:43Z
Port TrafficeRoutes example to new DoFn
commit 616411a1263604182f4cab7e899de3df22fc734d
Author: Kenneth Knowles
Date: 2016-08-04T02:12:08Z
Port DatastoreWordCount example to new DoFn
commit 4878f0b274c656f4d9951f471b7ef346fca58d1f
Author: Kenneth Knowles
Date: 2016-08-04T02:13:19Z
Port BigQueryTornadoes example to new DoFn
commit 480926d9591bfa0dace4af6d6883650bae61bb99
Author: Kenneth Knowles
Date: 2016-08-04T02:13:58Z
Port MaxPerKeyExamples to new DoFn
commit 607ed16fdc38ad19fc711844e0c55da6306d0882
Author: Kenneth Knowles
Date: 2016-08-04T02:14:37Z
Port CombinePerKeyExamples to new DoFn
commit e2262521eb4e84a258bfff03edab1440e91fd9f3
Author: Kenneth Knowles
Date: 2016-08-04T02:15:56Z
Port TriggerExample to new DoFn
commit 8b376606d9a956a2be9b70508010a19d34584d81
Author: Kenneth Knowles
Date: 2016-08-04T02:17:26Z
Port JoinExamples to new DoFn
commit 00e19ae9e690e35e73a0f8aff2c1a371d80c
Author: Kenneth Knowles
Date: 2016-08-04T02:18:07Z
Port FilterExamples to new DoFn
commit 16b9ca531970b6b32c91229df80926fa0d99714c
Author: Kenneth Knowles
Date: 2016-08-04T02:18:38Z
fixup! TriggerExample
commit ba47f11fb0d1aa99141ab12a4a3665a52d1e016e
Author: Kenneth Knowles
Date: 2016-08-04T02:19:38Z
Fix mention of DoFn in WordCountTest
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Daniel Halperin created BEAM-532:
Summary: Let users specify compression mode in WordCount
Key: BEAM-532
URL: https://issues.apache.org/jira/browse/BEAM-532
Project: Beam
Issue Type: Bug
Components: examples-java
Affects Versions: Not applicable
Reporter: Daniel Halperin
Assignee: Daniel Halperin
Priority: Minor
Users often ask about compression. Let's build this into the default example.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user dhalperi opened a pull request:
https://github.com/apache/incubator-beam/pull/780
WordCount: add the option for users to specify compression
Also copy to archetypes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dhalperi/incubator-beam word-count-compression
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/780.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #780
commit 6e540a0ed78a504858123cd68e4cd257f9505144
Author: Dan Halperin
Date: 2016-08-04T01:47:00Z
WordCount: add the option for users to specify compression
Also copy to archetypes
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Ahmet Altay created BEAM-531:
Summary: Add support for getting aggregated values with dataflow
runner
Key: BEAM-531
URL: https://issues.apache.org/jira/browse/BEAM-531
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Ahmet Altay
Assignee: Charles Chen
Priority: Minor
The SDK for Python cannot extract metrics from the Dataflow service.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Ahmet Altay created BEAM-530:
Summary: Decide where to place the tests and examples
Key: BEAM-530
URL: https://issues.apache.org/jira/browse/BEAM-530
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor
Right now they are literally part of the package space.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Ahmet Altay created BEAM-529:
Summary: Check immutability violations in DirectPipelineRunner
Key: BEAM-529
URL: https://issues.apache.org/jira/browse/BEAM-529
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Ahmet Altay
Assignee: Ahmet Altay
Priority: Minor
Users are going to mutate inputs and outputs of DoFn inappropriately. We should
help their tests fail to catch such mistakes. (Similar to the
DirectPipelineRunner in Java SDK)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Ahmet Altay created BEAM-528:
Summary: Add @experimental annotations
Key: BEAM-528
URL: https://issues.apache.org/jira/browse/BEAM-528
Project: Beam
Issue Type: New Feature
Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor
Experimental/deprecation warnings: use the warnings standard module in
conjunction with decorators as described here:
https://docs.python.org/2/library/warnings.html
Some code sample for a deprecated decorator that is kinda/sorta similar.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user chamikaramj opened a pull request:
https://github.com/apache/incubator-beam/pull/779
[BEAM-522] Fixes GcsIO.exists() to properly handle files that do not exist
Currently this invocation fails for non existing files instead of returning
false.
Updates FileSink.finalize_write() so that we capture and log any transient
errors that get thrown at the channel_factory.exists() call.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chamikaramj/incubator-beam
sink_finalize_fix_idempotency
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/779.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #779
commit 792c3b5c79b6e979bc34bcf457f8a33cebd74daf
Author: Chamikara Jayalath
Date: 2016-08-04T01:25:41Z
Fixes GcsIO.exists() to properly handle files that do not exist.
Currently this invocation fails for non existing files instead of returning
false.
Updates FileSink.finalize_write() so that we capture and log any transient
errors that get thrown at the channel_factory.exists() call.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406959#comment-15406959
]
ASF GitHub Bot commented on BEAM-522:
-
GitHub user chamikaramj opened a pull request:
https://github.com/apache/incubator-beam/pull/779
[BEAM-522] Fixes GcsIO.exists() to properly handle files that do not exist
Currently this invocation fails for non existing files instead of returning
false.
Updates FileSink.finalize_write() so that we capture and log any transient
errors that get thrown at the channel_factory.exists() call.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chamikaramj/incubator-beam
sink_finalize_fix_idempotency
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/779.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #779
commit 792c3b5c79b6e979bc34bcf457f8a33cebd74daf
Author: Chamikara Jayalath
Date: 2016-08-04T01:25:41Z
Fixes GcsIO.exists() to properly handle files that do not exist.
Currently this invocation fails for non existing files instead of returning
false.
Updates FileSink.finalize_write() so that we capture and log any transient
errors that get thrown at the channel_factory.exists() call.
> Update FileSink.finalize_write() to be idempotent
> -
>
> Key: BEAM-522
> URL: https://issues.apache.org/jira/browse/BEAM-522
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> Currently FileSink.finelize_write() in fileio.py [1] performs following
> operations.
> (1) Obtains a list of temporary files as a side input
> (2) Renames each temporary file to the location where final output should be
> stored.
> iobase.Sink.finalize_write() operation should be idempotent since runner
> implementations may call this operation multiple times due to task failures.
> Current implementation is not idempotent because if we re-run the operation
> after renaming a sub-set of files, the operations may fail due to not being
> able to find some files at source location (for example, [2] for GCS files).
> We can fix this by checking if the destination file is already available
> before performing the rename and not performing the rename for files that are
> already available at the destination.
> [1]
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L503
> [2]
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/gcsio.py#L187
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Port DebuggingWordCount example from OldDoFn to DoFn
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/49d2f170
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/49d2f170
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/49d2f170
Branch: refs/heads/master
Commit: 49d2f1706f69c5106a9082ffd2fecaf69b2d868c
Parents: ca9e337
Author: Kenneth Knowles
Authored: Fri Jul 22 14:29:18 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 18:25:53 2016 -0700
--
.../java/org/apache/beam/examples/DebuggingWordCount.java| 8
1 file changed, 4 insertions(+), 4 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/49d2f170/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
--
diff --git
a/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
b/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
index 3c43152..c1b273c 100644
---
a/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
+++
b/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java
@@ -24,7 +24,7 @@ import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Aggregator;
-import org.apache.beam.sdk.transforms.OldDoFn;
+import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.values.KV;
@@ -106,8 +106,8 @@ import java.util.regex.Pattern;
* overridden with {@code --inputFile}.
*/
public class DebuggingWordCount {
- /** A OldDoFn that filters for a specific key based upon a regular
expression. */
- public static class FilterTextFn extends OldDoFn,
KV> {
+ /** A DoFn that filters for a specific key based upon a regular expression.
*/
+ public static class FilterTextFn extends DoFn, KV> {
/**
* Concept #1: The logger below uses the fully qualified class name of
FilterTextFn
* as the logger. All log statements emitted by this logger will be
referenced by this name
@@ -133,7 +133,7 @@ public class DebuggingWordCount {
private final Aggregator unmatchedWords =
createAggregator("umatchedWords", new Sum.SumLongFn());
-@Override
+@ProcessElement
public void processElement(ProcessContext c) {
if (filter.matcher(c.element().getKey()).matches()) {
// Log at the "DEBUG" level each element that we match. When executing
this pipeline
Rename DoFnWithContext to DoFn
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/3bcb6f46
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/3bcb6f46
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/3bcb6f46
Branch: refs/heads/master
Commit: 3bcb6f46ad0ae483d1d8785edc2d9d5846c71a73
Parents: e160966
Author: Kenneth Knowles
Authored: Fri Jul 22 14:10:01 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 18:25:52 2016 -0700
--
.../org/apache/beam/sdk/transforms/DoFn.java| 429 +++
.../beam/sdk/transforms/DoFnReflector.java | 84 ++--
.../apache/beam/sdk/transforms/DoFnTester.java | 2 +-
.../beam/sdk/transforms/DoFnWithContext.java| 429 ---
.../org/apache/beam/sdk/transforms/OldDoFn.java | 2 +-
.../org/apache/beam/sdk/transforms/ParDo.java | 16 +-
.../beam/sdk/transforms/DoFnReflectorTest.java | 86 ++--
.../apache/beam/sdk/transforms/DoFnTest.java| 237 ++
.../sdk/transforms/DoFnWithContextTest.java | 237 --
.../apache/beam/sdk/transforms/ParDoTest.java | 12 +-
.../dofnreflector/DoFnReflectorTestHelper.java | 26 +-
.../transforms/DoFnReflectorBenchmark.java | 30 +-
12 files changed, 795 insertions(+), 795 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/3bcb6f46/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
new file mode 100644
index 000..eb6753c
--- /dev/null
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java
@@ -0,0 +1,429 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static com.google.common.base.Preconditions.checkArgument;
+import static com.google.common.base.Preconditions.checkNotNull;
+import static com.google.common.base.Preconditions.checkState;
+
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.Combine.CombineFn;
+import org.apache.beam.sdk.transforms.OldDoFn.DelegatingAggregator;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.transforms.display.HasDisplayData;
+import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
+import org.apache.beam.sdk.transforms.windowing.PaneInfo;
+import org.apache.beam.sdk.util.WindowingInternals;
+import org.apache.beam.sdk.values.PCollectionView;
+import org.apache.beam.sdk.values.TupleTag;
+import org.apache.beam.sdk.values.TypeDescriptor;
+
+import org.joda.time.Duration;
+import org.joda.time.Instant;
+
+import java.io.Serializable;
+import java.lang.annotation.Documented;
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+import java.util.HashMap;
+import java.util.Map;
+
+/**
+ * The argument to {@link ParDo} providing the code to use to process
+ * elements of the input
+ * {@link org.apache.beam.sdk.values.PCollection}.
+ *
+ * See {@link ParDo} for more explanation, examples of use, and
+ * discussion of constraints on {@code DoFn}s, including their
+ * serializability, lack of access to global shared mutable state,
+ * requirements for failure tolerance, and benefits of optimization.
+ *
+ * {@code DoFn}s can be tested in a particular
+ * {@code Pipeline} by running that {@code Pipeline} on sample input
+ * and then checking its output. Unit testing of a {@code DoFn},
+ * separately from any {@code ParDo} transform or {@code Pipeline},
+ * can be done via the {@link DoFnTester} harness.
+ *
+ * Implementations must define a method annotated with {@link
ProcessElement}
+ * that satisfies
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a64baf48/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnWithContext.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnWithContext.java
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnWithContext.java
index 8a83e44..b27163a 100644
---
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnWithContext.java
+++
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFnWithContext.java
@@ -24,7 +24,7 @@ import static com.google.common.base.Preconditions.checkState;
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.transforms.Combine.CombineFn;
-import org.apache.beam.sdk.transforms.DoFn.DelegatingAggregator;
+import org.apache.beam.sdk.transforms.OldDoFn.DelegatingAggregator;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.transforms.display.HasDisplayData;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
@@ -127,7 +127,7 @@ public abstract class DoFnWithContext
implements Serializable,
*
* If invoked from {@link ProcessElement}), the timestamp
* must not be older than the input element's timestamp minus
- * {@link DoFn#getAllowedTimestampSkew}. The output element will
+ * {@link OldDoFn#getAllowedTimestampSkew}. The output element will
* be in the same windows as the input element.
*
* If invoked from {@link StartBundle} or {@link FinishBundle},
@@ -176,7 +176,7 @@ public abstract class DoFnWithContext
implements Serializable,
*
* If invoked from {@link ProcessElement}), the timestamp
* must not be older than the input element's timestamp minus
- * {@link DoFn#getAllowedTimestampSkew}. The output element will
+ * {@link OldDoFn#getAllowedTimestampSkew}. The output element will
* be in the same windows as the input element.
*
* If invoked from {@link StartBundle} or {@link FinishBundle},
@@ -194,7 +194,7 @@ public abstract class DoFnWithContext
implements Serializable,
}
/**
- * Information accessible when running {@link DoFn#processElement}.
+ * Information accessible when running {@link OldDoFn#processElement}.
*/
public abstract class ProcessContext extends Context {
@@ -358,13 +358,13 @@ public abstract class DoFnWithContext
implements Serializable,
/**
* Returns an {@link Aggregator} with aggregation logic specified by the
* {@link CombineFn} argument. The name provided must be unique across
- * {@link Aggregator}s created within the DoFn. Aggregators can only be
created
+ * {@link Aggregator}s created within the OldDoFn. Aggregators can only be
created
* during pipeline construction.
*
* @param name the name of the aggregator
* @param combiner the {@link CombineFn} to use in the aggregator
* @return an aggregator for the provided name and combiner in the scope of
- * this DoFn
+ * this OldDoFn
* @throws NullPointerException if the name or combiner is null
* @throws IllegalArgumentException if the given name collides with another
* aggregator in this scope
@@ -391,13 +391,13 @@ public abstract class DoFnWithContext
implements Serializable,
/**
* Returns an {@link Aggregator} with the aggregation logic specified by the
* {@link SerializableFunction} argument. The name provided must be unique
- * across {@link Aggregator}s created within the DoFn. Aggregators can only
be
+ * across {@link Aggregator}s created within the OldDoFn. Aggregators can
only be
* created during pipeline construction.
*
* @param name the name of the aggregator
* @param combiner the {@link SerializableFunction} to use in the aggregator
* @return an aggregator for the provided name and combiner in the scope of
- * this DoFn
+ * this OldDoFn
* @throws NullPointerException if the name or combiner is null
* @throws IllegalArgumentException if the given name collides with another
* aggregator in this scope
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a64baf48/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
index a31799e..4466874 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Filter.java
@@ -202,7 +202,7 @@ public class Filter extends PTransform
Port MinimalWordCount example from OldDoFn to DoFn
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/4ceec0e8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/4ceec0e8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/4ceec0e8
Branch: refs/heads/master
Commit: 4ceec0e86f1c4e885168957299dbe81c61fbc7e7
Parents: 64481d0
Author: Kenneth Knowles
Authored: Fri Jul 22 14:28:42 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 18:25:53 2016 -0700
--
.../java/org/apache/beam/examples/MinimalWordCount.java | 9 -
1 file changed, 4 insertions(+), 5 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/4ceec0e8/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
--
diff --git
a/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
b/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
index ab0bb6d..df725e3 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java
@@ -22,8 +22,8 @@ import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
+import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
-import org.apache.beam.sdk.transforms.OldDoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
@@ -89,12 +89,11 @@ public class MinimalWordCount {
// the input text (a set of Shakespeare's texts).
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*"))
// Concept #2: Apply a ParDo transform to our PCollection of text lines.
This ParDo invokes a
- // OldDoFn (defined in-line) on each element that tokenizes the text line
into individua
- // words.
+ // DoFn (defined in-line) on each element that tokenizes the text line
into individual words.
// The ParDo returns a PCollection, where each element is an
individual word in
// Shakespeare's collected texts.
- .apply("ExtractWords", ParDo.of(new OldDoFn() {
- @Override
+ .apply("ExtractWords", ParDo.of(new DoFn() {
+ @ProcessElement
public void processElement(ProcessContext c) {
for (String word : c.element().split("[^a-zA-Z']+")) {
if (!word.isEmpty()) {
Port WindowedWordCount example from OldDoFn to DoFn
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/ca9e3372
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/ca9e3372
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/ca9e3372
Branch: refs/heads/master
Commit: ca9e337203208c7c5876f0710fb3a45430a5b3a8
Parents: 4ceec0e
Author: Kenneth Knowles
Authored: Fri Jul 22 14:29:01 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 18:25:53 2016 -0700
--
.../org/apache/beam/examples/WindowedWordCount.java | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/ca9e3372/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
--
diff --git
a/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
b/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
index 17f7da3..842cb54 100644
---
a/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
+++
b/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java
@@ -27,7 +27,7 @@ import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
-import org.apache.beam.sdk.transforms.OldDoFn;
+import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
@@ -103,14 +103,14 @@ public class WindowedWordCount {
static final int WINDOW_SIZE = 1; // Default window duration in minutes
/**
- * Concept #2: A OldDoFn that sets the data element timestamp. This is a
silly method, just for
+ * Concept #2: A DoFn that sets the data element timestamp. This is a silly
method, just for
* this example, for the bounded data case.
*
* Imagine that many ghosts of Shakespeare are all typing madly at the
same time to recreate
* his masterworks. Each line of the corpus will get a random associated
timestamp somewhere in a
* 2-hour period.
*/
- static class AddTimestampFn extends OldDoFn {
+ static class AddTimestampFn extends DoFn {
private static final Duration RAND_RANGE = Duration.standardHours(2);
private final Instant minTimestamp;
@@ -118,7 +118,7 @@ public class WindowedWordCount {
this.minTimestamp = new Instant(System.currentTimeMillis());
}
-@Override
+@ProcessElement
public void processElement(ProcessContext c) {
// Generate a timestamp that falls somewhere in the past two hours.
long randMillis = (long) (Math.random() * RAND_RANGE.getMillis());
@@ -130,9 +130,9 @@ public class WindowedWordCount {
}
}
- /** A OldDoFn that converts a Word and Count into a BigQuery table row. */
- static class FormatAsTableRowFn extends OldDoFn, TableRow> {
-@Override
+ /** A DoFn that converts a Word and Count into a BigQuery table row. */
+ static class FormatAsTableRowFn extends DoFn, TableRow> {
+@ProcessElement
public void processElement(ProcessContext c) {
TableRow row = new TableRow()
.set("word", c.element().getKey())
Port AutoComplete example from OldDoFn to DoFn
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/3236eec2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/3236eec2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/3236eec2
Branch: refs/heads/master
Commit: 3236eec22a8902393e6becefb771b9a4768ccc50
Parents: 49d2f17
Author: Kenneth Knowles
Authored: Fri Jul 22 14:29:37 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 18:25:53 2016 -0700
--
.../beam/examples/complete/AutoComplete.java| 30 ++--
1 file changed, 15 insertions(+), 15 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/3236eec2/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
--
diff --git
a/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
b/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
index 7b44af8..1ab39c9 100644
---
a/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
+++
b/examples/java/src/main/java/org/apache/beam/examples/complete/AutoComplete.java
@@ -36,9 +36,9 @@ import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.options.Validation;
import org.apache.beam.sdk.transforms.Count;
+import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.Flatten;
-import org.apache.beam.sdk.transforms.OldDoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Partition;
@@ -130,8 +130,8 @@ public class AutoComplete {
// Map the KV outputs of Count into our own CompletionCandiate class.
.apply("CreateCompletionCandidates", ParDo.of(
-new OldDoFn, CompletionCandidate>() {
- @Override
+new DoFn, CompletionCandidate>() {
+ @ProcessElement
public void processElement(ProcessContext c) {
c.output(new CompletionCandidate(c.element().getKey(),
c.element().getValue()));
}
@@ -209,8 +209,8 @@ public class AutoComplete {
}
private static class FlattenTops
-extends OldDoFn,
CompletionCandidate> {
- @Override
+extends DoFn,
CompletionCandidate> {
+ @ProcessElement
public void processElement(ProcessContext c) {
for (CompletionCandidate cc : c.element().getValue()) {
c.output(cc);
@@ -260,10 +260,10 @@ public class AutoComplete {
}
/**
- * A OldDoFn that keys each candidate by all its prefixes.
+ * A DoFn that keys each candidate by all its prefixes.
*/
private static class AllPrefixes
- extends OldDoFn> {
+ extends DoFn> {
private final int minPrefix;
private final int maxPrefix;
public AllPrefixes(int minPrefix) {
@@ -273,8 +273,8 @@ public class AutoComplete {
this.minPrefix = minPrefix;
this.maxPrefix = maxPrefix;
}
-@Override
- public void processElement(ProcessContext c) {
+@ProcessElement
+public void processElement(ProcessContext c) {
String word = c.element().value;
for (int i = minPrefix; i <= Math.min(word.length(), maxPrefix); i++) {
c.output(KV.of(word.substring(0, i), c.element()));
@@ -341,8 +341,8 @@ public class AutoComplete {
/**
* Takes as input a set of strings, and emits each #hashtag found therein.
*/
- static class ExtractHashtags extends OldDoFn {
-@Override
+ static class ExtractHashtags extends DoFn {
+@ProcessElement
public void processElement(ProcessContext c) {
Matcher m = Pattern.compile("#\\S+").matcher(c.element());
while (m.find()) {
@@ -351,8 +351,8 @@ public class AutoComplete {
}
}
- static class FormatForBigquery extends OldDoFn, TableRow> {
-@Override
+ static class FormatForBigquery extends DoFn, TableRow> {
+@ProcessElement
public void processElement(ProcessContext c) {
List completions = new ArrayList<>();
for (CompletionCandidate cc : c.element().getValue()) {
@@ -385,14 +385,14 @@ public class AutoComplete {
* Takes as input a the top candidates per prefix, and emits an entity
* suitable for writing to Datastore.
*/
- static class
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a64baf48/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
index 77c857c..7917aec 100644
---
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
+++
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
@@ -23,8 +23,8 @@ import static
com.google.common.base.Preconditions.checkNotNull;
import org.apache.beam.sdk.coders.AtomicCoder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderException;
-import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.OldDoFn;
import org.apache.beam.sdk.util.VarInt;
import com.google.common.base.MoreObjects;
@@ -38,8 +38,8 @@ import java.util.Objects;
/**
* Provides information about the pane an element belongs to. Every pane is
implicitly associated
* with a window. Panes are observable only via the
- * {@link org.apache.beam.sdk.transforms.DoFn.ProcessContext#pane} method of
the context
- * passed to a {@link DoFn#processElement} overridden method.
+ * {@link OldDoFn.ProcessContext#pane} method of the context
+ * passed to a {@link OldDoFn#processElement} overridden method.
*
* Note: This does not uniquely identify a pane, and should not be used for
comparisons.
*/
@@ -74,8 +74,8 @@ public final class PaneInfo {
* definitions:
*
* We'll call a pipeline 'simple' if it does not use
- * {@link org.apache.beam.sdk.transforms.DoFn.Context#outputWithTimestamp} in
- * any {@code DoFn}, and it uses the same
+ * {@link OldDoFn.Context#outputWithTimestamp} in
+ * any {@code OldDoFn}, and it uses the same
* {@link
org.apache.beam.sdk.transforms.windowing.Window.Bound#withAllowedLateness}
* argument value on all windows (or uses the default of {@link
org.joda.time.Duration#ZERO}).
* We'll call an element 'locally late', from the point of view of a
computation on a
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a64baf48/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java
index fe8b66f..03ff481 100644
---
a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java
+++
b/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java
@@ -21,8 +21,8 @@ import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.Coder.NonDeterministicException;
-import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
+import org.apache.beam.sdk.transforms.OldDoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
@@ -645,7 +645,7 @@ public class Window {
// We first apply a (trivial) transform to the input PCollection to
produce a new
// PCollection. This ensures that we don't modify the windowing
strategy of the input
// which may be used elsewhere.
- .apply("Identity", ParDo.of(new DoFn() {
+ .apply("Identity", ParDo.of(new OldDoFn() {
@Override public void processElement(ProcessContext c) {
c.output(c.element());
}
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a64baf48/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BaseExecutionContext.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BaseExecutionContext.java
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BaseExecutionContext.java
index a62444f..dd36367 100644
---
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BaseExecutionContext.java
+++
b/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BaseExecutionContext.java
@@ -107,7 +107,7 @@ public abstract class BaseExecutionContexthttp://git-wip-us.apache.org/repos/asf/incubator-beam/blob/a64baf48/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BucketingFunction.java
--
diff --git
a/sdks/java/core/src/main/java/org/apache/beam/sdk/util/BucketingFunction.java
Repository: incubator-beam
Updated Branches:
refs/heads/master 388816a80 -> 9a329aada
Port WordCount example from OldDoFn to DoFn
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/64481d0c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/64481d0c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/64481d0c
Branch: refs/heads/master
Commit: 64481d0c2ed52a075ca1f0aa9946155aa9b13119
Parents: 3bcb6f4
Author: Kenneth Knowles
Authored: Fri Jul 22 14:28:28 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 18:25:52 2016 -0700
--
.../src/main/java/org/apache/beam/examples/WordCount.java | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/64481d0c/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
--
diff --git
a/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
b/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
index 274d1ad..d3768a8 100644
--- a/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
+++ b/examples/java/src/main/java/org/apache/beam/examples/WordCount.java
@@ -26,8 +26,8 @@ import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Aggregator;
import org.apache.beam.sdk.transforms.Count;
+import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
-import org.apache.beam.sdk.transforms.OldDoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
@@ -97,14 +97,14 @@ public class WordCount {
/**
* Concept #2: You can make your pipeline code less verbose by defining your
DoFns statically out-
- * of-line. This OldDoFn tokenizes lines of text into individual words; we
pass it to a ParDo in
- * the pipeline.
+ * of-line. This DoFn tokenizes lines of text into individual words; we pass
it to a ParDo in the
+ * pipeline.
*/
- static class ExtractWordsFn extends OldDoFn {
+ static class ExtractWordsFn extends DoFn {
private final Aggregator emptyLines =
createAggregator("emptyLines", new Sum.SumLongFn());
-@Override
+@ProcessElement
public void processElement(ProcessContext c) {
if (c.element().trim().isEmpty()) {
emptyLines.addValue(1L);
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/758
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406952#comment-15406952
]
Chamikara Jayalath commented on BEAM-522:
-
Actually, the bug is in the exists() implementation of gcsio.py.
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/gcsio.py#L237
Instead of catching IOError, we should be catching HttpError and checking error
code to see if it's 404.
With this fixed FileSink.finalize_write() becomes properly idempotent since we
handle failures of rename() invocation at following location.
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L533
> Update FileSink.finalize_write() to be idempotent
> -
>
> Key: BEAM-522
> URL: https://issues.apache.org/jira/browse/BEAM-522
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> Currently FileSink.finelize_write() in fileio.py [1] performs following
> operations.
> (1) Obtains a list of temporary files as a side input
> (2) Renames each temporary file to the location where final output should be
> stored.
> iobase.Sink.finalize_write() operation should be idempotent since runner
> implementations may call this operation multiple times due to task failures.
> Current implementation is not idempotent because if we re-run the operation
> after renaming a sub-set of files, the operations may fail due to not being
> able to find some files at source location (for example, [2] for GCS files).
> We can fix this by checking if the destination file is already available
> before performing the rename and not performing the rename for files that are
> already available at the destination.
> [1]
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L503
> [2]
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/gcsio.py#L187
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Ahmet Altay created BEAM-527:
Summary: Pickling error when pickling a nested function
Key: BEAM-527
URL: https://issues.apache.org/jira/browse/BEAM-527
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor
There is a pickling error under the following conditions all happen:
- a function is defined inside a transforms' apply method
- then using it as MapFn
- that function references an instance variable of the outer transform.
Rewriting the nested function as an unnested DoFn appears to solve the problem.
If the limitations of pickling make it difficult to support nested functions
then perhaps there's a way to make it easier for users to detect problems
caused by nested functions and recommend appropriate fixes
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kenneth Knowles resolved BEAM-396.
--
Resolution: Fixed
Fix Version/s: 0.3.0-incubating
> Coder.NonDeterministicException doesn't inherit from Exception
> --
>
> Key: BEAM-396
> URL: https://issues.apache.org/jira/browse/BEAM-396
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Chandni Singh
>Priority: Minor
> Labels: findbugs, newbie, starter
> Fix For: 0.3.0-incubating
>
>
> [FindBugs
> NM_CLASS_NOT_EXCEPTION|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L67]:
> Class is not derived from an Exception, even though it is named as such.
> Applies to
> [Coder.NonDeterministicException|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L263]
> This is a good starter bug. When fixing, please remove the corresponding
> entries from
> [findbugs-filter.xml|https://github.com/apache/incubator-beam/blob/master/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml]
> and verify the build passes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Ahmet Altay created BEAM-526:
Summary: Mismatched pipelines give unclear error
Key: BEAM-526
URL: https://issues.apache.org/jira/browse/BEAM-526
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor
Mistakenly mixing two pipeline gives an unclear error. This is an error,
however we should improve the error message.
This could be reproduced by trying to flatten two things from different
pipelines.
Improve the message for this assert:
https://github.com/aaltay/incubator-beam/blob/python-sdk/sdks/python/apache_beam/transforms/util.py#L135
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406921#comment-15406921
]
ASF GitHub Bot commented on BEAM-396:
-
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/776
> Coder.NonDeterministicException doesn't inherit from Exception
> --
>
> Key: BEAM-396
> URL: https://issues.apache.org/jira/browse/BEAM-396
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Chandni Singh
>Priority: Minor
> Labels: findbugs, newbie, starter
>
> [FindBugs
> NM_CLASS_NOT_EXCEPTION|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L67]:
> Class is not derived from an Exception, even though it is named as such.
> Applies to
> [Coder.NonDeterministicException|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L263]
> This is a good starter bug. When fixing, please remove the corresponding
> entries from
> [findbugs-filter.xml|https://github.com/apache/incubator-beam/blob/master/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml]
> and verify the build passes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/776
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406911#comment-15406911
]
ASF GitHub Bot commented on BEAM-156:
-
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/754
> Implement Quiescence Signalling in the InProcessPipelineRunner
> --
>
> Key: BEAM-156
> URL: https://issues.apache.org/jira/browse/BEAM-156
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Thomas Groh
>
> A pipeline is quiescent when the following two properties hold:
> There are no triggers that can fire, given the current processing time and
> watermark
> All pending elements cannot make progress until a side input produces
> additional output
> This is approximately equivalent to: If no more input is received, the
> pipeline will not perform any additional processing absent advances in
> processing time or event time
> See also:
> https://docs.google.com/document/d/1fZUUbG2LxBtqCVabQshldXIhkMcXepsbv2vuuny8Ix4/edit#
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Add ProducedOutput method to TransformResult
This can communicate that a PTransform that produced no outputs still
should cause pending work to be evaluated. PCollectionViews modifiy the
state of the evaluator and can cause formerly blocked PTransforms to be
able to progress.
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/f7cc7e17
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/f7cc7e17
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/f7cc7e17
Branch: refs/heads/master
Commit: f7cc7e178db211509aecb65ba203930fd159629a
Parents: a8eb274
Author: Thomas Groh
Authored: Tue Jul 26 09:53:22 2016 -0700
Committer: Kenneth Knowles
Committed: Wed Aug 3 17:45:12 2016 -0700
--
.../beam/runners/direct/CommittedResult.java| 16 +++-
.../beam/runners/direct/EvaluationContext.java | 3 +-
.../runners/direct/StepTransformResult.java | 28 ++-
.../beam/runners/direct/TransformResult.java| 9 +++
.../runners/direct/ViewEvaluatorFactory.java| 4 +-
.../runners/direct/CommittedResultTest.java | 24 --
.../runners/direct/StepTransformResultTest.java | 85
.../runners/direct/TransformExecutorTest.java | 5 +-
.../runners/direct/WatermarkManagerTest.java| 7 +-
9 files changed, 163 insertions(+), 18 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f7cc7e17/runners/direct-java/src/main/java/org/apache/beam/runners/direct/CommittedResult.java
--
diff --git
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/CommittedResult.java
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/CommittedResult.java
index e86f07d..e9a40a8 100644
---
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/CommittedResult.java
+++
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/CommittedResult.java
@@ -20,6 +20,7 @@ package org.apache.beam.runners.direct;
import org.apache.beam.runners.direct.DirectRunner.CommittedBundle;
import org.apache.beam.sdk.transforms.AppliedPTransform;
+import org.apache.beam.sdk.transforms.View.CreatePCollectionView;
import com.google.auto.value.AutoValue;
@@ -49,12 +50,23 @@ abstract class CommittedResult {
*/
public abstract Iterable> getOutputs();
+ /**
+ * Returns if the transform that produced this result produced outputs.
+ *
+ * Transforms that produce output via modifying the state of the runner
(e.g.
+ * {@link CreatePCollectionView}) should explicitly set this to true. If
{@link #getOutputs()}
+ * returns a nonempty iterable, this will also return true.
+ */
+ public abstract boolean producedOutputs();
+
public static CommittedResult create(
TransformResult original,
CommittedBundle unprocessedElements,
- Iterable> outputs) {
+ Iterable> outputs,
+ boolean producedOutputs) {
return new AutoValue_CommittedResult(original.getTransform(),
unprocessedElements,
-outputs);
+outputs,
+producedOutputs);
}
}
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f7cc7e17/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
--
diff --git
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
index ea713fa..610a62d 100644
---
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
+++
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java
@@ -159,7 +159,8 @@ class EvaluationContext {
completedBundle == null
? null
: completedBundle.withElements((Iterable)
result.getUnprocessedElements()),
-committedBundles);
+committedBundles,
+result.producedOutput());
watermarkManager.updateWatermarks(
completedBundle,
result.getTimerUpdate().withCompletedTimers(completedTimers),
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/f7cc7e17/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StepTransformResult.java
--
diff --git
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StepTransformResult.java
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StepTransformResult.java
index 176bb14..3d6841d 100644
---
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/StepTransformResult.java
+++
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/754
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Ahmet Altay created BEAM-525:
Summary: Verify that ParDo with multiple outputs with tags un
declared in with_outputs() work
Key: BEAM-525
URL: https://issues.apache.org/jira/browse/BEAM-525
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Ahmet Altay
Priority: Minor
test_undeclared_side_outputs was failing (when last checked) under certain
conditions:
See this TODO:
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/dataflow_test.py#L202
This is probably not failing any more but it needs to be verified.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmet Altay updated BEAM-524:
-
Component/s: sdk-py
> Description of "type" argument in Aggregator is incorrect
> -
>
> Key: BEAM-524
> URL: https://issues.apache.org/jira/browse/BEAM-524
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
>Reporter: Frank Yellin
>Priority: Minor
>
> Two problems with documentation for "type" argument.
> Trivial: Remove "by default". This phrase implies that there are other
> alternatives besides what is listed. There aren't.
> Non trivial. The documentation says "types appropriate to the combine_fn"
> are accepted. I tried
> Accumulator("foo", max, datetime.datetime)
> This failed even though "datetime.datetime" is a perfectly reasonable type to
> want to take the max of. (I wanted to know precisely when the last job
> finished.)
> Either the documentation needs to be changed to specify that max/min only
> apply to numeric types, or the code needs to be changed to allow other uses
> of min and max.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/774
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/777
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Frank Yellin created BEAM-524:
-
Summary: Description of "type" argument in Aggregator is incorrect
Key: BEAM-524
URL: https://issues.apache.org/jira/browse/BEAM-524
Project: Beam
Issue Type: Bug
Reporter: Frank Yellin
Priority: Minor
Two problems with documentation for "type" argument.
Trivial: Remove "by default". This phrase implies that there are other
alternatives besides what is listed. There aren't.
Non trivial. The documentation says "types appropriate to the combine_fn" are
accepted. I tried
Accumulator("foo", max, datetime.datetime)
This failed even though "datetime.datetime" is a perfectly reasonable type to
want to take the max of. (I wanted to know precisely when the last job
finished.)
Either the documentation needs to be changed to specify that max/min only apply
to numeric types, or the code needs to be changed to allow other uses of min
and max.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Frank Yellin created BEAM-523:
-
Summary: Minor typo in aggregator_test.py
Key: BEAM-523
URL: https://issues.apache.org/jira/browse/BEAM-523
Project: Beam
Issue Type: Bug
Components: beam-model
Reporter: Frank Yellin
Assignee: Frances Perry
Priority: Trivial
aggregators is repeatedly misspelled as aggeregators.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user bjchambers opened a pull request:
https://github.com/apache/incubator-beam/pull/778
Correct some accidental renames
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [*] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [*] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [*] Replace `` in the title with the actual Jira issue
number, if there is one.
- [*] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
IDE over-eagerly replaced some occurrences of createAggregator with
createAggregatorForDoFn. This corrects that.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bjchambers/incubator-beam fix-javadoc
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/778.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #778
commit a39d682e3e2c22eb1efa597dc734d44c27113937
Author: bchambers
Date: 2016-08-03T20:38:43Z
Correct some accidental renames
IDE over-eagerly replaced some occurrences of createAggregator with
createAggregatorForDoFn. This corrects that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
GitHub user kennknowles opened a pull request:
https://github.com/apache/incubator-beam/pull/777
Bind checkstyle:check to the test-compile phase to fail earlier
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue
number, if there is one.
- [x] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
Today we run checkstyle _after packaging_ so we have to wait for all the
various jars to be built before we fail.
This change binds checkstyle to the earliest phase that seems reasonable.
R: anyone
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kennknowles/incubator-beam checkstyle-earlier
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/777.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #777
commit 1fecb2844c0b7e6cdedb93d93164d2151c847f98
Author: Kenneth Knowles
Date: 2016-08-03T23:12:15Z
Bind checkstyle:check to the test-compile phase to fail earlier
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chandni Singh reassigned BEAM-396:
--
Assignee: Chandni Singh
> Coder.NonDeterministicException doesn't inherit from Exception
> --
>
> Key: BEAM-396
> URL: https://issues.apache.org/jira/browse/BEAM-396
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Chandni Singh
>Priority: Minor
> Labels: findbugs, newbie, starter
>
> [FindBugs
> NM_CLASS_NOT_EXCEPTION|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L67]:
> Class is not derived from an Exception, even though it is named as such.
> Applies to
> [Coder.NonDeterministicException|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L263]
> This is a good starter bug. When fixing, please remove the corresponding
> entries from
> [findbugs-filter.xml|https://github.com/apache/incubator-beam/blob/master/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml]
> and verify the build passes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406643#comment-15406643
]
ASF GitHub Bot commented on BEAM-396:
-
GitHub user chandnisingh opened a pull request:
https://github.com/apache/incubator-beam/pull/776
BEAM-396 made Coder.NonDeterministic inherit from Exception and not T…
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [ ] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [ ] Replace `` in the title with the actual Jira issue
number, if there is one.
- [ ] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
…hrowable
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chandnisingh/incubator-beam BEAM-396
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/776.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #776
> Coder.NonDeterministicException doesn't inherit from Exception
> --
>
> Key: BEAM-396
> URL: https://issues.apache.org/jira/browse/BEAM-396
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
>Reporter: Scott Wegner
>Priority: Minor
> Labels: findbugs, newbie, starter
>
> [FindBugs
> NM_CLASS_NOT_EXCEPTION|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L67]:
> Class is not derived from an Exception, even though it is named as such.
> Applies to
> [Coder.NonDeterministicException|https://github.com/apache/incubator-beam/blob/58a029a06aea1030279e5da8f9fa3114f456c1db/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/Coder.java#L263]
> This is a good starter bug. When fixing, please remove the corresponding
> entries from
> [findbugs-filter.xml|https://github.com/apache/incubator-beam/blob/master/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml]
> and verify the build passes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user chandnisingh opened a pull request:
https://github.com/apache/incubator-beam/pull/776
BEAM-396 made Coder.NonDeterministic inherit from Exception and not Tâ¦
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [ ] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [ ] Replace `` in the title with the actual Jira issue
number, if there is one.
- [ ] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
â¦hrowable
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chandnisingh/incubator-beam BEAM-396
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/776.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #776
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Chamikara Jayalath created BEAM-522:
---
Summary: Update FileSink.finalize_write() to be idempotent
Key: BEAM-522
URL: https://issues.apache.org/jira/browse/BEAM-522
Project: Beam
Issue Type: Bug
Components: sdk-py
Reporter: Chamikara Jayalath
Assignee: Chamikara Jayalath
Currently FileSink.finelize_write() in fileio.py [1] performs following
operations.
(1) Obtains a list of temporary files as a side input
(2) Renames each temporary file to the location where final output should be
stored.
iobase.Sink.finalize_write() operation should be idempotent since runner
implementations may call this operation multiple times due to task failures.
Current implementation is not idempotent because if we re-run the operation
after renaming a sub-set of files, the operations may fail due to not being
able to find some files at source location (for example, [2] for GCS files).
We can fix this by checking if the destination file is already available before
performing the rename and not performing the rename for files that are already
available at the destination.
[1]
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L503
[2]
https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/gcsio.py#L187
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Repository: incubator-beam
Updated Branches:
refs/heads/python-sdk e834fa82b -> 65152cab8
Implement add_input for all CombineFns.
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/3ebf28c6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/3ebf28c6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/3ebf28c6
Branch: refs/heads/python-sdk
Commit: 3ebf28c6e0d17af3720076e33f88a0f126a89059
Parents: e834fa8
Author: Robert Bradshaw
Authored: Tue Jul 26 01:15:55 2016 -0700
Committer: Robert Bradshaw
Committed: Tue Aug 2 15:52:28 2016 -0700
--
sdks/python/apache_beam/transforms/combiners.py | 16
sdks/python/apache_beam/transforms/core.py | 6 +++---
2 files changed, 11 insertions(+), 11 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/3ebf28c6/sdks/python/apache_beam/transforms/combiners.py
--
diff --git a/sdks/python/apache_beam/transforms/combiners.py
b/sdks/python/apache_beam/transforms/combiners.py
index 155dcc6..c3f0da1 100644
--- a/sdks/python/apache_beam/transforms/combiners.py
+++ b/sdks/python/apache_beam/transforms/combiners.py
@@ -132,6 +132,9 @@ class CountCombineFn(core.CombineFn):
def create_accumulator(self):
return 0
+ def add_input(self, accumulator, element):
+return accumulator + 1
+
def add_inputs(self, accumulator, elements):
return accumulator + len(elements)
@@ -425,9 +428,9 @@ class _TupleCombineFnBase(core.CombineFn):
class TupleCombineFn(_TupleCombineFnBase):
- def add_inputs(self, accumulator, elements):
-return [c.add_inputs(a, e)
-for c, a, e in zip(self._combiners, accumulator, zip(*elements))]
+ def add_input(self, accumulator, element):
+return [c.add_input(a, e)
+for c, a, e in zip(self._combiners, accumulator, element)]
def with_common_input(self):
return SingleInputTupleCombineFn(*self._combiners)
@@ -435,8 +438,8 @@ class TupleCombineFn(_TupleCombineFnBase):
class SingleInputTupleCombineFn(_TupleCombineFnBase):
- def add_inputs(self, accumulator, elements):
-return [c.add_inputs(a, elements)
+ def add_input(self, accumulator, element):
+return [c.add_input(a, element)
for c, a in zip(self._combiners, accumulator)]
@@ -522,9 +525,6 @@ def curry_combine_fn(fn, args, kwargs):
def add_input(self, accumulator, element):
return fn.add_input(accumulator, element, *args, **kwargs)
- def add_inputs(self, accumulator, elements):
-return fn.add_inputs(accumulator, elements, *args, **kwargs)
-
def merge_accumulators(self, accumulators):
return fn.merge_accumulators(accumulators, *args, **kwargs)
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/3ebf28c6/sdks/python/apache_beam/transforms/core.py
--
diff --git a/sdks/python/apache_beam/transforms/core.py
b/sdks/python/apache_beam/transforms/core.py
index 38b9cd2..da26205 100644
--- a/sdks/python/apache_beam/transforms/core.py
+++ b/sdks/python/apache_beam/transforms/core.py
@@ -270,7 +270,7 @@ class CombineFn(WithTypeHints):
1. Input values are partitioned into one or more batches.
2. For each batch, the create_accumulator method is invoked to create a fresh
initial "accumulator" value representing the combination of zero values.
- 3. For each input value in the batch, the add_inputs method is invoked to
+ 3. For each input value in the batch, the add_input method is invoked to
combine more values with the accumulator for that batch.
4. The merge_accumulators method is invoked to combine accumulators from
separate batches into a single combined output accumulator value, once all
@@ -296,7 +296,7 @@ class CombineFn(WithTypeHints):
def add_input(self, accumulator, element, *args, **kwargs):
"""Return result of folding element into accumulator.
-CombineFn implementors must override either add_input or add_inputs.
+CombineFn implementors must override add_input.
Args:
accumulator: the current accumulator
@@ -420,7 +420,7 @@ class CallableWrapperCombineFn(CombineFn):
if accumulator is self._EMPTY:
return self._fn(elements, *args, **kwargs)
elif isinstance(elements, (list, tuple)):
- return self._fn([accumulator] + elements, *args, **kwargs)
+ return self._fn([accumulator] + list(elements), *args, **kwargs)
else:
def union():
yield accumulator
Document TupleCombineFns
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/4a2239d3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/4a2239d3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/4a2239d3
Branch: refs/heads/python-sdk
Commit: 4a2239d3701e13622998c71107d263c8966e73e1
Parents: 3ebf28c
Author: Robert Bradshaw
Authored: Wed Aug 3 13:52:36 2016 -0700
Committer: Robert Bradshaw
Committed: Wed Aug 3 13:52:36 2016 -0700
--
sdks/python/apache_beam/transforms/combiners.py | 12
1 file changed, 12 insertions(+)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/4a2239d3/sdks/python/apache_beam/transforms/combiners.py
--
diff --git a/sdks/python/apache_beam/transforms/combiners.py
b/sdks/python/apache_beam/transforms/combiners.py
index c3f0da1..a0604b8 100644
--- a/sdks/python/apache_beam/transforms/combiners.py
+++ b/sdks/python/apache_beam/transforms/combiners.py
@@ -427,6 +427,12 @@ class _TupleCombineFnBase(core.CombineFn):
class TupleCombineFn(_TupleCombineFnBase):
+ """A combiner for combining tuples via a tuple of combiners.
+
+ Takes as input a tuple of N CombineFns and combines N-tuples by
+ combining the k-th element of each tuple with the k-th CombineFn,
+ outputting a new N-tuple of combined values.
+ """
def add_input(self, accumulator, element):
return [c.add_input(a, e)
@@ -437,6 +443,12 @@ class TupleCombineFn(_TupleCombineFnBase):
class SingleInputTupleCombineFn(_TupleCombineFnBase):
+ """A combiner for combining a single value via a tuple of combiners.
+
+ Takes as input a tuple of N CombineFns and combines elements by
+ applying each CombineFn to each input, producing an N-tuple of
+ the outputs corresponding to each of the N CombineFn's outputs.
+ """
def add_input(self, accumulator, element):
return [c.add_input(a, element)
Update to today
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit:
http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/045e0040
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/045e0040
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/045e0040
Branch: refs/heads/asf-site
Commit: 045e0040d1c82eac90661b45d37f136cf6f6a08c
Parents: 4244fe2
Author: Dan Halperin
Authored: Wed Aug 3 12:00:43 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 12:00:43 2016 -0700
--
_posts/2016-08-03-six-months.md | 43
_posts/2016-08-04-six-months.md | 43
2 files changed, 43 insertions(+), 43 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/045e0040/_posts/2016-08-03-six-months.md
--
diff --git a/_posts/2016-08-03-six-months.md b/_posts/2016-08-03-six-months.md
new file mode 100644
index 000..1141eaf
--- /dev/null
+++ b/_posts/2016-08-03-six-months.md
@@ -0,0 +1,43 @@
+---
+layout: post
+title: "Apache Beam: Six Months in Incubation"
+date: 2016-08-03 00:00:01 -0700
+excerpt_separator:
+categories: blog
+authors:
+ - fjp
+---
+
+Itâs been just over six months since Apache Beam was formally accepted into
incubation with the [Apache Software Foundation](http://www.apache.org). As a
community, weâve been hard at work getting Beam off the ground.
+
+
+
+Looking just at raw numbers for those first six months, thatâs:
+
+* 48,238 lines of preexisting code donated by Cloudera, dataArtisans, and
Google.
+* 761 pull requests from 45 contributors.
+* 498 Jira issues opened and 245 resolved.
+* 1 incubating release (and another 1 in progress).
+* 4,200 hours of automated tests.
+* 161 subscribers / 606 messages on user@.
+* 217 subscribers / 1205 messages on dev@.
+* 277 stars and 174 forks on GitHub.
+
+And behind those numbers, thereâs been a ton of technical progress,
including:
+
+* Refactoring of the entire codebase, examples, and tests to be truly
runner-independent.
+* New functionality in the Apache Flink runner for timestamps/windows in batch
and bounded sources and side inputs in streaming mode.
+* Work in progress to upgrade the Apache Spark runner to use Spark 2.0.
+* Several new runners from the wider Apache community -- Apache Gearpump has
its own feature branch, Apache Apex has a PR, and conversations are starting on
Apache Storm and others.
+* New SDKs/DSLs for exposing the Beam model -- the Python SDK from Google is
in on a feature branch, and there are plans to add the Scio DSL from Spotify.
+* Support for additional data sources and sinks -- Apache Kafka and JMS are
in, there are PRs for Amazon Kinesis, Apache Cassandra, and MongoDB, and more
connectors are being planned.
+
+But perhaps most importantly, weâre committed to building an involved,
welcoming community. So far, weâve:
+
+* Started building a vibrant developer community, with detailed design
discussions on features like DoFn reuse semantics, serialization technology,
and an API for accessing state.
+* Started building a user community with an active mailing list and
improvements to the website and documentation.
+* Had multiple talks on Beam at venues including ApacheCon, Hadoop Summit,
Kafka Summit, JBCN Barcelona, and Strata.
+* Presented at multiple existing meetups and are starting to organize some of
our own.
+
+While itâs nice to reflect back on all weâve done, weâre working full
_stream_ ahead towards a stable release and graduation from incubator. And
weâd love your help -- join the [mailing
lists](http://beam.incubator.apache.org/use/mailing-lists/), check out the
[contribution
guide](http://beam.incubator.apache.org/contribute/contribution-guide/), and
grab a [starter
task](https://issues.apache.org/jira/browse/BEAM-520?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20in%20(newbie%2C%20starter))
from Jira!
+
http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/045e0040/_posts/2016-08-04-six-months.md
--
diff --git a/_posts/2016-08-04-six-months.md b/_posts/2016-08-04-six-months.md
deleted file mode 100644
index c4ed246..000
--- a/_posts/2016-08-04-six-months.md
+++ /dev/null
@@ -1,43 +0,0 @@
-layout: post
-title: "Apache Beam: Six Months in Incubation"
-date: 2016-08-04 00:00:01 -0700
-excerpt_separator:
-categories: blog
-authors:
- - fjp
-
-Itâs been just over six months since Apache Beam was formally accepted into
incubation with the [Apache Software Foundation](http://www.apache.org). As a
community, weâve been
Repository: incubator-beam-site
Updated Branches:
refs/heads/asf-site c630ee0ed -> 342fe7e42
Added a half birthday blog post, as discussed on dev@.
Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit:
http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/4244fe2b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/4244fe2b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/4244fe2b
Branch: refs/heads/asf-site
Commit: 4244fe2b9638b973d99ab73bad9eace7f79cb070
Parents: c630ee0
Author: Frances Perry
Authored: Tue Aug 2 22:36:07 2016 -0700
Committer: Dan Halperin
Committed: Wed Aug 3 11:59:04 2016 -0700
--
_posts/2016-08-04-six-months.md | 43
1 file changed, 43 insertions(+)
--
http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/4244fe2b/_posts/2016-08-04-six-months.md
--
diff --git a/_posts/2016-08-04-six-months.md b/_posts/2016-08-04-six-months.md
new file mode 100644
index 000..c4ed246
--- /dev/null
+++ b/_posts/2016-08-04-six-months.md
@@ -0,0 +1,43 @@
+---
+layout: post
+title: "Apache Beam: Six Months in Incubation"
+date: 2016-08-04 00:00:01 -0700
+excerpt_separator:
+categories: blog
+authors:
+ - fjp
+---
+
+Itâs been just over six months since Apache Beam was formally accepted into
incubation with the [Apache Software Foundation](http://www.apache.org). As a
community, weâve been hard at work getting Beam off the ground.
+
+
+
+Looking just at raw numbers for those first six months, thatâs:
+
+* 48,238 lines of preexisting code donated by Cloudera, dataArtisans, and
Google.
+* 761 pull requests from 45 contributors.
+* 498 Jira issues opened and 245 resolved.
+* 1 incubating release (and another 1 in progress).
+* 4,200 hours of automated tests.
+* 161 subscribers / 606 messages on user@.
+* 217 subscribers / 1205 messages on dev@.
+* 277 stars and 174 forks on GitHub.
+
+And behind those numbers, thereâs been a ton of technical progress,
including:
+
+* Refactoring of the entire codebase, examples, and tests to be truly
runner-independent.
+* New functionality in the Apache Flink runner for timestamps/windows in batch
and bounded sources and side inputs in streaming mode.
+* Work in progress to upgrade the Apache Spark runner to use Spark 2.0.
+* Several new runners from the wider Apache community -- Apache Gearpump has
its own feature branch, Apache Apex has a PR, and conversations are starting on
Apache Storm and others.
+* New SDKs/DSLs for exposing the Beam model -- the Python SDK from Google is
in on a feature branch, and there are plans to add the Scio DSL from Spotify.
+* Support for additional data sources and sinks -- Apache Kafka and JMS are
in, there are PRs for Amazon Kinesis, Apache Cassandra, and MongoDB, and more
connectors are being planned.
+
+But perhaps most importantly, weâre committed to building an involved,
welcoming community. So far, weâve:
+
+* Started building a vibrant developer community, with detailed design
discussions on features like DoFn reuse semantics, serialization technology,
and an API for accessing state.
+* Started building a user community with an active mailing list and
improvements to the website and documentation.
+* Had multiple talks on Beam at venues including ApacheCon, Hadoop Summit,
Kafka Summit, JBCN Barcelona, and Strata.
+* Presented at multiple existing meetups and are starting to organize some of
our own.
+
+While itâs nice to reflect back on all weâve done, weâre working full
_stream_ ahead towards a stable release and graduation from incubator. And
weâd love your help -- join the [mailing
lists](http://beam.incubator.apache.org/use/mailing-lists/), check out the
[contribution
guide](http://beam.incubator.apache.org/contribute/contribution-guide/), and
grab a [starter
task](https://issues.apache.org/jira/browse/BEAM-520?jql=project%20%3D%20BEAM%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20in%20(newbie%2C%20starter))
from Jira!
+
[
https://issues.apache.org/jira/browse/BEAM-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406558#comment-15406558
]
Ismaël Mejía commented on BEAM-516:
---
Just for reference I include a link to a previous (and related) issue related
to the javadocs:
https://issues.apache.org/jira/browse/BEAM-385
I think that having the javadocs as part of the CI systems (like flink does on
the mentioned JIRA) brings some advantages, e.g. up to date docs for master, as
well as an independent from the website management (to not to make some of the
changes by hand).
> Update navigation for Javadoc
> --
>
> Key: BEAM-516
> URL: https://issues.apache.org/jira/browse/BEAM-516
> Project: Beam
> Issue Type: Bug
> Components: website
>Reporter: Ismaël Mejía
>Assignee: James Malone
>Priority: Minor
> Attachments: screenshot.png
>
>
> The link to the latest version of the java documentation dissapeared with the
> recent changes to the website.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chamikara Jayalath resolved BEAM-499.
-
Resolution: Fixed
Fix Version/s: Not applicable
Removed unused code from apiclient.py.
Seems like iobase.py currently cannot be further trimmed down.
> Remove unused code in apiclient.py and iobase.py
>
>
> Key: BEAM-499
> URL: https://issues.apache.org/jira/browse/BEAM-499
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> There is some code in apiclient.py and iobase.py that is not used by Dataflow
> SDK. This code has to be removed.
> E.g.:
> class DataflowWorkerClient
> def reader_progress_to_cloud_progress() and other similar methods.
> def splits_to_split_response()
> class ConcatPosition
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chamikara Jayalath resolved BEAM-502.
-
Resolution: Fixed
Fix Version/s: Not applicable
> Properly handle None/null in json conversions
> -
>
> Key: BEAM-502
> URL: https://issues.apache.org/jira/browse/BEAM-502
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
> Fix For: Not applicable
>
>
> json_value.py has to be updated to properly handle JSON to/from Python
> 'None' conversions.
> For example, currently writing a dictionary of the form {'aa': 'value',
> 'bb':None} using BigQuery sink fails when using DirectPipelineRunner since we
> do not properly handle 'None' values.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Eugene Kirpichov created BEAM-521:
-
Summary: Execute some file-based reads via composite transform
instead of custom source
Key: BEAM-521
URL: https://issues.apache.org/jira/browse/BEAM-521
Project: Beam
Issue Type: Improvement
Reporter: Eugene Kirpichov
The BoundedSource API is intended for cases where the source can provide
meaningfull progress, dynamic splitting and size estimation. E.g. it's a good
fit for processing a moderate number of large files, or a key-value table.
However, existing runners have scalability limitations on how many bundles a
BoundedSource can split into, and this leads to it being a very poor fit for
the case of processing many small files: the source ends up splitting in a too
large number of bundles (at least 1 per file) overwhelming the runner.
This is a frequent use case, and the power of BoundedSource API is not needed
in this case: small files don't need to be dynamically split, progress
estimation is not needed, and size estimation is a "nice-to-have" but not
entirely necessary.
In this case, it'd be better to execute the read not as a raw
Read.from(BoundedSource) executed natively by the runner, but as a
ParDo(splitIntoBundles) + fusion break + ParDo(read each bundle). That way the
bundles end up as a simple PCollection with no scalability limitations, and
most likely much smaller per-bundle overhead.
Implementation options:
- The BoundedSource API could provide a hint method telling Read.from() to
expand in this way
- Individual connectors, such as TextIO.Read, could switch between expanding
into Read.from() or into this composite transform depending on parameters (e.g.
TextIO.Read.withCompressionType(GZ) would always expand into the composite
transform, because for compressed files BoundedSource API is unnecessary)
- Something else?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Kirpichov updated BEAM-521:
--
Summary: Execute some bounded source reads via composite transform (was:
Execute some file-based reads via composite transform instead of custom source)
> Execute some bounded source reads via composite transform
> -
>
> Key: BEAM-521
> URL: https://issues.apache.org/jira/browse/BEAM-521
> Project: Beam
> Issue Type: Improvement
>Reporter: Eugene Kirpichov
>
> The BoundedSource API is intended for cases where the source can provide
> meaningfull progress, dynamic splitting and size estimation. E.g. it's a good
> fit for processing a moderate number of large files, or a key-value table.
> However, existing runners have scalability limitations on how many bundles a
> BoundedSource can split into, and this leads to it being a very poor fit for
> the case of processing many small files: the source ends up splitting in a
> too large number of bundles (at least 1 per file) overwhelming the runner.
> This is a frequent use case, and the power of BoundedSource API is not needed
> in this case: small files don't need to be dynamically split, progress
> estimation is not needed, and size estimation is a "nice-to-have" but not
> entirely necessary.
> In this case, it'd be better to execute the read not as a raw
> Read.from(BoundedSource) executed natively by the runner, but as a
> ParDo(splitIntoBundles) + fusion break + ParDo(read each bundle). That way
> the bundles end up as a simple PCollection with no scalability limitations,
> and most likely much smaller per-bundle overhead.
> Implementation options:
> - The BoundedSource API could provide a hint method telling Read.from() to
> expand in this way
> - Individual connectors, such as TextIO.Read, could switch between expanding
> into Read.from() or into this composite transform depending on parameters
> (e.g. TextIO.Read.withCompressionType(GZ) would always expand into the
> composite transform, because for compressed files BoundedSource API is
> unnecessary)
> - Something else?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
GitHub user iemejia opened a pull request:
https://github.com/apache/incubator-beam/pull/775
New
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [ ] Make sure the PR title is formatted like:
`[BEAM-] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
- [ ] Replace `` in the title with the actual Jira issue
number, if there is one.
- [ ] If this contribution is large, please file an Apache
[Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.txt).
---
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/iemejia/incubator-beam new
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/775.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #775
commit c4a2441e933dd49a5dee5a7295e29f43cd403de2
Author: Ismaël MejÃa
Date: 2016-08-03T09:03:43Z
Remove useless semicolons
commit ea8e4e7415dcbe0b8939de788d2713b28c5a511e
Author: Ismaël MejÃa
Date: 2016-08-03T09:22:40Z
Remove unneeded java keywords/validations and fix Filter style
commit 2e21b8b9b013a1f112ffb73833432b9b356ffa8b
Author: Ismaël MejÃa
Date: 2016-08-03T09:37:37Z
Fix invalid Javadoc references and some other documentation issues
commit 72ffd759e3c29d83e1a880bc499482ccfb603f70
Author: Ismaël MejÃa
Date: 2016-08-03T10:10:39Z
Add rules for unused semicolons and overcomplicated boolean expresions
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
GitHub user dhalperi opened a pull request:
https://github.com/apache/incubator-beam/pull/774
Run integration tests in parallel
WIP
CC: @kennknowles
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dhalperi/incubator-beam
integration-test-parallel
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/774.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #774
commit 3d2ecbaaa90bb6910096423ac87e882f5d649e9e
Author: Dan Halperin
Date: 2016-08-03T06:51:45Z
Run integration tests in parallel
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam/pull/773
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[
https://issues.apache.org/jira/browse/BEAM-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Halperin resolved BEAM-514.
--
Resolution: Fixed
Fix Version/s: Not applicable
> Add all mandatory links
> ---
>
> Key: BEAM-514
> URL: https://issues.apache.org/jira/browse/BEAM-514
> Project: Beam
> Issue Type: Bug
> Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Frances Perry
> Fix For: Not applicable
>
>
> Except from:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E
> > Branding wise I think you are missing a few of the
> required links [3] including a link back to the Apache homepage.
> http://www.apache.org/foundation/marks/pmcs.html#navigation
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Halperin closed BEAM-514.
> Add all mandatory links
> ---
>
> Key: BEAM-514
> URL: https://issues.apache.org/jira/browse/BEAM-514
> Project: Beam
> Issue Type: Bug
> Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Frances Perry
> Fix For: Not applicable
>
>
> Except from:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E
> > Branding wise I think you are missing a few of the
> required links [3] including a link back to the Apache homepage.
> http://www.apache.org/foundation/marks/pmcs.html#navigation
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Halperin closed BEAM-515.
Resolution: Fixed
Fix Version/s: Not applicable
> Add feature logo and incubator logo
> ---
>
> Key: BEAM-515
> URL: https://issues.apache.org/jira/browse/BEAM-515
> Project: Beam
> Issue Type: Bug
> Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Frances Perry
>Priority: Critical
> Fix For: Not applicable
>
>
> Except from:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E
> A feather ASF logo would be a nice addition as well. [4]
> http://www.apache.org/foundation/press/kit/#links
> While we're in there, I believe we still need to add the Apache Incubator egg
> logo. http://incubator.apache.org/images/
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Halperin closed BEAM-369.
Resolution: Fixed
Fix Version/s: Not applicable
> Add travis config to website
>
>
> Key: BEAM-369
> URL: https://issues.apache.org/jira/browse/BEAM-369
> Project: Beam
> Issue Type: Bug
> Components: website
>Reporter: James Malone
>Assignee: James Malone
> Fix For: Not applicable
>
>
> Add a Travis CI config to the Beam website.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[
https://issues.apache.org/jira/browse/BEAM-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405394#comment-15405394
]
ASF GitHub Bot commented on BEAM-514:
-
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam-site/pull/33
> Add all mandatory links
> ---
>
> Key: BEAM-514
> URL: https://issues.apache.org/jira/browse/BEAM-514
> Project: Beam
> Issue Type: Bug
> Components: website
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Frances Perry
>
> Except from:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201608.mbox/%3C7E0226B1-0386-499C-8473-61A8E51A691B%40classsoftware.com%3E
> > Branding wise I think you are missing a few of the
> required links [3] including a link back to the Apache homepage.
> http://www.apache.org/foundation/marks/pmcs.html#navigation
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-beam-site/pull/33
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---