[GitHub] incubator-beam pull request: [BEAM-34][BEAM-145] Make WindowingStr...
GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/296

[BEAM-34][BEAM-145] Make WindowingStrategy combine WindowFn with OutputTimeFn

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

- [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes).
- [x] Replace `` in the title with the actual Jira issue number, if there is one.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

Previously:

- Any user-specified OutputTimeFn overrode the WindowFn#getOutputTime
- WindowFn#getOutputTimeFn provided a default OutputTimeFn
- The default varied from "earliest" to "end of window"

Now:

- The user-specified OutputTimeFn is used to combine the WindowFn's assigned output timestamps.
- The WindowFn does not provide the default.
- The default is always to output at end of window.

For each of the tests that this affects, I had a choice: either update the timestamps in the test to be the end of window, or explicitly reset the windowing strategy to choose the minimum timestamp. The latter generally gets more useful coverage, since the former is fairly trivial, so I generally favored it. It is also easier to migrate to. And most of the tests are overspecified anyhow and should not be examining the timestamps.
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam OutputAtEndOfWindow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/296.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #296

commit 3aec78e34c1f1a1045d091048d3fa018a7cc0d3d
Author: Kenneth Knowles
Date: 2016-05-06T02:33:16Z

    Make WindowingStrategy combine WindowFn with OutputTimeFn

    Previously:

    - Any user-specified OutputTimeFn overrode the WindowFn#getOutputTime
    - WindowFn#getOutputTimeFn provided a default OutputTimeFn
    - The default varied from "earliest" to "end of window"

    Now:

    - The user-specified OutputTimeFn is used to combine the WindowFn's assigned output timestamps.
    - The WindowFn does not provide the default.
    - The default is always to output at end of window.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
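The behavior change described in this pull request can be illustrated with a small sketch. This is not the Beam SDK code: `OutputTimeSketch`, `windowFnOutputTime`, and `outputTimestamp` are hypothetical names, timestamps are plain `long` milliseconds, `windowEnd` stands in for the window's maximum timestamp, and a `null` combiner is used here only to model "no user-specified OutputTimeFn".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongBinaryOperator;

/** Simplified, hypothetical model of the new behavior; not the Beam SDK classes. */
public class OutputTimeSketch {

    /** Stand-in for WindowFn#getOutputTime; in this sketch it keeps the input timestamp. */
    static long windowFnOutputTime(long inputTimestamp) {
        return inputTimestamp;
    }

    /**
     * Computes the output timestamp for the elements grouped in one window.
     * A null combiner models "no user-specified OutputTimeFn", so the default
     * (end of window) applies. Otherwise the combiner folds together the
     * output times that the WindowFn assigned to each element.
     */
    static long outputTimestamp(List<Long> inputTimestamps, long windowEnd,
                                LongBinaryOperator combiner) {
        if (combiner == null) {
            return windowEnd; // default: always output at end of window
        }
        if (inputTimestamps.isEmpty()) {
            throw new IllegalArgumentException("need at least one timestamp to combine");
        }
        long result = windowFnOutputTime(inputTimestamps.get(0));
        for (int i = 1; i < inputTimestamps.size(); i++) {
            result = combiner.applyAsLong(result, windowFnOutputTime(inputTimestamps.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> timestamps = Arrays.asList(3L, 7L, 5L);
        long windowEnd = 10L;
        System.out.println(outputTimestamp(timestamps, windowEnd, null));      // prints 10
        System.out.println(outputTimestamp(timestamps, windowEnd, Math::min)); // prints 3
        System.out.println(outputTimestamp(timestamps, windowEnd, Math::max)); // prints 7
    }
}
```

Passing `Math::min` corresponds to the "choose the minimum timestamp" option mentioned for the affected tests, while passing no combiner gives the end-of-window default.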
[jira] [Updated] (BEAM-177) Integrate code coverage to build and review process
[ https://issues.apache.org/jira/browse/BEAM-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-177:
-
    Assignee: (was: Kenneth Knowles)

> Integrate code coverage to build and review process
> ---
>
> Key: BEAM-177
> URL: https://issues.apache.org/jira/browse/BEAM-177
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Kenneth Knowles
>
> We cannot use codecov, but we can use coveralls. We have the maven plugin
> included in the pom and need to invoke it appropriately in our various
> builds, and disseminate knowledge about browser extensions to get it into the
> pull request UI.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-157) CombineTest.testGlobalCombineWithDefaultsAndTriggers is broken
[ https://issues.apache.org/jira/browse/BEAM-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles resolved BEAM-157.
--
    Resolution: Fixed

> CombineTest.testGlobalCombineWithDefaultsAndTriggers is broken
> --
>
> Key: BEAM-157
> URL: https://issues.apache.org/jira/browse/BEAM-157
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Daniel Halperin
> Assignee: Kenneth Knowles
> Priority: Critical
>
> The test is not run because `p.run()` is not called. When `p.run()` is added,
> the test fails.
> Kenn, I suspect this is because it's using triggers in batch, which obviously
> is not guaranteed to work.
> Please investigate!
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/transforms/CombineTest.java#L373
> Failed tests:
> CombineTest.testGlobalCombineWithDefaultsAndTriggers:391
> Expected: iterable over ["2: true", "1: false"] in any order
> but: No item matches: "1: false" in ["2: true"]
> Tests run: 30, Failures: 1, Errors: 0, Skipped: 0
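The failure mode reported in BEAM-157 (a test that passes only because the pipeline is never run) can be sketched in isolation. The class below is a hypothetical stand-in, not the Beam `Pipeline` or its assertion utilities: `MiniPipeline`, `addCheck`, and `executedChecks` are invented names that only model the idea that checks are recorded at construction time and evaluated when `run()` is called.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of deferred checks; not the Beam SDK classes. */
public class DeferredAssertSketch {

    /** Records checks when the pipeline is built and evaluates them only in run(). */
    static class MiniPipeline {
        private final List<Runnable> checks = new ArrayList<>();
        private int executed = 0;

        void addCheck(Runnable check) {
            checks.add(check);
        }

        /** Runs every recorded check; a failing check throws out of run(). */
        int run() {
            for (Runnable check : checks) {
                check.run();
                executed++;
            }
            return executed;
        }

        int executedChecks() {
            return executed;
        }
    }

    public static void main(String[] args) {
        MiniPipeline p = new MiniPipeline();
        p.addCheck(() -> {
            throw new AssertionError("No item matches: \"1: false\"");
        });

        // Without p.run(), the failing check never executes and the test appears green.
        System.out.println("checks executed before run(): " + p.executedChecks());

        try {
            p.run();
        } catch (AssertionError e) {
            System.out.println("failure surfaced by run(): " + e.getMessage());
        }
    }
}
```

In this model a test method that returns before calling `run()` reports success regardless of what its checks would have found, which matches the symptom described in the issue.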
[jira] [Resolved] (BEAM-82) Transitive dependencies in Beam pom.xml have conflicts that could result in old versions if reordered
[ https://issues.apache.org/jira/browse/BEAM-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles resolved BEAM-82.
-
    Resolution: Fixed

> Transitive dependencies in Beam pom.xml have conflicts that could result in
> old versions if reordered
> -
>
> Key: BEAM-82
> URL: https://issues.apache.org/jira/browse/BEAM-82
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Labels: pom.xml
>
> Specifically, com.google.apis:google-api-services-datastore-protobuf depends
> on a very old version of the suite of Google API libraries. It is only through
> Maven's dependency resolution that these are generally overridden by other
> dependencies on the new versions.
> It is easy (and I have done it) to get things rearranged so that it pulls in
> the very old API clients. They should be suppressed and provided in their
> latest form for compatibility with the rest of the SDK that is up to date.
[jira] [Commented] (BEAM-259) Execute selected RunnableOnService tests with Spark runner
[ https://issues.apache.org/jira/browse/BEAM-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273223#comment-15273223 ]

ASF GitHub Bot commented on BEAM-259:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/294

[BEAM-259] Configure RunnableOnService tests for Spark runner, batch mode

This PR demonstrates how to configure the integration tests. It has two categories of issue:

1. Transforms that are not supported. For these we can add surefire exclusions for now.
2. Runtime errors having to do with configuring Spark. I'm hoping someone with expertise in the runner can quickly recommend the course of action.
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam spark-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/294.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #294

commit d8a2a34f723ff4ca7fe841c8056706c19d37770d
Author: Kenneth Knowles
Date: 2016-05-05T22:11:07Z

    Configure RunnableOnService tests for Spark runner, batch mode

> Execute selected RunnableOnService tests with Spark runner
> --
>
> Key: BEAM-259
> URL: https://issues.apache.org/jira/browse/BEAM-259
> Project: Beam
> Issue Type: Test
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
>
[jira] [Commented] (BEAM-192) Create new landing page for Apache Beam Documentation
[ https://issues.apache.org/jira/browse/BEAM-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273099#comment-15273099 ]

Devin Donnelly commented on BEAM-192:
-

The following pull request on incubator-beam-site addresses this issue:
https://github.com/apache/incubator-beam-site/pull/14

> Create new landing page for Apache Beam Documentation
> -
>
> Key: BEAM-192
> URL: https://issues.apache.org/jira/browse/BEAM-192
> Project: Beam
> Issue Type: Task
> Components: website
> Reporter: Devin Donnelly
> Assignee: Devin Donnelly
>
> Revise the current stopgap Apache Beam landing page.
> - Explain the benefits of the Beam programming model
> - Disclose the status of the various Beam SDKs and runners
> - Provide an easy place to access release notes
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272951#comment-15272951 ]

ASF GitHub Bot commented on BEAM-22:

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/282

> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
> Issue Type: Improvement
> Components: runner-direct
> Reporter: Davor Bonaci
> Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as
> adding the ability to run over unbounded PCollections, and to better resemble
> the execution model in a distributed system.
> This further enables features such as a testing source which may simulate
> late data and test triggers in the pipeline. Finally, we may want to expose
> an option to select between "debug" (single threaded), "chaos monkey" (test
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey)
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426
[2/2] incubator-beam git commit: This closes #282
This closes #282

Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/51e1e59b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/51e1e59b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/51e1e59b

Branch: refs/heads/master
Commit: 51e1e59b8988d7caf3b924378e2ff95037fca4d3
Parents: e63311f 2adf45f
Author: Kenneth Knowles
Authored: Thu May 5 12:54:08 2016 -0700
Committer: Kenneth Knowles
Committed: Thu May 5 12:54:08 2016 -0700

--
 .../direct/ExecutorServiceParallelExecutor.java | 51
 1 file changed, 30 insertions(+), 21 deletions(-)
--
[GitHub] incubator-beam pull request: Remove Dataflow runner references in ...
GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/293

Remove Dataflow runner references in WordCount examples.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam fix-minimum-wc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/293.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #293

commit 29e3ef6284e78dacecd6fef0565649d16fc66303
Author: Pei He
Date: 2016-05-02T19:49:35Z

    Remove Dataflow runner references in WordCount examples.
[jira] [Commented] (BEAM-115) Beam Runner API
[ https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272941#comment-15272941 ]

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/277

> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
> Issue Type: Improvement
> Components: runner-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical
> vision.
> It has technical limitations:
> - The user's DAG (even including library expansions) is never explicitly
> represented, so it cannot be analyzed except incrementally, and cannot
> necessarily be reconstructed (for example, to display it!).
> - The flattened DAG of just primitive transforms isn't well-suited for
> display or transform override.
> - The TransformHierarchy isn't well-suited for optimizations.
> - The user must realistically pre-commit to a runner, and its configuration
> (batch vs streaming) prior to graph construction, since the runner will be
> modifying the graph as it is built.
> - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from
> actual cases of failure to use according to the design):
> - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner
> is confusing.
> - The TransformHierarchy, accessible only via visitor traversals, is
> cumbersome.
> - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus,
> and building an API that more simply and directly supports the technical
> vision.
[GitHub] incubator-beam pull request: [BEAM-115] Port batch Flink GroupByKe...
Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/277
[GitHub] incubator-beam pull request: Add wildcard to checkstyle ordering
GitHub user sammcveety opened a pull request:

https://github.com/apache/incubator-beam/pull/292

Add wildcard to checkstyle ordering

Add wildcard to checkstyle, to handle unexpected package prefixes. These should still be ordered before the sun and java packages.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sammcveety/incubator-beam patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/292.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #292

commit e31f1a696475ea110d3d3b3dca382b64d12c582b
Author: sammcveety
Date: 2016-05-05T19:15:24Z

    Add wildcard to checkstyle ordering

    Add wildcard to checkstyle, to handle unexpected package prefixes. These should still be ordered before the sun and java packages.
[jira] [Commented] (BEAM-258) Execute selected RunnableOnService tests with Flink runner
[ https://issues.apache.org/jira/browse/BEAM-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272882#comment-15272882 ]

ASF GitHub Bot commented on BEAM-258:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/291

[BEAM-258] Configure RunnableOnService tests for Flink runner

This is a sample configuration for now. There are these kinds of failures in the tests right now:

1. Since the batch runner only supports global windows, I've filtered those tests out. I added the `UnsupportedOperationException` to the windowing translator so I could distinguish them.
2. Those tests that have simple & supported pipelines succeed at building the pipeline, but somehow the graph is empty - I have checked and it seemed like translators are not even being invoked. This is beyond my current scope of digging in.
3. Some other tests fail with other misc errors. Perhaps they use state, which results in `NullPointerException`.
4. Pretty much all of the tests require side inputs since that is how `PAssert` works, so they cannot work in streaming mode.
You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam flink-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/291.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #291

commit 52198eb7e9c2df627871e7f96404d188dc4a4ee7
Author: Kenneth Knowles
Date: 2016-05-02T21:29:30Z

    Add Window.Bound translator to Flink batch

    This adds a Window.Bound translator that allows only GlobalWindows. It is a temporary measure, but one that brings the Flink batch translator in line with the Beam model - instead of "ignoring" windows, the GBK is a perfectly valid GBK for GlobalWindows.

    Previously, the SDK's runner test suite would fail due to the lack of a translator - now some of them will fail due to windowing support, but others have a chance.

commit 095f9840dd0d8d78f041e494613375664f7d3eaa
Author: Kenneth Knowles
Date: 2016-05-02T20:11:12Z

    Add TestFlinkPipelineRunner to FlinkRunnerRegistrar

    This makes the runner available for selection by integration tests.

commit 750a49d286f4d11d6ad63460d8b244a5ebde975e
Author: Kenneth Knowles
Date: 2016-05-02T21:04:20Z

    Configure RunnableOnService tests for Flink in batch mode

    Today Flink batch supports only global windows. This is a situation we intend our build to allow, eventually via JUnit category filtering. For now all the test classes that use non-global windows are excluded entirely via maven configuration. In the future, it should be on a per-test-method basis.

> Execute selected RunnableOnService tests with Flink runner
> --
>
> Key: BEAM-258
> URL: https://issues.apache.org/jira/browse/BEAM-258
> Project: Beam
> Issue Type: Test
> Components: runner-flink
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
>
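The first commit above describes a translator that accepts only global windows and rejects everything else. A minimal sketch of that idea follows. It does not reproduce the Flink runner's translator: `WindowFn`, `GlobalWindows`, `FixedWindows`, and `translateWindowInto` here are hypothetical stand-ins defined inside the example, and only the use of `UnsupportedOperationException` is taken from the pull request description.

```java
/** Hypothetical model of a "global windows only" translator; not the Flink runner code. */
public class GlobalWindowsOnlySketch {

    /** Stand-in for a windowing function. */
    interface WindowFn {}

    /** Stand-in for the single global window. */
    static final class GlobalWindows implements WindowFn {}

    /** Stand-in for a non-global windowing function. */
    static final class FixedWindows implements WindowFn {
        final long sizeMillis;

        FixedWindows(long sizeMillis) {
            this.sizeMillis = sizeMillis;
        }
    }

    /**
     * Accepts global windows as a no-op and rejects any other windowing
     * explicitly, so unsupported tests fail with a distinguishable error
     * instead of silently ignoring the windowing.
     */
    static void translateWindowInto(WindowFn windowFn) {
        if (!(windowFn instanceof GlobalWindows)) {
            throw new UnsupportedOperationException(
                "Only GlobalWindows is supported in this sketch, got "
                    + windowFn.getClass().getSimpleName());
        }
        // Nothing to translate: grouping in the global window needs no extra work.
    }

    public static void main(String[] args) {
        translateWindowInto(new GlobalWindows()); // accepted
        try {
            translateWindowInto(new FixedWindows(60_000L));
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly on non-global windows is what lets a test suite separate "windowing not supported" failures from other errors, as the pull request description notes.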
[jira] [Created] (BEAM-258) Execute selected RunnableOnService tests with Flink runner
Kenneth Knowles created BEAM-258:

Summary: Execute selected RunnableOnService tests with Flink runner
Key: BEAM-258
URL: https://issues.apache.org/jira/browse/BEAM-258
Project: Beam
Issue Type: Test
Components: runner-flink
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles
[jira] [Commented] (BEAM-52) KafkaIO - bounded/unbounded, source/sink
[ https://issues.apache.org/jira/browse/BEAM-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272649#comment-15272649 ]

Raghu Angadi commented on BEAM-52:
--

Kafka Sink PR: https://github.com/apache/incubator-beam/pull/271

> KafkaIO - bounded/unbounded, source/sink
>
>
> Key: BEAM-52
> URL: https://issues.apache.org/jira/browse/BEAM-52
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Reporter: Daniel Halperin
> Assignee: Raghu Angadi
>
> We should support Apache Kafka. The priority list is probably:
> * UnboundedSource
> * unbounded Sink
> * BoundedSource
> * bounded Sink
> The connector should be well-tested, especially around UnboundedSource
> checkpointing and resuming, and data duplication.