[GitHub] incubator-beam pull request: [BEAM-34][BEAM-145] Make WindowingStr...

2016-05-05 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/296

[BEAM-34][BEAM-145] Make WindowingStrategy combine WindowFn with OutputTimeFn

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

Previously:

 - Any user-specified OutputTimeFn overrode the WindowFn#getOutputTime
 - WindowFn#getOutputTimeFn provided a default OutputTimeFn
 - The default varied from "earliest" to "end of window"

Now:

 - The user-specified OutputTimeFn is used to combine the WindowFn's
   assigned output timestamps.
 - The WindowFn does not provide the default.
 - The default is always to output at end of window.

For each test that this affects, I had a choice: either update the 
timestamps in the test to be the end of the window, or explicitly reset the 
windowing strategy to choose the minimum timestamp. The latter generally gets 
more useful coverage, since the former is fairly trivial, so I generally 
favored it. It is also easier to migrate to. And most of the tests are 
overspecified anyhow and should not be examining the timestamps.
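
As a rough illustration of the "Now" behavior listed above, here is a minimal self-contained sketch. It does not use the SDK: `OutputTimeSketch`, `Combiner`, `END_OF_WINDOW`, `EARLIEST`, and `outputTimestamp` are hypothetical names invented for this example, and timestamps are plain `long` values. It assumes only what the description states: the window function assigns an output timestamp per element, the user-specified function combines them, and the default is the end of the window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongUnaryOperator;

/** Hypothetical model of the behavior described above; these are not the SDK's classes. */
final class OutputTimeSketch {

  /** Stand-in for an OutputTimeFn: combines per-element output timestamps for one window. */
  interface Combiner {
    long combine(List<Long> assignedTimestamps, long windowEnd);
  }

  /** The new default from the description: always output at the end of the window. */
  static final Combiner END_OF_WINDOW = (assigned, windowEnd) -> windowEnd;

  /** Explicit opt-in to the minimum timestamp, as the migrated tests do. */
  static final Combiner EARLIEST = (assigned, windowEnd) -> {
    long min = windowEnd;
    for (long t : assigned) {
      min = Math.min(min, t);
    }
    return min;
  };

  static long outputTimestamp(
      List<Long> elementTimestamps,
      long windowEnd,
      LongUnaryOperator windowFnAssign,
      Combiner userFn) {
    // Step 1: the window function assigns an output timestamp to each element.
    List<Long> assigned = new ArrayList<>();
    for (long t : elementTimestamps) {
      assigned.add(windowFnAssign.applyAsLong(t));
    }
    // Step 2: the user-specified function combines them; the window function
    // no longer supplies a default, so a missing choice means end of window.
    Combiner fn = (userFn != null) ? userFn : END_OF_WINDOW;
    return fn.combine(assigned, windowEnd);
  }

  public static void main(String[] args) {
    List<Long> timestamps = Arrays.asList(3L, 7L, 5L);
    LongUnaryOperator identity = LongUnaryOperator.identity();
    System.out.println("default:  " + outputTimestamp(timestamps, 10L, identity, null));
    System.out.println("earliest: " + outputTimestamp(timestamps, 10L, identity, EARLIEST));
  }
}
```

A test that previously relied on a window function defaulting to "earliest" would, in this model, pass `EARLIEST` explicitly instead of changing its expected timestamps.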

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam OutputAtEndOfWindow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #296


commit 3aec78e34c1f1a1045d091048d3fa018a7cc0d3d
Author: Kenneth Knowles 
Date:   2016-05-06T02:33:16Z

Make WindowingStrategy combine WindowFn with OutputTimeFn

Previously:

 - Any user-specified OutputTimeFn overrode the WindowFn#getOutputTime
 - WindowFn#getOutputTimeFn provided a default OutputTimeFn
 - The default varied from "earliest" to "end of window"

Now:

 - The user-specified OutputTimeFn is used to combine the WindowFn's
   assigned output timestamps.
 - The WindowFn does not provide the default.
 - The default is always to output at end of window.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (BEAM-177) Integrate code coverage to build and review process

2016-05-05 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-177:
-
Assignee: (was: Kenneth Knowles)

> Integrate code coverage to build and review process
> ---
>
> Key: BEAM-177
> URL: https://issues.apache.org/jira/browse/BEAM-177
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>
> We cannot use codecov, but we can use coveralls. We have the maven plugin 
> included in the pom and need to invoke it appropriately in our various 
> builds, and disseminate knowledge about browser extensions to get it into the 
> pull request UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-157) CombineTest.testGlobalCombineWithDefaultsAndTriggers is broken

2016-05-05 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-157.
--
Resolution: Fixed

> CombineTest.testGlobalCombineWithDefaultsAndTriggers is broken
> --
>
> Key: BEAM-157
> URL: https://issues.apache.org/jira/browse/BEAM-157
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Kenneth Knowles
>Priority: Critical
>
> The test is not run because `p.run()` is not called. When `p.run()` is added, 
> the test fails.
> Kenn, I suspect this is because it's using triggers in batch, which obviously 
> is not guaranteed to work.
> Please investigate!
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/test/java/com/google/cloud/dataflow/sdk/transforms/CombineTest.java#L373
> Failed tests: 
>   CombineTest.testGlobalCombineWithDefaultsAndTriggers:391 
> Expected: iterable over ["2: true", "1: false"] in any order
>  but: No item matches: "1: false" in ["2: true"]
> Tests run: 30, Failures: 1, Errors: 0, Skipped: 0
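
The failure mode in this report, where a test passes only because the pipeline is never run, can be sketched without the SDK. The class below is a hypothetical stand-in (`DeferredPipelineSketch`, `addCheck`, and `executedChecks` are invented names), assuming only that assertions attached to a pipeline are evaluated when `run()` is called and not before.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pipeline whose assertions are deferred until run(). */
final class DeferredPipelineSketch {
  private final List<Runnable> checks = new ArrayList<>();
  private int executed = 0;

  /** Registers a check; nothing is evaluated at this point. */
  void addCheck(Runnable check) {
    checks.add(check);
  }

  /** Evaluates every registered check and returns how many have been executed so far. */
  int run() {
    for (Runnable check : checks) {
      check.run();
      executed++;
    }
    return executed;
  }

  int executedChecks() {
    return executed;
  }

  public static void main(String[] args) {
    DeferredPipelineSketch p = new DeferredPipelineSketch();
    p.addCheck(() -> {
      throw new AssertionError("No item matches: \"1: false\"");
    });
    // Without run(), the failing check is silently skipped and the test "passes".
    System.out.println("checks executed before run(): " + p.executedChecks());
    try {
      p.run();
    } catch (AssertionError e) {
      System.out.println("failure surfaced by run(): " + e.getMessage());
    }
  }
}
```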





[jira] [Resolved] (BEAM-82) Transitive dependencies in Beam pom.xml have conflicts that could result in old versions if reordered

2016-05-05 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-82.
-
Resolution: Fixed

> Transitive dependencies in Beam pom.xml have conflicts that could result in 
> old versions if reordered
> -
>
> Key: BEAM-82
> URL: https://issues.apache.org/jira/browse/BEAM-82
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: pom.xml
>
> Specifically, com.google.apis:google-api-services-datastore-protobuf depends 
> on a very old version of the suite of Google API libraries. Only Maven's 
> dependency resolution causes these to be overridden, in general, by other 
> dependencies on the new versions.
> It is easy (and I have done it) to get things rearranged so that it pulls in 
> the very old API clients. They should be suppressed and provided in their 
> latest form for compatibility with the rest of the SDK that is up to date.
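
Why reordering matters can be sketched with a toy model of Maven's dependency mediation: the version nearest to the project in the dependency tree wins, and when two candidates are at the same depth the one declared first wins. The class and the version labels `"old"` and `"new"` below are hypothetical; this is not Maven code, only an illustration of that rule.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of "nearest wins, first declaration breaks ties"; not Maven's implementation. */
final class NearestWinsSketch {

  /** One path by which a version of the same artifact reaches the project. */
  static final class Candidate {
    final String version;
    final int depth;

    Candidate(String version, int depth) {
      this.version = version;
      this.depth = depth;
    }
  }

  /** Picks the winning version from candidates listed in declaration order. */
  static String resolve(List<Candidate> inDeclarationOrder) {
    if (inDeclarationOrder.isEmpty()) {
      throw new IllegalArgumentException("no candidates");
    }
    Candidate best = null;
    for (Candidate c : inDeclarationOrder) {
      // Only a strictly nearer candidate replaces the current winner,
      // so a tie in depth keeps the earlier declaration.
      if (best == null || c.depth < best.depth) {
        best = c;
      }
    }
    return best.version;
  }

  public static void main(String[] args) {
    Candidate viaUpToDateDependency = new Candidate("new", 2);
    Candidate viaVeryOldDependency = new Candidate("old", 2);
    System.out.println(resolve(Arrays.asList(viaUpToDateDependency, viaVeryOldDependency)));
    System.out.println(resolve(Arrays.asList(viaVeryOldDependency, viaUpToDateDependency)));
  }
}
```

In this model, swapping two declarations at equal depth is enough to flip the result, which is the hazard the issue describes; excluding the old transitive dependencies and declaring the current ones directly removes the dependence on ordering.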





[jira] [Commented] (BEAM-259) Execute selected RunnableOnService tests with Spark runner

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273223#comment-15273223
 ] 

ASF GitHub Bot commented on BEAM-259:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/294

[BEAM-259] Configure RunnableOnService tests for Spark runner, batch mode

This PR demonstrates how to configure the integration tests. It surfaces two 
categories of issue:

1. Transforms that are not supported. For these we can add surefire 
exclusions for now.
2. Runtime errors having to do with configuring Spark. I'm hoping someone 
with expertise in the runner can quickly recommend a course of action.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam spark-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #294


commit d8a2a34f723ff4ca7fe841c8056706c19d37770d
Author: Kenneth Knowles 
Date:   2016-05-05T22:11:07Z

Configure RunnableOnService tests for Spark runner, batch mode




> Execute selected RunnableOnService tests with Spark runner
> --
>
> Key: BEAM-259
> URL: https://issues.apache.org/jira/browse/BEAM-259
> Project: Beam
>  Issue Type: Test
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-192) Create new landing page for Apache Beam Documentation

2016-05-05 Thread Devin Donnelly (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273099#comment-15273099
 ] 

Devin Donnelly commented on BEAM-192:
-

The following pull request on incubator-beam-site addresses this issue:
https://github.com/apache/incubator-beam-site/pull/14

> Create new landing page for Apache Beam Documentation
> -
>
> Key: BEAM-192
> URL: https://issues.apache.org/jira/browse/BEAM-192
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Devin Donnelly
>Assignee: Devin Donnelly
>
> Revise the current stopgap Apache Beam landing page.
> - Explain the benefits of the Beam programming model
> - Disclose the status of the various Beam SDKs and runners
> - Provide an easy place to access release notes





[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272951#comment-15272951
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/282


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, for example 
> by adding the ability to run over unbounded PCollections, and to better 
> resemble the execution model of a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426





[2/2] incubator-beam git commit: This closes #282

2016-05-05 Thread kenn
This closes #282


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/51e1e59b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/51e1e59b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/51e1e59b

Branch: refs/heads/master
Commit: 51e1e59b8988d7caf3b924378e2ff95037fca4d3
Parents: e63311f 2adf45f
Author: Kenneth Knowles 
Authored: Thu May 5 12:54:08 2016 -0700
Committer: Kenneth Knowles 
Committed: Thu May 5 12:54:08 2016 -0700

--
 .../direct/ExecutorServiceParallelExecutor.java | 51 
 1 file changed, 30 insertions(+), 21 deletions(-)
--




[GitHub] incubator-beam pull request: Remove Dataflow runner references in ...

2016-05-05 Thread peihe
GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/293

Remove Dataflow runner references in WordCount examples.





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam fix-minimum-wc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/293.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #293


commit 29e3ef6284e78dacecd6fef0565649d16fc66303
Author: Pei He 
Date:   2016-05-02T19:49:35Z

Remove Dataflow runner references in WordCount examples.






[jira] [Commented] (BEAM-115) Beam Runner API

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272941#comment-15272941
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/277


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.





[GitHub] incubator-beam pull request: [BEAM-115] Port batch Flink GroupByKe...

2016-05-05 Thread kennknowles
Github user kennknowles closed the pull request at:

https://github.com/apache/incubator-beam/pull/277




[GitHub] incubator-beam pull request: Add wildcard to checkstyle ordering

2016-05-05 Thread sammcveety
GitHub user sammcveety opened a pull request:

https://github.com/apache/incubator-beam/pull/292

Add wildcard to checkstyle ordering

Add wildcard to checkstyle, to handle unexpected package prefixes.  These 
should still be ordered before the sun and java packages.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sammcveety/incubator-beam patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #292


commit e31f1a696475ea110d3d3b3dca382b64d12c582b
Author: sammcveety 
Date:   2016-05-05T19:15:24Z

Add wildcard to checkstyle ordering

Add wildcard to checkstyle, to handle unexpected package prefixes.  These 
should still be ordered before the sun and java packages.






[jira] [Commented] (BEAM-258) Execute selected RunnableOnService tests with Flink runner

2016-05-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272882#comment-15272882
 ] 

ASF GitHub Bot commented on BEAM-258:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/291

[BEAM-258] Configure RunnableOnService tests for Flink runner

This is a sample configuration for now.

The tests currently fail in the following ways:

1. Since the batch runner only supports global windows, I've filtered those 
tests out. I added the `UnsupportedOperationException` to the windowing 
translator so I could distinguish them.
2. Those tests that have simple & supported pipelines succeed at building 
the pipeline, but somehow the graph is empty. I have checked, and it seems 
the translators are not even being invoked. Digging further is beyond my 
current scope.
3. Some other tests fail with other misc errors. Perhaps they use state, 
which results in `NullPointerException`.
4. Pretty much all of the tests require side inputs since that is how 
`PAssert` works, so they cannot work in streaming mode.
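
Item 1 above can be sketched as follows. This is a hypothetical stand-in, not the runner's translator: `GlobalOnlyWindowTranslatorSketch`, `WindowKind`, `translate`, and `isKnownWindowingLimitation` are invented for illustration. It assumes only what is stated here, namely that non-global windowing is rejected with an `UnsupportedOperationException` so those failures can be told apart from unexpected ones.

```java
/** Hypothetical sketch of a translator that accepts only global windows. */
final class GlobalOnlyWindowTranslatorSketch {

  /** Stand-in for the kinds of windowing a pipeline might request. */
  enum WindowKind { GLOBAL, FIXED, SLIDING, SESSIONS }

  /** Translates global windowing as a pass-through and rejects everything else. */
  static String translate(WindowKind kind) {
    if (kind != WindowKind.GLOBAL) {
      // A distinctive exception type makes the known limitation easy to filter.
      throw new UnsupportedOperationException(
          "Batch translation supports only global windows, got: " + kind);
    }
    return "pass-through";
  }

  /** True when a failure is the known windowing limitation rather than some other error. */
  static boolean isKnownWindowingLimitation(Throwable failure) {
    return failure instanceof UnsupportedOperationException;
  }

  public static void main(String[] args) {
    for (WindowKind kind : WindowKind.values()) {
      try {
        System.out.println(kind + " -> " + translate(kind));
      } catch (RuntimeException e) {
        System.out.println(kind + " -> known limitation? " + isKnownWindowingLimitation(e));
      }
    }
  }
}
```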

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam flink-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #291


commit 52198eb7e9c2df627871e7f96404d188dc4a4ee7
Author: Kenneth Knowles 
Date:   2016-05-02T21:29:30Z

Add Window.Bound translator to Flink batch

This adds a Window.Bound translator that allows only
GlobalWindows. It is a temporary measure, but one that
brings the Flink batch translator in line with the
Beam model - instead of "ignoring" windows, the GBK
is a perfectly valid GBK for GlobalWindows.

Previously, the SDK's runner test suite would fail
due to the lack of a translator - now some of them
will fail due to windowing support, but others have
a chance.

commit 095f9840dd0d8d78f041e494613375664f7d3eaa
Author: Kenneth Knowles 
Date:   2016-05-02T20:11:12Z

Add TestFlinkPipelineRunner to FlinkRunnerRegistrar

This makes the runner available for selection by integration tests.

commit 750a49d286f4d11d6ad63460d8b244a5ebde975e
Author: Kenneth Knowles 
Date:   2016-05-02T21:04:20Z

Configure RunnableOnService tests for Flink in batch mode

Today Flink batch supports only global windows. This is a situation we
intend our build to allow, eventually via JUnit category filtering.

For now all the test classes that use non-global windows are excluded
entirely via maven configuration. In the future, it should be on a
per-test-method basis.




> Execute selected RunnableOnService tests with Flink runner
> --
>
> Key: BEAM-258
> URL: https://issues.apache.org/jira/browse/BEAM-258
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Created] (BEAM-258) Execute selected RunnableOnService tests with Flink runner

2016-05-05 Thread Kenneth Knowles (JIRA)
Kenneth Knowles created BEAM-258:


 Summary: Execute selected RunnableOnService tests with Flink runner
 Key: BEAM-258
 URL: https://issues.apache.org/jira/browse/BEAM-258
 Project: Beam
  Issue Type: Test
  Components: runner-flink
Reporter: Kenneth Knowles
Assignee: Kenneth Knowles








[jira] [Commented] (BEAM-52) KafkaIO - bounded/unbounded, source/sink

2016-05-05 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272649#comment-15272649
 ] 

Raghu Angadi commented on BEAM-52:
--

Kafka Sink PR : https://github.com/apache/incubator-beam/pull/271

> KafkaIO - bounded/unbounded, source/sink
> 
>
> Key: BEAM-52
> URL: https://issues.apache.org/jira/browse/BEAM-52
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Daniel Halperin
>Assignee: Raghu Angadi
>
> We should support Apache Kafka. The priority list is probably:
> * UnboundedSource
> * unbounded Sink
> * BoundedSource
> * bounded Sink
> The connector should be well-tested, especially around UnboundedSource 
> checkpointing and resuming, and data duplication.


