[GitHub] incubator-beam pull request: [BEAM-79] add Gearpump runner

2016-05-10 Thread manuzhang
GitHub user manuzhang opened a pull request:

https://github.com/apache/incubator-beam/pull/323

[BEAM-79] add Gearpump runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This PR adds Gearpump runner to Beam meeting the goals of phase 1 in the 
[design 
document](https://docs.google.com/document/d/1nw64QUWVfT8L7FUprPGLEeNjSBpDMkn1otfLt2rHM5g/edit).

The Gearpump runner supports the following functionalities,

* Transformations: ParDo, GroupByKey, Flatten
* Windows: using Beam's window logic and code
* side outputs
* serialization/deserialization: using Gearpump's Kryo serializer
* sources: Beam's UnboundedSource
* message delivery guarantee: at-most-once
* tests: integration test for various translators

Here's a snapshot of running the following Beam example on Gearpump cluster

```java
PCollection> wordCounts =
p.apply(Read.from(new UnboundedTextSource()).named("WordStream"))
.apply(ParDo.of(new ExtractWordsFn()))

.apply(Window.into(FixedWindows.of(Duration.standardSeconds(10
.apply(Count.perElement());

wordCounts.apply(ParDo.of(new FormatAsStringFn()));
```


![snip20160511_4](https://cloud.githubusercontent.com/assets/1191767/15171197/fd6ffba0-177e-11e6-99a1-30c7c2597244.png)

Note that the Gearpump runner is still in early stage and lacking 
capabilities like trigger, side inputs, aggregator. However, I'd like to have 
the community to get a feel of what Gearpump is like, whether Beam and Gearpump 
go well, and gather ideas for improvements. 




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam gearpump_runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #323


commit 73e5978f599bcf32ed8c2f1d54b6bd3bd8350092
Author: manuzhang 
Date:   2016-03-15T08:15:16Z

[BEAM-79] add Gearpump runner




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-79) Gearpump runner

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279595#comment-15279595
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user manuzhang opened a pull request:

https://github.com/apache/incubator-beam/pull/323

[BEAM-79] add Gearpump runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This PR adds Gearpump runner to Beam meeting the goals of phase 1 in the 
[design 
document](https://docs.google.com/document/d/1nw64QUWVfT8L7FUprPGLEeNjSBpDMkn1otfLt2rHM5g/edit).

The Gearpump runner supports the following functionalities,

* Transformations: ParDo, GroupByKey, Flatten
* Windows: using Beam's window logic and code
* side outputs
* serialization/deserialization: using Gearpump's Kryo serializer
* sources: Beam's UnboundedSource
* message delivery guarantee: at-most-once
* tests: integration test for various translators

Here's a snapshot of running the following Beam example on Gearpump cluster

```java
PCollection> wordCounts =
p.apply(Read.from(new UnboundedTextSource()).named("WordStream"))
.apply(ParDo.of(new ExtractWordsFn()))

.apply(Window.into(FixedWindows.of(Duration.standardSeconds(10
.apply(Count.perElement());

wordCounts.apply(ParDo.of(new FormatAsStringFn()));
```


![snip20160511_4](https://cloud.githubusercontent.com/assets/1191767/15171197/fd6ffba0-177e-11e6-99a1-30c7c2597244.png)

Note that the Gearpump runner is still in early stage and lacking 
capabilities like trigger, side inputs, aggregator. However, I'd like to have 
the community to get a feel of what Gearpump is like, whether Beam and Gearpump 
go well, and gather ideas for improvements. 




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam gearpump_runner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #323


commit 73e5978f599bcf32ed8c2f1d54b6bd3bd8350092
Author: manuzhang 
Date:   2016-03-15T08:15:16Z

[BEAM-79] add Gearpump runner




> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279326#comment-15279326
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/322

[BEAM-22] More regularly schedule additional roots

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This ensures that even when elements are pushed back into the Pipeline
Runner, roots are scheduled if necessary.

As elements may be rescheduled indefinitely, this is required to ensure
that unbounded roots are scheduled during pipeline execution when
existing elements are blocked on side inputs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
more_aggressively_add_roots

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #322


commit 1f5e399630e3ba71e2c025e7ab3f32c6c3ba9518
Author: Thomas Groh 
Date:   2016-05-11T00:11:23Z

More regularly schedule additional roots

This ensures that even when elements are pushed back into the Pipeline
Runner, roots are scheduled if necessary.

As elements may be rescheduled indefinitely, this is required to ensure
that unbounded roots are scheduled during pipeline execution when
existing elements are blocked on side inputs.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-22] More regularly schedule add...

2016-05-10 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/322

[BEAM-22] More regularly schedule additional roots

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This ensures that even when elements are pushed back into the Pipeline
Runner, roots are scheduled if necessary.

As elements may be rescheduled indefinitely, this is required to ensure
that unbounded roots are scheduled during pipeline execution when
existing elements are blocked on side inputs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
more_aggressively_add_roots

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #322


commit 1f5e399630e3ba71e2c025e7ab3f32c6c3ba9518
Author: Thomas Groh 
Date:   2016-05-11T00:11:23Z

More regularly schedule additional roots

This ensures that even when elements are pushed back into the Pipeline
Runner, roots are scheduled if necessary.

As elements may be rescheduled indefinitely, this is required to ensure
that unbounded roots are scheduled during pipeline execution when
existing elements are blocked on side inputs.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request: Upgrade versions of google libraries

2016-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/321


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Upgrade versions of google libraries

2016-05-10 Thread lcwik
Repository: incubator-beam
Updated Branches:
  refs/heads/master b915f79d1 -> acb040684


Upgrade versions of google libraries

GCSIO 1.4.3 -> 1.4.5 (bug fixes)
Guava 18.0 -> 19.0 (only impacts maven archetypes, everything else was on 19.0)
google-auth-library-oauth2-http 0.3.1 -> 0.4.0 (bug fixes)


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/e263af1b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/e263af1b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/e263af1b

Branch: refs/heads/master
Commit: e263af1b5150b5e987ba76c75f9cdbb1167a5bff
Parents: b915f79
Author: Luke Cwik 
Authored: Tue May 10 15:32:28 2016 -0700
Committer: Luke Cwik 
Committed: Tue May 10 15:32:28 2016 -0700

--
 pom.xml| 2 +-
 sdks/java/core/pom.xml | 2 +-
 .../examples/src/main/resources/archetype-resources/pom.xml| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/e263af1b/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 946222f..633b8fd 100644
--- a/pom.xml
+++ b/pom.xml
@@ -110,7 +110,7 @@
 1.0-rc2
 1.1
 1.22.0
-1.4.3
+1.4.5
 
0.5.160304
 19.0
 0.12.0

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/e263af1b/sdks/java/core/pom.xml
--
diff --git a/sdks/java/core/pom.xml b/sdks/java/core/pom.xml
index d1adf55..07fd0b1 100644
--- a/sdks/java/core/pom.xml
+++ b/sdks/java/core/pom.xml
@@ -296,7 +296,7 @@
 
   com.google.auth
   google-auth-library-oauth2-http
-  0.3.1
+  0.4.0
   
 

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/e263af1b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
--
diff --git 
a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
 
b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
index 81af2dc..0a9dba5 100644
--- 
a/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
+++ 
b/sdks/java/maven-archetypes/examples/src/main/resources/archetype-resources/pom.xml
@@ -171,7 +171,7 @@
 
   com.google.guava
   guava
-  18.0
+  19.0
 
 
  



[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279008#comment-15279008
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/320

[BEAM-22] Reuse DoFns in ParDoEvaluators

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This allows the runner to avoid cloning DoFns for every input bundle.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
reuse_dofns_pardoevaluators

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/320.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #320


commit f018c75ef3ca1b60fc990bd88031d5419c571b87
Author: Thomas Groh 
Date:   2016-05-10T21:06:27Z

Reuse DoFns in ParDoEvaluators

This allows the runner to avoid cloning DoFns for every input bundle.




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278813#comment-15278813
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/318

[BEAM-22] Use an AtomicReference in InProcessSideInputContainer

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This fixes a TOCTOU race in the contents updating logic, where the
determination that the current pane should replace the contents of the
side input and the replacement is not a single atomic operation. Using
AtomicReference allows the use of compareAndSet to ensure that the
replacement can only occur on the pane that the decision to replace was
made with.

Fixes a race where a pane could be the latest, and replace a
pane, but would be lost due to an earlier pane being written between the
invalidation and loading of contents.

Fixes a race where a reader can incorrectly read an empty iterable as
the contents of a PCollectionView, due to occuring between the
invalidate and reload steps.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
atomic_reference_side_input_container

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #318






> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-103) Make UnboundedSourceWrapper parallel

2016-05-10 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-103.
---
Resolution: Fixed

> Make UnboundedSourceWrapper parallel
> 
>
> Key: BEAM-103
> URL: https://issues.apache.org/jira/browse/BEAM-103
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> As of now {{UnboundedSource}} s are executed with a parallelism of 1 
> regardless of the splits which the source returns. The corresponding 
> {{UnboundedSourceWrapper}} should implement {{RichParallelSourceFunction}} 
> and deal with splits correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278844#comment-15278844
 ] 

ASF GitHub Bot commented on BEAM-22:


GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/319

[BEAM-22] Enable RunnableOnService Tests

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Not ready for review. Publishing PR to hook into Travis and Jenkins.

Update runners/direct-java/pom.xml to enable the RunnableOnService
tests phase.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam enable_ros_tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #319


commit d1796ba6fecb8423e563fcdf66946beda79e52c6
Author: Thomas Groh 
Date:   2016-05-09T22:47:27Z

Minor checkArgument style fix

commit f0e38fd170949f27d4794113e4bcb2077ffe88a6
Author: Thomas Groh 
Date:   2016-05-10T18:27:37Z

Use an AtomicReference in InProcessSideInputContainer

This fixes a TOCTOU race in the contents updating logic, where the
determination that the current pane should replace the contents of the
side input and the replacement is not a single atomic operation. Using
AtomicReference allows the use of compareAndSet to ensure that the
replacement can only occur on the pane that the decision to replace was
made with.

Fixes a race where a pane could be the latest, and replace a
pane, but would be lost due to an earlier pane being written between the
invalidation and loading of contents.

Fixes a race where a reader can incorrectly read an empty iterable as
the contents of a PCollectionView, due to occuring between the
invalidate and reload steps.

commit e06e449e3762a48404d0407babaff440ebfa416e
Author: Thomas Groh 
Date:   2016-05-10T20:22:20Z

Cache read SideInput Contents in the InProcessSideInputContainer

This ensures that while processing a bundle all elements see the same
contents for any SideInput Window.

commit 8ff1d79474f3d114381b924fa61aa46bd7b935db
Author: Thomas Groh 
Date:   2016-05-10T20:36:21Z

Enable RunnableOnService tests for the Direct Runner




> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-271) Option to configure remote Dataflow windmill service endpoint

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278839#comment-15278839
 ] 

ASF GitHub Bot commented on BEAM-271:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/314


> Option to configure remote Dataflow windmill service endpoint
> -
>
> Key: BEAM-271
> URL: https://issues.apache.org/jira/browse/BEAM-271
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Raghu Angadi
>Assignee: Davor Bonaci
>Priority: Minor
>
> Add two options to DataflowPipelineDebugOptions to configure Dataflow remove 
> windmill service. This lets Dataflow users to configure the streaming 
> pipelines to point to remote windmill service.
> https://github.com/apache/incubator-beam/pull/314



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-beam git commit: [BEAM-271] This closes #314

2016-05-10 Thread lcwik
[BEAM-271] This closes #314


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/b915f79d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/b915f79d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/b915f79d

Branch: refs/heads/master
Commit: b915f79d1cbf116599d4b9229d45dc7757583ac3
Parents: 4020e36 cbeed43
Author: Luke Cwik 
Authored: Tue May 10 13:29:39 2016 -0700
Committer: Luke Cwik 
Committed: Tue May 10 13:29:39 2016 -0700

--
 .../dataflow/options/DataflowPipelineDebugOptions.java  | 12 
 1 file changed, 12 insertions(+)
--




[1/2] incubator-beam git commit: [BEAM-271] Option to configure remote Dataflow windmill service endpoint

2016-05-10 Thread lcwik
Repository: incubator-beam
Updated Branches:
  refs/heads/master 4020e3645 -> b915f79d1


[BEAM-271] Option to configure remote Dataflow windmill service endpoint


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/cbeed430
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/cbeed430
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/cbeed430

Branch: refs/heads/master
Commit: cbeed43097c06092fc81c066bcb97d1f63683c97
Parents: 4020e36
Author: Raghu Angadi 
Authored: Mon May 9 23:38:32 2016 -0700
Committer: Luke Cwik 
Committed: Tue May 10 13:29:06 2016 -0700

--
 .../dataflow/options/DataflowPipelineDebugOptions.java  | 12 
 1 file changed, 12 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/cbeed430/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
--
diff --git 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
index 71c8a78..8765adf 100644
--- 
a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
+++ 
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineDebugOptions.java
@@ -191,6 +191,18 @@ public interface DataflowPipelineDebugOptions extends 
PipelineOptions {
   void setOverrideWindmillBinary(String value);
 
   /**
+   * Custom windmill service endpoint.
+   */
+  @Description("Custom windmill service endpoint.")
+  String getWindmillServiceEndpoint();
+  void setWindmillServiceEndpoint(String value);
+
+  @Description("Port for communicating with a remote windmill service.")
+  @Default.Integer(443)
+  int getWindmillServicePort();
+  void setWindmillServicePort(int value);
+
+  /**
* Number of threads to use on the Dataflow worker harness. If left 
unspecified,
* the Dataflow service will compute an appropriate number of threads to use.
*/



[GitHub] incubator-beam pull request: [BEAM-271] Option to configure remote...

2016-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/314


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (BEAM-130) Checkpointing of custom sources and sinks

2016-05-10 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved BEAM-130.
---
Resolution: Fixed

Resolved in this (closed) PR: https://github.com/apache/incubator-beam/pull/274

> Checkpointing of custom sources and sinks
> -
>
> Key: BEAM-130
> URL: https://issues.apache.org/jira/browse/BEAM-130
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kostas Kloudas
>Assignee: Aljoscha Krettek
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-22] Use an AtomicReference in I...

2016-05-10 Thread tgroh
GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/318

[BEAM-22] Use an AtomicReference in InProcessSideInputContainer

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This fixes a TOCTOU race in the contents updating logic, where the
determination that the current pane should replace the contents of the
side input and the replacement is not a single atomic operation. Using
AtomicReference allows the use of compareAndSet to ensure that the
replacement can only occur on the pane that the decision to replace was
made with.

Fixes a race where a pane could be the latest, and replace a
pane, but would be lost due to an earlier pane being written between the
invalidation and loading of contents.

Fixes a race where a reader can incorrectly read an empty iterable as
the contents of a PCollectionView, due to occuring between the
invalidate and reload steps.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam 
atomic_reference_side_input_container

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #318






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-103) Make UnboundedSourceWrapper parallel

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278810#comment-15278810
 ] 

ASF GitHub Bot commented on BEAM-103:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/274


> Make UnboundedSourceWrapper parallel
> 
>
> Key: BEAM-103
> URL: https://issues.apache.org/jira/browse/BEAM-103
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Aljoscha Krettek
>
> As of now {{UnboundedSource}} s are executed with a parallelism of 1 
> regardless of the splits which the source returns. The corresponding 
> {{UnboundedSourceWrapper}} should implement {{RichParallelSourceFunction}} 
> and deal with splits correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-103][BEAM-130] Make Flink Sourc...

2016-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/274


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[2/2] incubator-beam git commit: This Closes #274

2016-05-10 Thread aljoscha
This Closes #274


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/4020e364
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/4020e364
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/4020e364

Branch: refs/heads/master
Commit: 4020e364598befbd68458f1c6eada6d90a986358
Parents: 874ddef 336e90f
Author: Aljoscha Krettek 
Authored: Tue May 10 22:20:47 2016 +0200
Committer: Aljoscha Krettek 
Committed: Tue May 10 22:20:47 2016 +0200

--
 .../flink/DefaultParallelismFactory.java|  39 +++
 .../runners/flink/FlinkPipelineOptions.java |   2 +-
 .../FlinkStreamingTransformTranslators.java |  18 +-
 .../streaming/io/UnboundedSourceWrapper.java| 337 ---
 .../flink/streaming/TestCountingSource.java | 256 ++
 .../flink/streaming/UnboundedSourceITCase.java  | 208 
 .../streaming/UnboundedSourceWrapperTest.java   | 324 ++
 7 files changed, 916 insertions(+), 268 deletions(-)
--




[GitHub] incubator-beam pull request: Verify one element per window for Dat...

2016-05-10 Thread lukecwik
GitHub user lukecwik opened a pull request:

https://github.com/apache/incubator-beam/pull/317

Verify one element per window for DataflowPipelineRunner View.asSingleton

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This changes the expansion of the DataflowPipelineRunner override for
View.asSingleton to provide a useful error message to users if their
PCollection contains more than one element per window.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam singleton_verify

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/317.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #317


commit e423cdce40bb11b39f254828791f15b5c0490b86
Author: Luke Cwik 
Date:   2016-05-10T20:02:21Z

Verify one element per window for DataflowPipelineRunner View.asSingleton

This changes the expansion of the DataflowPipelineRunner override for
View.asSingleton to provide a useful error message to users if there
PCollection contains more than one element per window.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-beam pull request: [BEAM-191] Remove WindowedValue.value...

2016-05-10 Thread kennknowles
GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/316

[BEAM-191] Remove WindowedValue.valueInEmptyWindows

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

A value in empty windows expands to no values, so it can be dropped at
any time, perhaps unintentionally. This has bitten runner authors, including
Spark & Dataflow.

While creating such a thing in memory is not automatically problematic, it
is also not really useful. So this change removes it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam valueInEmptyWindows

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/316.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #316


commit f123b2b82f257fed312912fea8776a8059230c8e
Author: Kenneth Knowles 
Date:   2016-05-10T18:39:35Z

Remove WindowedValue.valueInEmptyWindows

A value in empty windows expands to no values, so it can be dropped at
any time, perhaps unintentionally. This has bitten runner authors, including
Spark & Dataflow.

While creating such a thing in memory is not automatically problematic, it
is also not really useful. So this change removes it.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278646#comment-15278646
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/312


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-191) Remove WindowedValue.valueInEmptyWindows

2016-05-10 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-191:


Assignee: Kenneth Knowles

> Remove WindowedValue.valueInEmptyWindows
> 
>
> Key: BEAM-191
> URL: https://issues.apache.org/jira/browse/BEAM-191
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Minor
>  Labels: windowing
>
> Even though it is acceptable to create a WindowedValue carrying no windows 
> when it is a fully reified WindowedValue, it doesn't really make sense: 
> When it becomes an element in a PCollection that a value must exist within 
> some window.
> This has led to some confusion so we should remove the API. It seems there 
> are just 11 files that reference WindowedValue.valueInEmptyWindows [1] that 
> mostly look like they'd be fine with the global window.
> [1] 
> https://github.com/apache/incubator-beam/search?p=1=valueInEmptyWindows=%E2%9C%93



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-beam git commit: Closes #312

2016-05-10 Thread kenn
Closes #312


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/874ddef0
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/874ddef0
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/874ddef0

Branch: refs/heads/master
Commit: 874ddef05a00060b1830722271e0678c463e68c7
Parents: 4870525 46bc6e1
Author: Kenneth Knowles 
Authored: Tue May 10 11:31:51 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue May 10 11:31:51 2016 -0700

--
 .../direct/InMemoryWatermarkManager.java| 113 +++
 .../direct/InProcessEvaluationContext.java  |  14 ++-
 .../direct/WatermarkCallbackExecutor.java   |   9 +-
 .../direct/InMemoryWatermarkManagerTest.java|  58 --
 .../direct/InProcessEvaluationContextTest.java  |   8 +-
 .../direct/WatermarkCallbackExecutorTest.java   |   4 +-
 6 files changed, 169 insertions(+), 37 deletions(-)
--




[GitHub] incubator-beam pull request: [BEAM-22] Update Watermarks Outside o...

2016-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/312


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-117) Implement the API for Static Display Metadata

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278633#comment-15278633
 ] 

ASF GitHub Bot commented on BEAM-117:
-

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/315

[BEAM-117] Runners should be resilient to DisplayData failure

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Display data is collected from PTransforms at Pipeline construction
time. Collecting display data runs user code from provided transforms
and fn's. These components should be designed not to throw during
pipeline construction, however we also shouldn't fail a pipeline
if this code does fail.

This PR adds resiliency to the DataflowPipelineTranslator, where
we collect display data for the Dataflow runner, and also a
RunnableOnService test to verify that all runners are resilient to
display data failures. Other runners are not yet using display data,
but will get this validation for free when they do.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam displaydata-safe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/315.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #315


commit d6c5025937cbd0016d0a83d08e53902ae4d4519b
Author: Scott Wegner 
Date:   2016-05-10T18:19:14Z

Runners should be resilient to DisplayData failure

Display data is collected from PTransforms at Pipeline construction
time. Collecting display data runs user code from provided transforms
and fn's. These components should be designed not to throw during
pipeline construction, however we also shouldn't fail a pipeline
if this code does fail.

This PR adds resiliency to the DataflowPipelineTranslator, where
we collect display data for the Dataflow runner, and also a
RunnableOnService test to verify that all runners are resilient to
display data failures. Other runners are not yet using display data,
but will get this validation for free when they do.




> Implement the API for Static Display Metadata
> -
>
> Key: BEAM-117
> URL: https://issues.apache.org/jira/browse/BEAM-117
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Ben Chambers
>Assignee: Scott Wegner
>
> As described in the following doc, we would like the SDK to allow associating 
> display metadata with PTransforms.
> https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-117] Runners should be resilien...

2016-05-10 Thread swegner
GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/315

[BEAM-117] Runners should be resilient to DisplayData failure

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Display data is collected from PTransforms at Pipeline construction
time. Collecting display data runs user code from provided transforms
and fn's. These components should be designed not to throw during
pipeline construction, however we also shouldn't fail a pipeline
if this code does fail.

This PR adds resiliency to the DataflowPipelineTranslator, where
we collect display data for the Dataflow runner, and also a
RunnableOnService test to verify that all runners are resilient to
display data failures. Other runners are not yet using display data,
but will get this validation for free when they do.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam displaydata-safe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/315.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #315


commit d6c5025937cbd0016d0a83d08e53902ae4d4519b
Author: Scott Wegner 
Date:   2016-05-10T18:19:14Z

Runners should be resilient to DisplayData failure

Display data is collected from PTransforms at Pipeline construction
time. Collecting display data runs user code from provided transforms
and fn's. These components should be designed not to throw during
pipeline construction, however we also shouldn't fail a pipeline
if this code does fail.

This PR adds resiliency to the DataflowPipelineTranslator, where
we collect display data for the Dataflow runner, and also a
RunnableOnService test to verify that all runners are resilient to
display data failures. Other runners are not yet using display data,
but will get this validation for free when they do.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections

2016-05-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278513#comment-15278513
 ] 

ASF GitHub Bot commented on BEAM-22:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/309


> DirectPipelineRunner: support for unbounded collections
> ---
>
> Key: BEAM-22
> URL: https://issues.apache.org/jira/browse/BEAM-22
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-direct
>Reporter: Davor Bonaci
>Assignee: Thomas Groh
>
> DirectPipelineRunner currently runs over bounded PCollections only, and 
> implements only a portion of the Beam Model.
> We should improve it to faithfully implement the full Beam Model, such as add 
> ability to run over unbounded PCollections, and better resemble execution 
> model in a distributed system.
> This further enables features such as a testing source which may simulate 
> late data and test triggers in the pipeline. Finally, we may want to expose 
> an option to select between "debug" (single threaded), "chaos monkey" (test 
> as many model requirements as possible), and "performance" (multi-threaded).
> more testing (chaos monkey) 
> Once this is done, we should update this StackOverflow question:
> http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/3] incubator-beam git commit: Limit the number of work schedules per MonitorRunnable run

2016-05-10 Thread kenn
Limit the number of work schedules per MonitorRunnable run

This ensures that work readded to the queue will not cause the monitor runnable
to run forever before delivering timers


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/d3b96bc3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/d3b96bc3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/d3b96bc3

Branch: refs/heads/master
Commit: d3b96bc33c7e9846f756457d1214a011da1cf84b
Parents: 272493e
Author: Thomas Groh 
Authored: Thu Apr 28 10:12:09 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue May 10 10:15:14 2016 -0700

--
 .../direct/ExecutorServiceParallelExecutor.java  | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/d3b96bc3/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
index de409e3..367c190 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ExecutorServiceParallelExecutor.java
@@ -54,6 +54,7 @@ import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.ConcurrentMap;
 import java.util.concurrent.ExecutorService;
+import java.util.concurrent.TimeUnit;
 
 import javax.annotation.Nullable;
 
@@ -344,11 +345,11 @@ final class ExecutorServiceParallelExecutor implements 
InProcessExecutor {
   }
 
   private class MonitorRunnable implements Runnable {
-private final String runnableName =
-String.format(
-"%s$%s-monitor",
-evaluationContext.getPipelineOptions().getAppName(),
-ExecutorServiceParallelExecutor.class.getSimpleName());
+// arbitrary termination condition to ensure progress in the presence of 
pushback
+private final long maxTimeProcessingUpdatesNanos = 
TimeUnit.MILLISECONDS.toNanos(5L);
+private final String runnableName = String.format("%s$%s-monitor",
+evaluationContext.getPipelineOptions().getAppName(),
+ExecutorServiceParallelExecutor.class.getSimpleName());
 
 @Override
 public void run() {
@@ -356,7 +357,9 @@ final class ExecutorServiceParallelExecutor implements 
InProcessExecutor {
   Thread.currentThread().setName(runnableName);
   try {
 ExecutorUpdate update = allUpdates.poll();
+int numUpdates = 0;
 // pull all of the pending work off of the queue
+long updatesStart = System.nanoTime();
 while (update != null) {
   LOG.debug("Executor Update: {}", update);
   if (update.getBundle().isPresent()) {
@@ -364,7 +367,11 @@ final class ExecutorServiceParallelExecutor implements 
InProcessExecutor {
   } else if (update.getException().isPresent()) {
 
visibleUpdates.offer(VisibleExecutorUpdate.fromThrowable(update.getException().get()));
   }
-  update = allUpdates.poll();
+  if (System.nanoTime() - updatesStart > 
maxTimeProcessingUpdatesNanos) {
+break;
+  } else {
+update = allUpdates.poll();
+  }
 }
 boolean timersFired = fireTimers();
 addWorkIfNecessary(timersFired);



[1/3] incubator-beam git commit: Use PushbackDoFnRunner in the ParDoInProcessEvaluator

2016-05-10 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master 7c917a6ee -> 487052588


Use PushbackDoFnRunner in the ParDoInProcessEvaluator

This ensures that the evaluator does not block while processing an input
bundle.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/dd4ef6ff
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/dd4ef6ff
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/dd4ef6ff

Branch: refs/heads/master
Commit: dd4ef6ffc67d4776d115bd6a77483c6f2fd66ae5
Parents: d3b96bc
Author: Thomas Groh 
Authored: Wed Apr 27 17:27:57 2016 -0700
Committer: Kenneth Knowles 
Committed: Tue May 10 10:15:14 2016 -0700

--
 .../runners/direct/ParDoInProcessEvaluator.java |  24 ++-
 .../direct/ParDoInProcessEvaluatorTest.java | 214 +++
 .../sdk/util/PushbackSideInputDoFnRunner.java   |   2 +-
 .../sdk/util/IdentitySideInputWindowFn.java |   2 +-
 4 files changed, 235 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/dd4ef6ff/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoInProcessEvaluator.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoInProcessEvaluator.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoInProcessEvaluator.java
index 1c51738..2cdf6cb 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoInProcessEvaluator.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/ParDoInProcessEvaluator.java
@@ -25,6 +25,8 @@ import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.util.DoFnRunner;
 import org.apache.beam.sdk.util.DoFnRunners;
 import org.apache.beam.sdk.util.DoFnRunners.OutputManager;
+import org.apache.beam.sdk.util.PushbackSideInputDoFnRunner;
+import org.apache.beam.sdk.util.ReadyCheckingSideInputReader;
 import org.apache.beam.sdk.util.SerializableUtils;
 import org.apache.beam.sdk.util.UserCodeException;
 import org.apache.beam.sdk.util.WindowedValue;
@@ -34,6 +36,8 @@ import org.apache.beam.sdk.values.PCollection;
 import org.apache.beam.sdk.values.PCollectionView;
 import org.apache.beam.sdk.values.TupleTag;
 
+import com.google.common.collect.ImmutableList;
+
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
@@ -65,17 +69,21 @@ class ParDoInProcessEvaluator implements 
TransformEvaluator {
   evaluationContext.createBundle(inputBundle, outputEntry.getValue()));
 }
 
-DoFnRunner runner =
+ReadyCheckingSideInputReader sideInputReader =
+evaluationContext.createSideInputReader(sideInputs);
+DoFnRunner underlying =
 DoFnRunners.createDefault(
 evaluationContext.getPipelineOptions(),
 SerializableUtils.clone(fn),
-evaluationContext.createSideInputReader(sideInputs),
+sideInputReader,
 BundleOutputManager.create(outputBundles),
 mainOutputTag,
 sideOutputTags,
 stepContext,
 counters.getAddCounterMutator(),
 application.getInput().getWindowingStrategy());
+PushbackSideInputDoFnRunner runner =
+PushbackSideInputDoFnRunner.create(underlying, sideInputs, 
sideInputReader);
 
 try {
   runner.startBundle();
@@ -89,14 +97,16 @@ class ParDoInProcessEvaluator implements 
TransformEvaluator {
 
   

 
-  private final DoFnRunner fnRunner;
+  private final PushbackSideInputDoFnRunner fnRunner;
   private final AppliedPTransform transform;
   private final CounterSet counters;
   private final Collection outputBundles;
   private final InProcessStepContext stepContext;
 
+  private final ImmutableList.Builder unprocessedElements;
+
   private ParDoInProcessEvaluator(
-  DoFnRunner fnRunner,
+  PushbackSideInputDoFnRunner fnRunner,
   AppliedPTransform transform,
   CounterSet counters,
   Collection outputBundles,
@@ -106,12 +116,15 @@ class ParDoInProcessEvaluator implements 
TransformEvaluator {
 this.counters = counters;
 this.outputBundles = outputBundles;
 this.stepContext = stepContext;
+
+this.unprocessedElements = ImmutableList.builder();
   }
 
   @Override
   public void processElement(WindowedValue element) {
 try {
-  fnRunner.processElement(element);
+  Iterable unprocessed = 

[GitHub] incubator-beam pull request: [BEAM-22] Return null evaluators from...

2016-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/310


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[1/2] incubator-beam git commit: Return null evaluators from Unavailable Reads

2016-05-10 Thread kenn
Repository: incubator-beam
Updated Branches:
  refs/heads/master 9feea2389 -> 7c917a6ee


Return null evaluators from Unavailable Reads

Null TransformEvaluators for sources represent a source where all splits
are currently in use or completed.

Update TransformExecutor to handle null evaluators properly.

Change TransformExecutor to a Runnable.


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam/commit/2d94540c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam/tree/2d94540c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam/diff/2d94540c

Branch: refs/heads/master
Commit: 2d94540ccd02b19a73410647e84486c008d9cdc5
Parents: 272493e
Author: Thomas Groh 
Authored: Mon May 9 14:03:16 2016 -0700
Committer: Thomas Groh 
Committed: Mon May 9 14:35:38 2016 -0700

--
 .../runners/direct/BoundedReadEvaluatorFactory.java |  8 ++--
 .../runners/direct/TransformEvaluatorFactory.java   | 10 --
 .../beam/runners/direct/TransformExecutor.java  |  9 ++---
 .../direct/UnboundedReadEvaluatorFactory.java   |  8 ++--
 .../direct/BoundedReadEvaluatorFactoryTest.java | 16 +++-
 .../direct/TransformExecutorServicesTest.java   | 10 +-
 .../beam/runners/direct/TransformExecutorTest.java  |  8 
 .../direct/UnboundedReadEvaluatorFactoryTest.java   | 13 -
 8 files changed, 34 insertions(+), 48 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/2d94540c/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java
index 3822d3b..f15d446 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java
@@ -51,6 +51,7 @@ final class BoundedReadEvaluatorFactory implements 
TransformEvaluatorFactory {
 
   @SuppressWarnings({"unchecked", "rawtypes"})
   @Override
+  @Nullable
   public  TransformEvaluator forApplication(
   AppliedPTransform application,
   @Nullable CommittedBundle inputBundle,
@@ -62,12 +63,7 @@ final class BoundedReadEvaluatorFactory implements 
TransformEvaluatorFactory {
   private  TransformEvaluator getTransformEvaluator(
   final AppliedPTransform, Bounded> 
transform,
   final InProcessEvaluationContext evaluationContext) {
-BoundedReadEvaluator evaluator =
-getTransformEvaluatorQueue(transform, evaluationContext).poll();
-if (evaluator == null) {
-  return EmptyTransformEvaluator.create(transform);
-}
-return evaluator;
+return getTransformEvaluatorQueue(transform, evaluationContext).poll();
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/incubator-beam/blob/2d94540c/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TransformEvaluatorFactory.java
--
diff --git 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TransformEvaluatorFactory.java
 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TransformEvaluatorFactory.java
index 8f8d84c..f2d577e 100644
--- 
a/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TransformEvaluatorFactory.java
+++ 
b/runners/direct-java/src/main/java/org/apache/beam/runners/direct/TransformEvaluatorFactory.java
@@ -18,6 +18,7 @@
 package org.apache.beam.runners.direct;
 
 import org.apache.beam.runners.direct.InProcessPipelineRunner.CommittedBundle;
+import org.apache.beam.sdk.io.Read;
 import org.apache.beam.sdk.transforms.AppliedPTransform;
 import org.apache.beam.sdk.transforms.DoFn;
 import org.apache.beam.sdk.transforms.PTransform;
@@ -32,13 +33,18 @@ public interface TransformEvaluatorFactory {
   /**
* Create a new {@link TransformEvaluator} for the application of the {@link 
PTransform}.
*
-   * Any work that must be done before input elements are processed (such as 
calling
+   * Any work that must be done before input elements are processed (such 
as calling
* {@link DoFn#startBundle(DoFn.Context)}) must be done before the {@link 
TransformEvaluator} is
* made available to the caller.
*
+   * May return null if the application cannot produce an evaluator (for 
example, it is a
+   * {@link Read} {@link PTransform} where all evaluators are in-use).
+   *
+   * @return An evaluator capable of processing the transform on the bundle, 
or null if no 

Jenkins build is back to stable : beam_PostCommit_RunnableOnService_GoogleCloudDataflow #311

2016-05-10 Thread Apache Jenkins Server
See 




[jira] [Created] (BEAM-271) Option to configure remote Dataflow windmill service endpoint

2016-05-10 Thread Raghu Angadi (JIRA)
Raghu Angadi created BEAM-271:
-

 Summary: Option to configure remote Dataflow windmill service 
endpoint
 Key: BEAM-271
 URL: https://issues.apache.org/jira/browse/BEAM-271
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Raghu Angadi
Assignee: Davor Bonaci
Priority: Minor


Add two options to DataflowPipelineDebugOptions to configure Dataflow remove 
windmill service. This lets Dataflow users to configure the streaming pipelines 
to point to remote windmill service.

https://github.com/apache/incubator-beam/pull/314




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-beam pull request: [BEAM-todo] Option to configure remot...

2016-05-10 Thread rangadi
GitHub user rangadi opened a pull request:

https://github.com/apache/incubator-beam/pull/314

[BEAM-todo] Option to configure remote Dataflow windmill service endpoint

Add two options to DataflowPipelineDebugOptions to configure Dataflow 
remove windmill service. This lets Dataflow users to configure the streaming 
pipelines to point to remote windmill service.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rangadi/incubator-beam endpoint

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #314


commit fbc9049df57956540bf39ef4ce7161f32be7f2c1
Author: Raghu Angadi 
Date:   2016-05-10T06:38:32Z

Option to configure remote Dataflow windmill service endpoint




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---