[jira] [Commented] (BEAM-115) Beam Runner API

2018-04-19 Thread Joar Wandborg (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444321#comment-16444321
 ] 

Joar Wandborg commented on BEAM-115:


[~kenn] There's a TODO left over from this issue at

[https://github.com/apache/beam/blob/9c9f4ceceb87933da2b03304efd21edf55216937/sdks/python/apache_beam/pipeline.py#L835,]
 is there another ticket tracking this?

> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: Not applicable
>
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056433#comment-16056433
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3361


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-06-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049723#comment-16049723
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3361

[BEAM-115] Port fn_api_runner to be able to use runner protos.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam fn-api-runner-protos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3361


commit fa8dd580405970f3a18310fe0562c16b68e813ce
Author: Robert Bradshaw 
Date:   2017-06-14T21:52:43Z

Port fn_api_runner to be able to use runner protos.

commit 7131b231835f043398b135aeed660d599c9d643b
Author: Robert Bradshaw 
Date:   2017-06-14T22:03:00Z

fixup: cleanup




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034974#comment-16034974
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3222


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026574#comment-16026574
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3233


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025898#comment-16025898
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/3233

[BEAM-115] Runner API Translations for StateSpec and TimerSpec

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @tgroh 

The awkward bit is that a combining `StateSpec` is statically typed to take 
a `CombineFn`. So when we get around to issuing Fn State API requests, we'll 
need to work around that.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam translate-StateSpec

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3233


commit c87b72d5522e3369becd7fbe022824f4f223a9ae
Author: Kenneth Knowles 
Date:   2017-05-25T14:12:08Z

Flesh out TimerSpec and StateSpec in Runner API

commit 55af992a3a465b333bc3ae51262bfc43474d90e8
Author: Kenneth Knowles 
Date:   2017-05-25T14:25:08Z

Mark CombineFnWithContext StateSpecs internal

commit 34eca25c5a6b6f3733f51fb6cb421cc755b7058c
Author: Kenneth Knowles 
Date:   2017-05-25T14:27:52Z

Add case dispatch to StateSpec

This is different than a StateBinder: for a binder, the id is needed and
the StateSpec controls the return type. For case dispatch, the
dispatcher controls the type and it should just be reading the spec,
which does not require the id. Eventually, StateBinder could be removed
in favor of StateSpec.Cases>.

commit 09aeab25a92ef961a8968a5a3e863786750dff46
Author: Kenneth Knowles 
Date:   2017-05-25T20:02:15Z

Allow translation to throw IOException

commit 9eb1ef070506e7b419bd388f2a0f9407056a8bcb
Author: Kenneth Knowles 
Date:   2017-05-26T05:51:18Z

Make Java serialized CombineFn URN public

commit 5aa01d64ff50d9396137ada84b281700ce1d8d8d
Author: Kenneth Knowles 
Date:   2017-05-25T14:12:29Z

Implement TimerSpec and StateSpec translation




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023946#comment-16023946
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3222

[BEAM-115] Unify Java and Python WindowingStragegy representations.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`.
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-windows

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3222


commit 5fb2043e7f25871f4e9ad7de0e6c3380290b0c21
Author: Robert Bradshaw 
Date:   2017-05-25T00:23:31Z

Unify Java and Python WindowingStragegy representations.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991903#comment-15991903
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2644


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979640#comment-15979640
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/2644

[BEAM-115] Fn API support for Python

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam fn-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2644


commit 4f5c5241949473d81b18cb034640189ea98c49df
Author: Robert Bradshaw 
Date:   2017-04-21T16:59:52Z

Add instructions to regenerate Python proto wrappers.

commit f4f94db5bb3b64c015fd95de64bafd34770aaa21
Author: Robert Bradshaw 
Date:   2017-04-21T17:00:20Z

Generate python proto wrappers for runner and fn API.

commit 6f0e487749b777c5606a69865b7e47134d878a58
Author: Robert Bradshaw 
Date:   2017-04-21T18:04:03Z

Ignore generated files for linter.

commit aa1425c86f5def43ff9032b8666352b747d20440
Author: Robert Bradshaw 
Date:   2017-04-21T18:09:03Z

Ignore generated files in rat plugin.

commit 91428b06f8d4ced0bd3965801920cd94ee8298c6
Author: Robert Bradshaw 
Date:   2017-04-21T19:15:02Z

Add apache licence to generated files.

commit 27f718e89d1829b5b217f263d4b4deb8832c947d
Author: Robert Bradshaw 
Date:   2017-04-20T20:59:25Z

Add fn api runner.

commit f987622d524fa4245089f82fbd150c0cf9ae380d
Author: Robert Bradshaw 
Date:   2017-04-20T21:20:29Z

Restore __init__.py

commit 45ccb571aed5e51ef893428c2a5350ba01864aac
Author: Robert Bradshaw 
Date:   2017-04-20T20:44:33Z

Add runner core files.

commit c6310ad8236578c05432244ced40fe572c2d71a7
Author: Robert Bradshaw 
Date:   2017-04-20T21:23:30Z

move files around

commit 0f9de53c7007e200e5f04c9a30349313b2bdff39
Author: Robert Bradshaw 
Date:   2017-04-20T21:27:14Z

more moving

commit 2f5c518bff028878b7d18a30141beea734cb36a0
Author: Robert Bradshaw 
Date:   2017-04-20T21:43:28Z

Rename runners.

commit 85b2172c1d8d6a5797015c15cd5a301de4a252c6
Author: Robert Bradshaw 
Date:   2017-04-20T21:53:26Z

cythonization works

commit 9f1177db48fa0230fab463cb7a5d784b96c9e084
Author: Robert Bradshaw 
Date:   2017-04-20T22:03:52Z

test module import renames

commit a0e8d792bbfac4550e4399b29c9004f67b82cbd0
Author: Robert Bradshaw 
Date:   2017-04-20T22:06:30Z

remove google3s

commit dfc0b2f3ca04594a2167eaadc7882ca257cdca5f
Author: Robert Bradshaw 
Date:   2017-04-20T22:07:37Z

move sdk_harness to sdk_worker

commit 8b08216c54e65fd0eb5a6c9588405cb4301267b2
Author: Robert Bradshaw 
Date:   2017-04-20T22:09:33Z

Remove unused end_time from statesampler.

commit 011dec575a4066eb7e50c0bff2bad78a9cadd449
Author: Robert Bradshaw 
Date:   2017-04-21T16:59:52Z

Add instructions to regenerate Python proto wrappers.

commit 4a74cdd528c6e9800c66152f9498592f51b59248
Author: Robert Bradshaw 
Date:   2017-04-21T17:00:20Z

Generate python proto wrappers for runner and fn API.

commit 5dcd25d144f784a5b9bd6bd668c2ceb8bc2680f3
Author: Robert Bradshaw 
Date:   2017-04-21T18:04:03Z

Ignore generated files for linter.

commit 5697ecdca1c14ce926f0be63399df77609c12aaf
Author: Robert Bradshaw 
Date:   2017-04-21T18:09:03Z

Ignore generated files in rat plugin.

commit 57a16aa810f5e4ef14c5f2ced901c3bb6fbc7b1a
Author: Robert Bradshaw 
Date:   2017-04-21T19:15:02Z

Add apache licence to generated files.

commit a8456be3830869e5b51e24970655819bc832bd79
Author: Robert Bradshaw 
Date:   2017-04-21T19:36:27Z

implement portpicker, fix more imports

commit 

[jira] [Commented] (BEAM-115) Beam Runner API

2017-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967809#comment-15967809
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2505


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966294#comment-15966294
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2511


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965150#comment-15965150
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2505

[BEAM-115] Represent a Pipeline via a list of Top-level Transforms

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---
The root node is a synthetic transform which does not appear within the
graph, as it never has any components of note. Instead of referring to a
single "root node" in the Pipeline message, refer to the top-level nodes
which do not have an enclosing PTransform.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam top_level_transforms

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2505.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2505


commit 31d5c272c4a6822ef35836be207224b557e122b4
Author: Thomas Groh 
Date:   2017-04-11T23:42:28Z

Represent a Pipeline via a list of Top-level Transforms

The root node is a synthetic transform which does not appear within the
graph, as it never has any components of note. Instead of referring to a
single "root node" in the Pipeline message, refer to the top-level nodes
which do not have an enclosing PTransform.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959537#comment-15959537
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2452


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959305#comment-15959305
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2452

[BEAM-115] Rename FunctionSpec and UrnWithParameter to their (hopefully) 
final names



Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.pdf).

---

R: @tgroh 

I believe this is consistent with dev discussions and in PRs, etc.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam runner-api-touch-ups

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2452






> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951270#comment-15951270
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2372


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949497#comment-15949497
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2372

[BEAM-115] Include the creating PCollection in PCollectionView

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
This is available on the client that created the view, but may not be
available elsewhere.

Update signatures and callers to match.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam view_contains_pcollection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2372


commit e0887a011484368139700259eb5af0363c5cbc4d
Author: Thomas Groh 
Date:   2017-03-30T17:47:12Z

Include the creating PCollection in PCollectionView

This is available on the client that created the view, but may not be
available elsewhere.

Update signatures and callers to match.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939334#comment-15939334
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/beam/pull/2309

[BEAM-115] Make Canonical ViewFns Public

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Mark all of the ViewFns as both deprecated and experimental. These fns
will all be removed once Runners have multimap support, and are not
suitable for users to use explicitly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/beam public_view_fns

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2309


commit b3ea5b22f090958f470fb2a9996ea96d94de2a3c
Author: Thomas Groh 
Date:   2017-03-23T22:23:40Z

Make Canonical ViewFns Public

Mark all of the ViewFns as both deprecated and experimental. These fns
will all be removed once Runners have multimap support, and are not
suitable for users to use explicitly.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937655#comment-15937655
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2299

[BEAM-115] Make distinguished URNs public

These URNs are in flux and will be relocated to some final good location as 
the
Runner API and Fn API develop.  For now, this change just makes them public 
in
the place where they currently are defined.

R: @tgroh 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam custom-windowfn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2299


commit 94814e88f6d6737f389c7b8e8727d3a4a78bba54
Author: Kenneth Knowles 
Date:   2017-03-23T03:33:24Z

Make distinguished URNs public

These URNs are in flux and will be relocated to some final good location as 
the
Runner API and Fn API develop.  For now, this change just makes them public 
in
the place where they currently are defined.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937428#comment-15937428
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/2296

[BEAM-115] Translate pipeline graph to and from Runner API protos.

There are some caveates:
  * Specific known transforms, with their payloads, are not yet translated.
  * Side inputs are not yet supported.

All pipelines without side inputs are passed through this translation by 
default
before execution.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam py-runner-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2296


commit ebc74e9678050c632574b11afebf250b8332c25e
Author: Robert Bradshaw 
Date:   2017-03-20T23:08:37Z

Translate pipeline graph to and from Runner API protos.

There are some caveates:
  * Specific known transforms, with their payloads, are not yet translated.
  * Side inputs are not yet supported.

All pipelines without side inputs are passed through this translation by 
default
before execution.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904419#comment-15904419
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2190


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890558#comment-15890558
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2131


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889140#comment-15889140
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/2131

[BEAM-115] Inline rather than reference FunctionSpecs.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-protos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2131


commit cbfce74aae64f9b353c07ced8427dc77ab1c31f4
Author: Robert Bradshaw 
Date:   2017-02-28T23:51:24Z

Inline rather than reference FunctionSpecs.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887038#comment-15887038
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2106


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883785#comment-15883785
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/2106

[BEAM-115] More Runner API refinements.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-protos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2106


commit c2f5351e3731f29d024240016c553cecd3be3143
Author: Robert Bradshaw 
Date:   2017-02-24T23:53:39Z

Make access_pattern a URN with params.




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883715#comment-15883715
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2094


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882880#comment-15882880
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2042


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881797#comment-15881797
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2094

[BEAM-115] Concretize generic bits of the Runner API graph structure

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @dhalperi @robertwb 

This gets rid of the excessive generic design of `GraphNode` and restores 
it to the original design wherein each node is a `PTransform`. I have also 
merged the `bytes` of the SDK-specific data and the `Any` that is 
SDK-independent data, since as has been pointed out we won't need both.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam inline-runner-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2094.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2094


commit cc46d194cacbfb2244fde837f01b2ba0f2cedcdb
Author: Kenneth Knowles 
Date:   2017-02-24T01:51:10Z

Inline PTransform to GraphNode, removing generic design

The GraphNode structure was made more generic to allow the Runner API
and Fn API to share the graph data structure while carrying distinct
payloads on nodes and edges. It seems that the Runner API was already
sufficiently flexible for the Fn API to use its existing payload
design.

commit 47289b5bc3a452e2866fb6515b55c7ef5d2835a8
Author: Kenneth Knowles 
Date:   2017-02-24T02:06:56Z

Condense FunctionSpec, merging data and params




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881275#comment-15881275
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/beam/pull/2065


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877407#comment-15877407
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/beam/pull/2011


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876299#comment-15876299
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/beam/pull/662


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872932#comment-15872932
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2042

[BEAM-115,BEAM-1195] Build trigger state machine from Runner API Trigger 
proto directly

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This replaces the case dispatch on the Java SDK's `Trigger` type with a 
case dispatch on the Runner API's `Trigger` type. This is a step towards 
`runners/core-java` executing triggers from any language.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam trigger-proto-state-machines

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2042.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2042


commit 7e4e37d0d2cb2fca1c12f67dedbf7b96cce3c6e5
Author: Kenneth Knowles 
Date:   2017-02-18T00:05:13Z

Build trigger state machine from Runner API Trigger proto directly




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871158#comment-15871158
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2030

[BEAM-115,BEAM-1328] Convert to/from WindowingStrategy proto in Java SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Some aspects of this PR are sort of hacks, but the kind that might be 
forgiven in order to make rapid progress. I am opening the PR for comment 
anyhow, but there are  few changes that I might now make underneath this PR:

1. Move Java SDK `ClosingBehavior` to top level and rename. Incidentally 
related to [BEAM-210](https://issues.apache.org/jira/browse/BEAM-210) only 
because that is another reason this should just be a top-level concept. Not 
sure there's much benefit to putting public enums inside other misc classes 
when there's no real natural home for them.
2. Move Java SDK `AccumulationMode` to top level and put its to/from proto 
there. In particular, it should also come out of `util`.
3. Maybe actually try to move from `OutputTimeFn` to a Java SDK 
`OutputTime` prior to this PR. But actually converting this to proto, then 
converting the runners to use that will make the latter migration easier. Some 
cruft is introduced in the meantime, though.
4. If `WindowingStrategy` is going to continue to be a thing that is 
prominent all over our public SDK surface and in the runner API whether it 
really belongs in `util`.

And I'll want to flesh out the list of test cases. Given how generic the 
logic is, I don't expect many surprises.

R: @tgroh 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam WindowingStrategy-from-proto

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2030


commit e7171b8fcd8068e5b0ec0c660d16d104bb1c334a
Author: Kenneth Knowles 
Date:   2017-02-16T22:45:05Z

Make SDK-specific serialized blob really a blob

commit b3b3ba5cdd5559685208b0cbbd45071bc862a7b8
Author: Kenneth Knowles 
Date:   2017-02-17T04:26:39Z

Add closing behavior to Runner API proto

commit fd995928091582487e14cb1af128354b5c9fadbe
Author: Kenneth Knowles 
Date:   2017-02-17T04:26:45Z

Add conversion to/from Runner API proto for WindowingStrategy




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870858#comment-15870858
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/2023


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867257#comment-15867257
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user kennknowles closed the pull request at:

https://github.com/apache/beam/pull/2000


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867182#comment-15867182
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/2011

[BEAM-115,BEAM-1348] Unify Fn API and Runner API coder specs

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This includes the entirety of #2000, which unified `FunctionSpec`. Feel 
free to review that first or, if you prefer, review just this since the diff is 
small.

The unification here was uninteresting on top of #2000. Summary of changes:

 - Moved a coder's local id out of the `Coder` itself and into the key of a 
map in `ProcessBundleDescriptor`. Philosophically, the id is an essential 
aspect of `ProcessBundleDescriptor` (or `Pipeline`) but not not an essential 
aspect of a coder.  Pragmatically, this allows the Runner API and the Fn API to 
key the map on different types (`string` and `int64` respectively). 
Prospectively, it makes it easy to construct instances of the message that are 
"just values" without any id, which is aesthetically pleasing and more flexible 
to more uses.
 - Inlined the `SdkFunctionSpec` in `Coder` in the Runner API. Having it by 
reference introduces a needless sharing of key type and adds needless overhead, 
since coders are already stored by reference, as are environments.

R: @dhalperi 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam fn-api-coders

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/2011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2011


commit 2b55a7f303ab0fea58ad279dd214253b4fe69565
Author: Kenneth Knowles 
Date:   2017-02-14T20:33:43Z

Remove underscore from Runner API proto Java package

commit 4e7865b828eae962532f1759833eed8b0e769cc9
Author: Kenneth Knowles 
Date:   2017-02-13T16:38:40Z

Unify Fn API and Runner API FunctionSpec

commit 5b5e6290e893385c47799cf5523c29be64c102fd
Author: Kenneth Knowles 
Date:   2017-02-15T03:51:58Z

Unify Fn API and Runner API coder spec




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860380#comment-15860380
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/1946


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-115) Beam Runner API

2017-02-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857474#comment-15857474
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/beam/pull/1946

[BEAM-115] Add proto definition for Runner API

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

These are the protocol buffers definitions corresponding to the Avro schema 
proposed in https://s.apache.org/beam-runner-api.

Differences from the schema there:

 - Graph structure is decoupled from what data annotations the nodes
 - Adopted names from the Fn API's proto for things that overlap
 - Added explicit URNs and payloads for primitives
 - Added outline of state and timers
 - Added Environment URLs

I would like to merge this and separately port the Fn API to use shared 
structures as appropriate, since that may involve more extensive coding work.

Unlike #662, this is _not_ a WIP just for discussion. This is intended for 
immediate use as the serialization format for various pieces of the pipeline to 
unblock work on the Fn API.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/beam pipeline-proto

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/1946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1946


commit 7bdd2869053bd3af7afdf68e36e1d54f868419ae
Author: Kenneth Knowles 
Date:   2017-02-07T23:25:32Z

Add proto definition for Runner API




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)