[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323753#comment-15323753 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/392 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323189#comment-15323189 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/435 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321326#comment-15321326 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/435 [BEAM-22] Shutdown the InProcessPipelineRunner after Terminating Abnormally Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Ensure that the executor service is shutdown, and the monitor is not rescheduled, after an exception is thrown. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam shutdown_executor_after_exception Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/435.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #435 commit 591134749f2d24a7d6550e0bb00845a36cdb1616 Author: Thomas GrohDate: 2016-06-08T19:00:05Z Shutdown the InProcessPipelineRunner after Terminating Abnormally Ensure that the executor service is shutdown, and the monitor is not rescheduled, after an exception is thrown. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315206#comment-15315206 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/418 [BEAM-22] Remove improper Fn cloning Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- The ParDoInProcessEvaluator is provided clones of a DoFn when appropriate, and should not serialize them. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam remove_improper_cloning Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #418 commit aa154e4ccf2e646069f80325ac8ec5f65a621382 Author: Thomas GrohDate: 2016-06-04T00:57:24Z Remove improper Fn cloning The ParDoInProcessEvaluator is provided clones of a DoFn when appropriate, and should not serialize them. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301256#comment-15301256 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/392 [BEAM-22][BEAM-243] Execute NeedsRunner tests in the DirectRunner module Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This ensures that all tests that depend on a pipeline are executed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam needs_runner_test_executions Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/392.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #392 commit b99e6ccd6c45c38ef8e14befb627dec71286b69e Author: Thomas GrohDate: 2016-05-20T22:18:11Z Execute NeedsRunner tests in the Direct Runner commit 89ca5bb20394541dfbf0b1bbacef97f614657987 Author: Thomas Groh Date: 2016-05-26T00:45:26Z Declare Dependencies on the Direct Runner Changing the default Runner requires a runner to be available for existing modules. Declare a dependency as appropriate. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282308#comment-15282308 ] Thomas Groh commented on BEAM-22: - As of the merging of https://github.com/apache/incubator-beam/pull/319, the implementation is complete, pending the swapping of the defaults, removal of the legacy DirectPipelineRunner, and rename. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280718#comment-15280718 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/322 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279326#comment-15279326 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/322 [BEAM-22] More regularly schedule additional roots Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This ensures that even when elements are pushed back into the Pipeline Runner, roots are scheduled if necessary. As elements may be rescheduled indefinitely, this is required to ensure that unbounded roots are scheduled during pipeline execution when existing elements are blocked on side inputs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam more_aggressively_add_roots Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/322.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #322 commit 1f5e399630e3ba71e2c025e7ab3f32c6c3ba9518 Author: Thomas GrohDate: 2016-05-11T00:11:23Z More regularly schedule additional roots This ensures that even when elements are pushed back into the Pipeline Runner, roots are scheduled if necessary. As elements may be rescheduled indefinitely, this is required to ensure that unbounded roots are scheduled during pipeline execution when existing elements are blocked on side inputs. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279008#comment-15279008 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/320 [BEAM-22] Reuse DoFns in ParDoEvaluators Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This allows the runner to avoid cloning DoFns for every input bundle. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam reuse_dofns_pardoevaluators Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/320.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #320 commit f018c75ef3ca1b60fc990bd88031d5419c571b87 Author: Thomas GrohDate: 2016-05-10T21:06:27Z Reuse DoFns in ParDoEvaluators This allows the runner to avoid cloning DoFns for every input bundle. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278813#comment-15278813 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/318 [BEAM-22] Use an AtomicReference in InProcessSideInputContainer Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This fixes a TOCTOU race in the contents updating logic, where the determination that the current pane should replace the contents of the side input and the replacement is not a single atomic operation. Using AtomicReference allows the use of compareAndSet to ensure that the replacement can only occur on the pane that the decision to replace was made with. Fixes a race where a pane could be the latest, and replace a pane, but would be lost due to an earlier pane being written between the invalidation and loading of contents. Fixes a race where a reader can incorrectly read an empty iterable as the contents of a PCollectionView, due to occuring between the invalidate and reload steps. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam atomic_reference_side_input_container Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/318.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #318 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278844#comment-15278844 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/319 [BEAM-22] Enable RunnableOnService Tests Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Not ready for review. Publishing PR to hook into Travis and Jenkins. Update runners/direct-java/pom.xml to enable the RunnableOnService tests phase. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam enable_ros_tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/319.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #319 commit d1796ba6fecb8423e563fcdf66946beda79e52c6 Author: Thomas GrohDate: 2016-05-09T22:47:27Z Minor checkArgument style fix commit f0e38fd170949f27d4794113e4bcb2077ffe88a6 Author: Thomas Groh Date: 2016-05-10T18:27:37Z Use an AtomicReference in InProcessSideInputContainer This fixes a TOCTOU race in the contents updating logic, where the determination that the current pane should replace the contents of the side input and the replacement is not a single atomic operation. Using AtomicReference allows the use of compareAndSet to ensure that the replacement can only occur on the pane that the decision to replace was made with. Fixes a race where a pane could be the latest, and replace a pane, but would be lost due to an earlier pane being written between the invalidation and loading of contents. Fixes a race where a reader can incorrectly read an empty iterable as the contents of a PCollectionView, due to occuring between the invalidate and reload steps. commit e06e449e3762a48404d0407babaff440ebfa416e Author: Thomas Groh Date: 2016-05-10T20:22:20Z Cache read SideInput Contents in the InProcessSideInputContainer This ensures that while processing a bundle all elements see the same contents for any SideInput Window. commit 8ff1d79474f3d114381b924fa61aa46bd7b935db Author: Thomas Groh Date: 2016-05-10T20:36:21Z Enable RunnableOnService tests for the Direct Runner > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278646#comment-15278646 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/312 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278513#comment-15278513 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/309 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277296#comment-15277296 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/289 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277134#comment-15277134 ] ASF GitHub Bot commented on BEAM-22: Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/304 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277124#comment-15277124 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/312 [BEAM-22] Update Watermarks Outside of handleResult Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This removes excess mutual exclusion, and reduces the eagerness of updating the value of a watermark. The first commit in this CR reviewed separately in #310 You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_wm_asynchronous Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/312.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #312 commit 2d94540ccd02b19a73410647e84486c008d9cdc5 Author: Thomas GrohDate: 2016-05-09T21:03:16Z Return null evaluators from Unavailable Reads Null TransformEvaluators for sources represent a source where all splits are currently in use or completed. Update TransformExecutor to handle null evaluators properly. Change TransformExecutor to a Runnable. commit 9ee1502e159cce49a8a4490593d896b34154e006 Author: Thomas Groh Date: 2016-05-03T00:26:47Z Update Watermarks Outside of handleResult Remove excess mutual exclusion > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276999#comment-15276999 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/309 [BEAM-22] Use PushbackSideInputDoFnRunner in the InProcessRunner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This removes blocking behavior while retrieving Side Inputs in the InProcessRunner. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam use_pushback_runner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #309 commit ffab4b6b22276bf41c9ce4b33d22c44e6d753268 Author: Thomas GrohDate: 2016-04-28T00:27:57Z Use PushbackDoFnRunner in the ParDoInProcessEvaluator This ensures that the evaluator does not block while processing an input bundle. commit b7d09e14ef749fcd8e773c82b3f9e73f1646eff8 Author: Thomas Groh Date: 2016-04-28T17:12:09Z Limit the number of work schedules per MonitorRunnable run This ensures that work readded to the queue will not cause the monitor runnable to run forever before delivering timers > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276802#comment-15276802 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/283 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272951#comment-15272951 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/282 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271716#comment-15271716 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/289 [BEAM-22] Mark CheckpointMark as volatile in UnboundedReadEvaluator Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- The evaluator may be reused in a different thread, and updates to the checkpoint must be visible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam unbounded_evaluator_close_before_requeue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #289 commit e6ba15d451c9e3574a29ac0d51e6275893c9ee60 Author: Thomas GrohDate: 2016-05-05T00:44:59Z Mark CheckpointMark as volatile in UnboundedReadEvaluator The evaluator may be reused in a different thread, and updates to the checkpoint must be visible. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271078#comment-15271078 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/258 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269636#comment-15269636 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/282 [BEAM-22] Refactor CompletionCallbacks Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- The default and timerful completion callbacks are identical, excepting their calls to evaluationContext.commitResult; factor that code into a common location. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam completion_callback_refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/282.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #282 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267997#comment-15267997 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/265 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267190#comment-15267190 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/264 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264449#comment-15264449 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/257 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263344#comment-15263344 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/260 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263342#comment-15263342 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/261 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263336#comment-15263336 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/264 [BEAM-22] Correct sequence of pending element updates in InMemoryWatermarkManager Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Adding additional pending elements/timers (and thus holds) always comes before removing existing holds, so the watermark is never observed under more stringent restrictions than currently exist. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_wm_pending_sequence Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/264.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #264 commit 1340e826631e83835e4d00c29a339b15931b7a94 Author: Thomas GrohDate: 2016-04-29T00:38:50Z Correct sequence of pending element updates in InMemoryWatermarkManager Adding additional pending elements/timers (and thus holds) always comes before removing existing holds, so the watermark is never observed under more stringent restrictions than currently exist. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263163#comment-15263163 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/261 [BEAM-22] Stop cloning coders in Enforcements Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This is excessively slow and also not useful, as coders are required to be thread-safe You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_stop_cloning_coders Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/261.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #261 commit 2f2584c2d82d8739e27b3d99125bd5ee2a3444fc Author: Thomas GrohDate: 2016-04-28T22:27:03Z Stop cloning coders in Enforcements This is excessively slow and also not useful, as coders are required to be thread-safe > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262824#comment-15262824 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/260 [BEAM-22] Add CommittedResult Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Return as the output to InProcessEvaluationContext#handleResult(). This allows a richer return type to improve possible behaviors when a result is returned. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_committed_result_as_handle_result Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #260 commit 13da09d9f1b4a6498cfde8067d31e96f5817ca74 Author: Thomas GrohDate: 2016-04-28T19:22:47Z Add CommittedResult Return as the output to InProcessEvaluationContext#handleResult(). This allows a richer return type to improve possible behaviors when a result is returned. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261294#comment-15261294 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/258 [BEAM-22] Add PushbackDoFnRunner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This DoFnRunner wraps an existing DoFnRunner and provides a method, processElementInReadyWindows, which returns a list of WindowedValues that could not be processed due to requiring a SideInput that is not ready. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_pushback_do_fn_runner_impl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/258.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #258 commit 6110210d2c933ad87f165d8e1196718c80cb6794 Author: Thomas GrohDate: 2016-04-28T00:26:13Z Add PushbackDoFnRunner This DoFnRunner wraps an existing DoFnRunner and provides a method, processElementInReadyWindows, which returns a list of WindowedValues that could not be processed due to requiring a SideInput that is not ready. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261260#comment-15261260 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/257 [BEAM-22] Remove redundant close in BoundedReadEvaluatorFactory Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- The reader is already closed by virtue of being the target of the try-with-resources block that encompasses all of #finishBundle(). Calling close on the reader twice causes KafkaIO to throw an exception. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam remove_redundant_bounded_reader_close Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/257.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #257 commit 08c05e01b6248de853e8bc3d8446ed98d3408a6e Author: Thomas GrohDate: 2016-04-28T00:03:44Z Remove redundant close in BoundedReadEvaluatorFactory The reader is already closed by virtue of being the target of the try-with-resources block that encompasses all of #finishBundle(). > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260998#comment-15260998 ] ASF GitHub Bot commented on BEAM-22: Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/254 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260893#comment-15260893 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/220 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260534#comment-15260534 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/231 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257080#comment-15257080 ] ASF GitHub Bot commented on BEAM-22: Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/202 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254263#comment-15254263 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/231 [BEAM-22] Add withElements to CommittedBundle Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- When a bundle is partially completed, the unprocessed elements must be placed in a new bundle to be processed at a later time. The bundle in which they are processed should also have identical properties to the bundle which the elements were initially present in. withElements provides a simple way to create a "copy" of a bundle that contains different elements. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_bundle_withelements Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/231.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #231 commit b1369331a54036e3ce3ea60fc85b1a703387fd8c Author: Thomas GrohDate: 2016-04-22T17:02:05Z Add withElements to CommittedBundle When a bundle is partially completed, the unprocessed elements must be placed in a new bundle to be processed at a later time. The bundle in which they are processed should also have identical properties to the bundle which the elements were initially present in. withElements provides a simple way to create a "copy" of a bundle that contains different elements. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252842#comment-15252842 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/227 [BEAM-22] Wrap Exceptions thrown in StartBundle in the InProcessPipelineRunner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_wrap_start_bundle Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/227.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #227 commit f04ed2dabfeceb606b3ed4496f770b79db2fe2fb Author: Thomas GrohDate: 2016-04-21T17:51:20Z Wrap Exceptions thrown in StartBundle in the InProcessPipelineRunner > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250768#comment-15250768 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/220 [BEAM-22] Allow InProcess Evaluators to check Side Input completion Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This checks to ensure that the PCollectionView in the SideInputWindow for the provided window either has elements available or is empty. Schedule a future to ensure that the SideInputWindows are appropriately filled with an empty iterable after retreiving the element. This is allows the ParDoEvaluator to not attempt to process elements that cannot currently be completed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_side_input_is_ready Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/220.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #220 commit ffc47d7b1367659fba36a17e477a510adb1d54f7 Author: Thomas GrohDate: 2016-04-18T23:16:48Z Allow InProcess Evaluators to check Side Input completion This checks to ensure that the PCollectionView in the SideInputWindow for the provided window either has elements available or is empty. Schedule a future to ensure that the SideInputWindows are appropriately filled with an empty iterable after retreiving the element. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250442#comment-15250442 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/201 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250092#comment-15250092 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/207 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248040#comment-15248040 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/203 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246889#comment-15246889 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/207 [BEAM-22] Track Pending Elements via Exploded WindowedValues Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This allows the WindowedValues that are completed to be removed from the set of pending elements, even if the actual object is a different instance, by ensuring that all WindowedValues contain only a single (element, window) pair. Built on top of #206 You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_exploded_wm_tracking Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/207.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #207 commit ac0696d8867d7a1706f1ff85c4b299c2b1779d02 Author: Thomas GrohDate: 2016-04-18T23:55:57Z Add WindowedValue#explodeWindows This takes an existing WindowedValue and returns a Collection of WindowedValues, each of which is in exactly one window. Use the explode implementation on DoFnRunnerBase commit 725b2ddea58add3f583ed6f7c74f6ab4343cf292 Author: Thomas Groh Date: 2016-04-19T00:28:47Z Track pending elements via exploded WindowedValues This allows the WindowedValues to be partially completed while the still holding the watermark. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246791#comment-15246791 ] ASF GitHub Bot commented on BEAM-22: Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/204 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246583#comment-15246583 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh reopened a pull request: https://github.com/apache/incubator-beam/pull/202 [BEAM-22] Track Synchronized Processing Time Holds per-element Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This removes the concept of a Bundle in watermark tracking. Bundles are still used to start & finish work, but are not referenced within the actual Watermark objects. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_synchronized_processing_holds_per_element Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/202.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #202 commit beb1aab000d9f0edd1d60fe42b2c0ffab0375634 Author: Thomas GrohDate: 2016-04-18T20:33:26Z Track Synchronized Processing Time Holds per-element This removes the concept of a Bundle in watermark tracking. Bundles are still used to start/finish work, but watermarks are held on a per-element basis, which allows partial completion of input. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246581#comment-15246581 ] ASF GitHub Bot commented on BEAM-22: Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/202 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246471#comment-15246471 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/201 [BEAM-22] Remove isKeyed property of InProcess Bundles Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- The property of keyedness belongs to a PCollection. A BundleFactory propogates the key as far as possible, but does not track if a bundle is keyed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_remove_bundle_iskeyed Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/201.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #201 commit 9644b7e20955603cb191f79f16b0f1f50b6497db Author: Thomas GrohDate: 2016-04-18T19:59:24Z Remove isKeyed property of InProcess Bundles The property of keyedness belongs to a PCollection. A BundleFactory propogates the key as far as possible, but does not track if a bundle is keyed. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243888#comment-15243888 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/178 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243267#comment-15243267 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/188 [BEAM-22] Improve ParDoEvaluator Factoring Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This moves shared code into a common location. Clone DoFn instances before constructing the DoFnRunner to avoid races. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_better_ParDoEvaluator_factoring Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/188.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #188 commit ecc26d51cee4ea1568948d48cd3441594f638e39 Author: Thomas GrohDate: 2016-03-30T00:38:22Z Move Shared construction code to ParDoInProcessEvaluator Remove duplicate code in ParDo(Single/Multi)EvaluatorFactory; instead only extract the appropriate elements and pass them to the ParDoInProcessEvaluator.t log commit e47aba0a7097cee8341369594e47e73b83029a50 Author: Thomas Groh Date: 2016-04-15T17:23:15Z Clone DoFns before constructing a DoFnRunner This ensures that each thread gets an individual copy of a DoFn, so multiple threads do not interact. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241712#comment-15241712 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/178 [BEAM-22] Switch the Default PipelineRunner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Use the InProcessPiplineRunner (pending rename) as the default runner. The InProcessPipelineRunner implements the beam model, including support for Unbounded PCollections. Tests will fail until #167 and #177 are merged, as this change modifies the graph constructed by the Pipeline if the runner is not specified. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_switch_default Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #178 commit 55ac23eb9bc184c5eda7c69ba5b70e436b665e09 Author: Thomas GrohDate: 2016-04-08T17:20:56Z Switch the Default PipelineRunner Use the InProcessPiplineRunner (pending rename) as the default runner. The InProcessPipelineRunner implements the beam model, including support for Unbounded PCollections. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237661#comment-15237661 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/169 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236337#comment-15236337 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/169 [BEAM-22] Add InProcessRegistrar Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace "" in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Adds @AutoService for the InProcessPipelineRunner and InProcessPipelineOptions You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam in_process_registrar Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/169.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #169 commit 8ba2da6387960bc29ca001e7ee06e3be7b771e03 Author: Thomas GrohDate: 2016-04-08T17:03:10Z Add InProcessRegistrar Adds @AutoService for the InProcessPipelineRunner and InProcessPipelineOptions > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235904#comment-15235904 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/148 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235880#comment-15235880 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/153 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235875#comment-15235875 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/161 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235377#comment-15235377 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/161 [BEAM-22] Fix multithreaded visibility issue with TransformExecutor Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace "" in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Use an AtomicReference to ensure the thread the Executor is being run on will be visible to other threads. Add a checkState to ensure that the executor will not run more than once. Stop setting the active thread to null, as the Transform Executor should be discarded from any refernces and not hold the overall execution. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam data_race_ippr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/161.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #161 commit 33556f2a94924effae56de119444e29eeaf8e498 Author: Thomas GrohDate: 2016-04-11T15:54:58Z Fix multithreaded visibility issue with TransformExecutor Use an AtomicReference to ensure the thread the Executor is being run on will be visible to other threads. Add a checkState to ensure that the executor will not run more than once. Stop setting the active thread to null, as the Transform Executor should be discarded from any refernces and not hold the overall execution. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232974#comment-15232974 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/143 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232676#comment-15232676 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/149 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232529#comment-15232529 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/148 [BEAM-22] Add ShardControlledWrite override Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace "" in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This is used for TextIO and AvroIO, which provide withNumOutputShards methods to control the number of output files. Apply this override in the InProcessPipelineRunner. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_avro_text_sharding Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #148 commit d0697ecc61da1cc9ddc783d46fa9e42068ae1925 Author: Thomas GrohDate: 2016-03-29T17:06:02Z Add ShardControlledWrite override This is used for TextIO and AvroIO, which provide withNumOutputShards methods to control the number of output files. Apply this override in the InProcessPipelineRunner. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232523#comment-15232523 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/131 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230810#comment-15230810 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/143 [BEAM-22] Clear Empty TransformExecutorServices Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace "" in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This reduces the amount of objects tracked by the ExecutorServiceParallelExecutor when keys stop appearing. If this is not done, as new keys show up and no more work appears for old keys, the number of TransformExecutorServices grows, and we maintain a reference for all of them. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_empty_evaluation_state Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/143.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #143 commit 066ae4526c229f1e9c11172aae82aaccbde09c42 Author: Thomas GrohDate: 2016-03-29T17:27:37Z Schedule all pending work before firing timers Pull all available work off of the ExecutorUpdate queue during each execution of the MonitorRunnable. commit c3ae8460b0f49bcb5d6d5e47d29152ccc411bf99 Author: Thomas Groh Date: 2016-03-29T17:56:10Z Clean up empty TransformExecutorServices This allows the size of the currentEvaluations map in ExecutorServiceParallelExecutor to be based on the number of keys for serial computations that are currently in-progress, rather than the number of keys for serializable computations for the entire life of the pipeline. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227225#comment-15227225 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/131 [BEAM-22] Add PTransformOverride class Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace "" in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This provides the #shouldOverride method to determine if a PTransform override should be applied, based on the original PTransform. This permits overrides that are applied conditionally (such as shard-controlled writes). For all existing overrides, this value is equal to true. This is used in (written, but unpublished) overrides for TextIO.Write and AvroIO.Write, so the write can be overridden in terms of sharded writes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_ptransform_override_class Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/131.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #131 commit 624ccb9359779161c2e5d6a79dc9b865aa22efb3 Author: Thomas GrohDate: 2016-03-29T17:03:25Z Add PTransformOverride class This provides the #shouldOverride method to determine if a PTransform override should be applied, based on the original PTransform. This permits overrides that are applied conditionally (such as shard-controlled writes). For all existing overrides, this value is equal to true. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227073#comment-15227073 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/129 [BEAM-22] Apply ModelEnforcement in the InProcessPipelineRunner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace "" in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This ensures that user code does not violate the model, as enforced by ModelEnforcements. Add a flag to control application of immutability enforcement. This flag is enabled by default. Also include minor touchups to ImmutabilityCheckingEnforcement. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_apply_enforcement Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/129.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #129 commit db9ba651ff2532ca9ecf6bc1293dc5faad6b4103 Author: Thomas GrohDate: 2016-04-04T23:48:15Z Apply ModelEnforcement in the InProcessPipelineRunner This ensures that user code does not violate the model. Add a flag to control application of immutability enforcement. This flag is enabled by default. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225227#comment-15225227 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/86 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225168#comment-15225168 ] ASF GitHub Bot commented on BEAM-22: Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/112 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224697#comment-15224697 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/113 [BEAM-22] Give root transforms step names Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace "" in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Fix a bug where steps would only be given step names if they were a non-root node. Use the ConsumerTrackingPipelineVisitor in the InProcessEvaluationContext test to handle runner-expanded transforms You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam step_names_everywhere Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/113.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #113 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222035#comment-15222035 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/82 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221947#comment-15221947 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/106 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214814#comment-15214814 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/82 [BEAM-22] Execute ModelEnforcements in TransformExecutor This allows a configurable application of Model Enforcement based on the class of transform being executed, both before and after an element is processed and after the transform completes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_model_enforcing_transform_executor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/82.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #82 commit b76b9d62f2a21e534238437f0e6aa34321146007 Author: Thomas GrohDate: 2016-03-28T16:35:33Z Execute ModelEnforcements in TransformExecutor This allows a configurable application of Model Enforcement based on the class of transform being executed, both before and after an element is processed and after the transform completes. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207860#comment-15207860 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/62 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206602#comment-15206602 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/64 [BEAM-22] Schedule roots less aggressively The excess scheduling of known-empty bundles can consume excessive resources, especially with the default CachedThreadPool executor service. Removes an unnecessary synchronized block (the map is already thread-safe) You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_less_aggression Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/64.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #64 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200967#comment-15200967 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/52 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202364#comment-15202364 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/62 [BEAM-22] Implement InProcessPipelineRunner#run Appropriately construct an evaluation context and executor, and start the pipeline when run is called. Implement InProcessPipelineResult. Apply PTransform overrides. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_runner_run Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/62.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #62 commit bb767860e13b8311c6cd090e5aeb9c323396638b Author: Thomas GrohDate: 2016-02-27T01:30:13Z Implement InProcessPipelineRunner#run Appropriately construct an evaluation context and executor, and start the pipeline when run is called. Implement InProcessPipelineResult. Apply PTransform overrides. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196288#comment-15196288 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/52 [BEAM-22] Close Readers in InProcess Read Evaluators The readers were formerly left open, which prevents release of any resources that should be released. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_close_sources Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/52.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #52 commit cb0b8112e3eec8c4ab79c33b63bea08d93e78ec0 Author: Thomas GrohDate: 2016-03-15T18:50:38Z Close Readers in InProcess Read Evaluators The readers were formerly left open, which prevents release of any resources that should be released. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195871#comment-15195871 ] ASF GitHub Bot commented on BEAM-22: Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/11 > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-22) DirectPipelineRunner: support for unbounded collections
[ https://issues.apache.org/jira/browse/BEAM-22?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176604#comment-15176604 ] ASF GitHub Bot commented on BEAM-22: GitHub user tgroh opened a pull request: https://github.com/apache/incubator-beam/pull/11 [BEAM-22] Implement InProcessEvaluationContext This is the primary "global state" object for the evaluation of a Pipeline using the InProcessPipelineRunner, and is responsible for properly routing information about the state of the pipeline to transform evaluators. Remove the InProcessEvaluationContext from the InProcessPipelineRunner class, and implement as a class directly. Fix associated imports. Split from the first commit in #3 You can merge this pull request into a Git repository by running: $ git pull https://github.com/tgroh/incubator-beam ippr_evaluation_context Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/11.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11 commit 497975272a8028cfcb2db9ffa973e86dff3f36d5 Author: Thomas GrohDate: 2016-02-27T01:28:37Z Implement InProcessEvaluationContext This is the primary "global state" object for the evaluation of a Pipeline using the InProcessPipelineRunner, and is responsible for properly routing information about the state of the pipeline to transform evaluators. Remove the InProcessEvaluationContext from the InProcessPipelineRunner class, and implement as a class directly. Fix associated imports. > DirectPipelineRunner: support for unbounded collections > --- > > Key: BEAM-22 > URL: https://issues.apache.org/jira/browse/BEAM-22 > Project: Beam > Issue Type: Improvement > Components: runner-direct >Reporter: Davor Bonaci >Assignee: Thomas Groh > > DirectPipelineRunner currently runs over bounded PCollections only, and > implements only a portion of the Beam Model. > We should improve it to faithfully implement the full Beam Model, such as add > ability to run over unbounded PCollections, and better resemble execution > model in a distributed system. > This further enables features such as a testing source which may simulate > late data and test triggers in the pipeline. Finally, we may want to expose > an option to select between "debug" (single threaded), "chaos monkey" (test > as many model requirements as possible), and "performance" (multi-threaded). > more testing (chaos monkey) > Once this is done, we should update this StackOverflow question: > http://stackoverflow.com/questions/35350113/testing-triggers-with-processing-time/35401426#35401426 -- This message was sent by Atlassian JIRA (v6.3.4#6332)