[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658789#comment-15658789 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1339

> Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
> --------------------------------------------------------------------
>
>                 Key: BEAM-310
>                 URL: https://issues.apache.org/jira/browse/BEAM-310
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-direct
>            Reporter: Thomas Groh
>            Assignee: Thomas Groh
>             Fix For: 0.3.0-incubating
>
> BoundedSource#splitIntoBundles and UnboundedSource#generateInitialSplits are
> the methods by which sources can be accessed in parallel. Exercising these
> methods allows reads (and all transforms downstream) to be executed in
> parallel both pre and post a GroupByKey

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655261#comment-15655261 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user tgroh opened a pull request:

    https://github.com/apache/incubator-beam/pull/1339

    [BEAM-310] Actually Split Root Transforms

    Be sure to do all of the following to help us incorporate your
    contribution quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[BEAM-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
       Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `` in the title with the actual Jira issue
       number, if there is one.
     - [ ] If this contribution is large, please file an Apache
       [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

    ---

    Permit the ExecutorServiceParallelExecutor to control its own
    ExecutorService by passing only a TargetParallelism parameter.

    Split roots into the greater of 3 or the target parallelism.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/incubator-beam actually_split

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1339

----
commit b18b876de91bfc01c82cf10bf53eb27a5aef3b09
Author: Thomas Groh
Date:   2016-11-10T21:47:40Z

    Actually Split Root Transforms

    Permit the ExecutorServiceParallelExecutor to control its own
    ExecutorService by passing only a TargetParallelism parameter.

    Split roots into the greater of 3 or the target parallelism.

----
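The two statements in the PR description above (the executor owns its ExecutorService and receives only a target parallelism; roots are split into the greater of 3 or the target parallelism) can be illustrated with a minimal sketch. This is not the DirectRunner's ExecutorServiceParallelExecutor: the class name `ParallelExecutorSketch`, the constant `MIN_ROOT_SPLITS`, the method `desiredRootSplits`, and the choice of a fixed thread pool are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; not the actual ExecutorServiceParallelExecutor. */
final class ParallelExecutorSketch implements AutoCloseable {
  /** Floor on the number of root splits, taken from the PR description. */
  static final int MIN_ROOT_SPLITS = 3;

  private final int targetParallelism;
  private final ExecutorService executorService;

  private ParallelExecutorSketch(int targetParallelism) {
    this.targetParallelism = targetParallelism;
    // The executor creates and owns its thread pool; callers pass only a number.
    // A fixed-size pool is an assumption of this sketch.
    this.executorService = Executors.newFixedThreadPool(targetParallelism);
  }

  static ParallelExecutorSketch create(int targetParallelism) {
    if (targetParallelism < 1) {
      throw new IllegalArgumentException("targetParallelism must be at least 1");
    }
    return new ParallelExecutorSketch(targetParallelism);
  }

  /** Number of pieces each root transform is split into. */
  int desiredRootSplits() {
    return Math.max(MIN_ROOT_SPLITS, targetParallelism);
  }

  /** Because the pool is owned here, it is also shut down here. */
  @Override
  public void close() {
    executorService.shutdown();
  }
}
```

With this sketch, `ParallelExecutorSketch.create(1).desiredRootSplits()` gives 3 and `create(8)` gives 8, so splitting is exercised even when the target parallelism is very low.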
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576473#comment-15576473 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1063
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553453#comment-15553453 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user tgroh opened a pull request:

    https://github.com/apache/incubator-beam/pull/1063

    [BEAM-310] Perform initial splitting in the DirectRunner

    This allows sources to be read from in parallel and generates initial
    splits.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/incubator-beam initial_splits

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1063.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1063

----
commit 891e7fc9af4fc7540269e3dab2941f8c64d4ec84
Author: Thomas Groh
Date:   2016-10-05T23:11:21Z

    Perform initial splitting in the DirectRunner

    This allows sources to be read from in parallel and generates initial
    splits.

----
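The idea behind initial splitting, as described in this PR, is that a source is divided into pieces before reading starts so that each piece can be read on its own thread. The sketch below shows that idea only; it does not use the Beam API. `RangeSource`, `split`, `readSum`, and `readInParallel` are hypothetical names that stand in for a bounded source, its initial-split method, and the runner's parallel read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of initial splitting followed by a parallel read. */
final class InitialSplitSketch {

  /** Stand-in for a bounded source over the half-open range [start, end). */
  static final class RangeSource {
    final long start;
    final long end;

    RangeSource(long start, long end) {
      this.start = start;
      this.end = end;
    }

    /** Splits into at most desiredSplits contiguous, non-empty sub-ranges. */
    List<RangeSource> split(int desiredSplits) {
      List<RangeSource> splits = new ArrayList<>();
      long size = end - start;
      long n = Math.min(desiredSplits, size);
      for (long i = 0; i < n; i++) {
        splits.add(new RangeSource(start + size * i / n, start + size * (i + 1) / n));
      }
      return splits;
    }

    /** "Reads" the source by summing every value in the range. */
    long readSum() {
      long sum = 0;
      for (long v = start; v < end; v++) {
        sum += v;
      }
      return sum;
    }
  }

  /** Splits the source once up front, then reads every split on its own task. */
  static long readInParallel(RangeSource source, int parallelism) {
    if (parallelism < 1) {
      throw new IllegalArgumentException("parallelism must be at least 1");
    }
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<Long>> partials = new ArrayList<>();
      for (RangeSource split : source.split(parallelism)) {
        Callable<Long> task = split::readSum;
        partials.add(pool.submit(task));
      }
      long total = 0;
      for (Future<Long> partial : partials) {
        total += partial.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("parallel read failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Reading `new RangeSource(0, 100)` with a parallelism of 4 produces the same total as a single sequential read; only the work distribution changes.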
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536420#comment-15536420 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user tgroh closed the pull request at:

    https://github.com/apache/incubator-beam/pull/996
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534390#comment-15534390 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user tgroh closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1019
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517886#comment-15517886 ]

ASF GitHub Bot commented on BEAM-310:
-------------------------------------

GitHub user tgroh opened a pull request:

    https://github.com/apache/incubator-beam/pull/996

    [BEAM-310] Add RootTransformEvaluatorFactory, Use for Reads

    This is an extension of TransformEvaluatorFactory that applies to
    transforms that can be the root transform of a Pipeline. They produce
    bundles which provide an impulse to the PTransforms that are at the root
    of the Pipeline.

    Add an ImpulseBundle implementation to represent this impulse as a bundle
    that is not part of a PCollection.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/incubator-beam root_initial_splits

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #996

----
commit 8670bf7995639a95a3f2c3205b063f9c64ff18b9
Author: Thomas Groh
Date:   2016-09-23T15:52:47Z

    Add RootTransformEvaluatorFactory, Use for Reads

    This is an extension of TransformEvaluatorFactory that applies to
    transforms that can be the root transform of a Pipeline. They produce
    bundles which provide an impulse to the PTransforms that are at the root
    of the Pipeline.

    Add an ImpulseBundle implementation to represent this impulse as a bundle
    that is not part of a PCollection.

----
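The relationship described in this PR (a factory for root transforms that produces impulse bundles, each of which is not part of any PCollection) might look roughly like the following. This is only a sketch of the shape of the idea: the interfaces `Bundle` and `RootFactory`, the method `initialInputs`, and the class `ShardedReadFactory` are hypothetical, and the real RootTransformEvaluatorFactory and ImpulseBundle signatures are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; the names below are not the DirectRunner's types. */
final class RootImpulseSketch {

  /** Stand-in for a bundle of elements handed to a transform. */
  interface Bundle<T> {
    List<T> elements();

    boolean belongsToPCollection();
  }

  /** A bundle that only kick-starts a root transform. */
  static final class ImpulseBundle<T> implements Bundle<T> {
    private final List<T> elements;

    ImpulseBundle(T element) {
      this.elements = Collections.singletonList(element);
    }

    @Override
    public List<T> elements() {
      return elements;
    }

    /** An impulse has no producing transform, so it is in no PCollection. */
    @Override
    public boolean belongsToPCollection() {
      return false;
    }
  }

  /** Stand-in for a factory that knows how to start a root transform. */
  interface RootFactory<T> {
    List<Bundle<T>> initialInputs(int targetSplits);
  }

  /** Example root: a read that is started with one impulse per shard index. */
  static final class ShardedReadFactory implements RootFactory<Integer> {
    @Override
    public List<Bundle<Integer>> initialInputs(int targetSplits) {
      List<Bundle<Integer>> impulses = new ArrayList<>();
      for (int shard = 0; shard < targetSplits; shard++) {
        impulses.add(new ImpulseBundle<>(shard));
      }
      return impulses;
    }
  }
}
```

Treating the start of a root as an ordinary bundle lets the same scheduling path handle roots and downstream transforms, which is one plausible motivation for the design described above.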
[jira] [Commented] (BEAM-310) Exercise splitIntoBundles/generateInitialSplits in the Direct Runner
[ https://issues.apache.org/jira/browse/BEAM-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353334#comment-15353334 ]

Daniel Halperin commented on BEAM-310:
--------------------------------------

Note that we should also exercise splitAtFraction.
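For context on the comment above: splitAtFraction is a dynamic split, requested while a read is already in progress, as opposed to the initial splits made before reading begins. The sketch below illustrates the general contract under simplifying assumptions. `RangeReader`, its fields, and the `long[]` return value are hypothetical and do not reproduce the Beam reader API; the request is rejected (returns null) when the split point falls in the part that has already been read.

```java
/** Hypothetical sketch of a dynamic split on a reader over [start, end). */
final class SplitAtFractionSketch {

  static final class RangeReader {
    private final long start;
    private long end;
    private long next; // position of the next unread element

    RangeReader(long start, long end) {
      this.start = start;
      this.end = end;
      this.next = start;
    }

    /** Consumes one element; returns false once the (possibly shrunk) range is done. */
    synchronized boolean advance() {
      if (next >= end) {
        return false;
      }
      next++;
      return true;
    }

    /**
     * Tries to give away the tail of the range beyond the given fraction.
     * Returns the residual as {residualStart, residualEnd}, or null if the
     * request is rejected. On success this reader keeps [start, residualStart).
     */
    synchronized long[] splitAtFraction(double fraction) {
      if (fraction <= 0.0 || fraction >= 1.0) {
        return null;
      }
      long splitPos = start + (long) (fraction * (end - start));
      // Reject splits that would leave the primary empty, that fall inside
      // the already-consumed prefix, or that would leave no residual.
      if (splitPos <= start || splitPos < next || splitPos >= end) {
        return null;
      }
      long[] residual = {splitPos, end};
      end = splitPos;
      return residual;
    }
  }
}
```

For example, a reader over [0, 100) that has consumed 10 elements accepts a split at fraction 0.5 and hands back [50, 100); a later request at fraction 0.1 of the remaining [0, 50) is rejected, because position 5 has already been read.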