[jira] [Commented] (BEAM-25) Add user-ready API for interacting with state
[ https://issues.apache.org/jira/browse/BEAM-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731043#comment-15731043 ]

ASF GitHub Bot commented on BEAM-25:

Github user asfgit closed the pull request at:
    https://github.com/apache/incubator-beam/pull/1543

> Add user-ready API for interacting with state
>
> Key: BEAM-25
> URL: https://issues.apache.org/jira/browse/BEAM-25
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Labels: State
>
> Our current state API is targeted at runner implementers, not pipeline
> authors. As such it has many capabilities that are not necessary nor
> desirable for simple use cases of stateful ParDo (such as dynamic state tag
> creation). Implement a simple state intended for user access.
> (Details of our current thoughts in forthcoming design doc)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers
[ https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730970#comment-15730970 ]

ASF GitHub Bot commented on BEAM-27:

Github user asfgit closed the pull request at:
    https://github.com/apache/incubator-beam/pull/1528

> Add user-ready API for interacting with timers
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction
> with underlying timers. The current APIs are targeted at runner implementers.
[jira] [Created] (BEAM-1109) Python ValidatesRunner Tests on Dataflow Service Timeout
Mark Liu created BEAM-1109:

Summary: Python ValidatesRunner Tests on Dataflow Service Timeout
Key: BEAM-1109
URL: https://issues.apache.org/jira/browse/BEAM-1109
Project: Beam
Issue Type: Bug
Components: sdk-py, testing
Reporter: Mark Liu
Assignee: Mark Liu

ValidatesRunner tests time out with the following logs:
https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/839/console

We need to increase "--process-timeout" in the execution command
(https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/run_postcommit.sh#L77).
[jira] [Updated] (BEAM-1096) flink streaming side output optimization using SplitStream
[ https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated BEAM-1096:

    Assignee: Alexey Diomin

> flink streaming side output optimization using SplitStream
>
> Key: BEAM-1096
> URL: https://issues.apache.org/jira/browse/BEAM-1096
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Affects Versions: 0.4.0-incubating
> Reporter: Alexey Diomin
> Assignee: Alexey Diomin
> Priority: Minor
> Fix For: 0.4.0-incubating
>
> Current implementation:
> 1) send every event to all output streams
> 2) filter each stream for the necessary tags
> Cons: increased CPU usage for serializing all events
> Proposed implementation:
> 1) route each event to the correct stream based on its tag
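The two strategies in the BEAM-1096 description can be compared with a small, self-contained sketch. This is not the Flink runner code and it does not use SplitStream; the function names, the `(tag, value)` event shape, and the `sends` counter (standing in for one serialization per emitted copy) are all assumptions made for illustration.

```python
def broadcast_then_filter(events, tags):
    """Current approach: send every event to every output, then filter by tag."""
    outputs = {tag: [] for tag in tags}
    sends = 0
    for tag, value in events:
        for out_tag in tags:
            sends += 1  # one (hypothetical) serialization per output stream
            if out_tag == tag:  # the filter step keeps only matching tags
                outputs[out_tag].append(value)
    return outputs, sends


def route_by_tag(events, tags):
    """Proposed approach: send each event only to the output for its tag."""
    outputs = {tag: [] for tag in tags}
    sends = 0
    for tag, value in events:
        sends += 1  # one (hypothetical) serialization per event
        outputs[tag].append(value)
    return outputs, sends


events = [("main", 1), ("side", 2), ("main", 3)]
tags = ["main", "side"]
print(broadcast_then_filter(events, tags))
print(route_by_tag(events, tags))
```

Both functions produce the same per-tag outputs; in this toy model the first performs one send per event per tag, while the second performs one send per event.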
[jira] [Commented] (BEAM-1096) flink streaming side output optimization using SplitStream
[ https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730793#comment-15730793 ]

ASF GitHub Bot commented on BEAM-1096:

Github user asfgit closed the pull request at:
    https://github.com/apache/incubator-beam/pull/1520

> flink streaming side output optimization using SplitStream
>
> Key: BEAM-1096
> URL: https://issues.apache.org/jira/browse/BEAM-1096
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Affects Versions: 0.4.0-incubating
> Reporter: Alexey Diomin
> Priority: Minor
>
> Current implementation:
> 1) send every event to all output streams
> 2) filter each stream for the necessary tags
> Cons: increased CPU usage for serializing all events
> Proposed implementation:
> 1) route each event to the correct stream based on its tag
[jira] [Updated] (BEAM-1095) Add support set config for reuse-object on flink
[ https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated BEAM-1095:

    Assignee: Alexey Diomin

> Add support set config for reuse-object on flink
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Affects Versions: 0.4.0-incubating
> Reporter: Alexey Diomin
> Assignee: Alexey Diomin
> Priority: Trivial
> Fix For: 0.4.0-incubating
>
> Object reuse is a dangerous setting and is disabled by default,
> but sometimes we need to use this option to avoid the performance overhead of
> serializing and deserializing objects on every transformation.
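Why object reuse is considered dangerous can be shown with a toy model. The sketch below is not Flink or Beam code: `emit`, `buffer_all`, and the dict-shaped record are hypothetical, and `copy.deepcopy` merely stands in for the serialize/deserialize round trip that the setting avoids.

```python
import copy


def emit(values, reuse_objects):
    """Yield one record per value, optionally reusing a single mutable record."""
    shared = {"value": None}
    for v in values:
        shared["value"] = v
        if reuse_objects:
            yield shared  # same instance every time: cheap, but aliased
        else:
            yield copy.deepcopy(shared)  # fresh copy: safe, but costs a copy


def buffer_all(records):
    """A downstream step that holds on to references instead of copying."""
    return [r for r in records]


safe = [r["value"] for r in buffer_all(emit([1, 2, 3], reuse_objects=False))]
unsafe = [r["value"] for r in buffer_all(emit([1, 2, 3], reuse_objects=True))]
print(safe)    # [1, 2, 3]
print(unsafe)  # [3, 3, 3]
```

With reuse enabled, every buffered reference points at the same record, so a step that keeps elements around sees only the last value written. Steps that consume each element immediately are unaffected, which is the case where the setting saves work.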
[jira] [Closed] (BEAM-1095) Add support set config for reuse-object on flink
[ https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed BEAM-1095.

    Resolution: Fixed
    Fix Version/s: 0.4.0-incubating

> Add support set config for reuse-object on flink
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Affects Versions: 0.4.0-incubating
> Reporter: Alexey Diomin
> Assignee: Alexey Diomin
> Priority: Trivial
> Fix For: 0.4.0-incubating
>
> Object reuse is a dangerous setting and is disabled by default,
> but sometimes we need to use this option to avoid the performance overhead of
> serializing and deserializing objects on every transformation.
[jira] [Commented] (BEAM-1095) Add support set config for reuse-object on flink
[ https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730767#comment-15730767 ]

ASF GitHub Bot commented on BEAM-1095:

Github user asfgit closed the pull request at:
    https://github.com/apache/incubator-beam/pull/1518

> Add support set config for reuse-object on flink
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Affects Versions: 0.4.0-incubating
> Reporter: Alexey Diomin
> Priority: Trivial
>
> Object reuse is a dangerous setting and is disabled by default,
> but sometimes we need to use this option to avoid the performance overhead of
> serializing and deserializing objects on every transformation.
[jira] [Commented] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation
[ https://issues.apache.org/jira/browse/BEAM-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730723#comment-15730723 ]

ASF GitHub Bot commented on BEAM-1108:

Github user asfgit closed the pull request at:
    https://github.com/apache/incubator-beam/pull/1546

> Remove deprecated Dataflow Runner options and update documentation
>
> Key: BEAM-1108
> URL: https://issues.apache.org/jira/browse/BEAM-1108
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: Not applicable
> Reporter: Daniel Halperin
> Assignee: Daniel Halperin
> Priority: Minor
> Fix For: Not applicable
>
> Umbrella bug for removing deprecated {{DataflowPipelineXOptions}}
> configurations, plus improving documentation. Will update bug description as
> more tasks arise.
> 1. Remove the {{TEARDOWN_POLICY}} option.
[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI
[ https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730375#comment-15730375 ]

Aljoscha Krettek commented on BEAM-1107:

Yep, you're right but even in the black text the operation names
(MapPartition, GroupCombine and so on) are hardcoded in Flink right now so we
cannot change that coming from Beam-on-Flink. Changing that would require
changes to Flink (which I'm not opposed to).

> Display user names for steps in the Flink Web UI
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Daniel Halperin
> Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI
> (but only for DataSource, and still too verbose:
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at
> Read(CompressedSource)
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1]
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2]
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
[jira] [Commented] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation
[ https://issues.apache.org/jira/browse/BEAM-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730373#comment-15730373 ]

Neelesh Srinivas Salian commented on BEAM-1108:

[~dhalp...@google.com] subtasks would be good to have to spread the work if
there are multiple parts to this.

> Remove deprecated Dataflow Runner options and update documentation
>
> Key: BEAM-1108
> URL: https://issues.apache.org/jira/browse/BEAM-1108
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Affects Versions: Not applicable
> Reporter: Daniel Halperin
> Assignee: Daniel Halperin
> Priority: Minor
> Fix For: Not applicable
>
> Umbrella bug for removing deprecated {{DataflowPipelineXOptions}}
> configurations, plus improving documentation. Will update bug description as
> more tasks arise.
> 1. Remove the {{TEARDOWN_POLICY}} option.
[jira] [Commented] (BEAM-646) Get runners out of the apply()
[ https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730359#comment-15730359 ]

ASF GitHub Bot commented on BEAM-646:

GitHub user tgroh opened a pull request:

    https://github.com/apache/incubator-beam/pull/1547

    [BEAM-646] Add PTransformOverrideFactory to the Core SDK

    Be sure to do all of the following to help us incorporate your
    contribution quickly and easily:

     - [ ] Make sure the PR title is formatted like:
           `[BEAM-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `` in the title with the actual Jira issue number, if
           there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

    ---

    This migrates PTransformOverrideFactory from the DirectRunner to the
    Core SDK, as part of BEAM-646.

    Migrate all DirectRunner Override Factories to the new
    PTransformOverrideFactory.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/incubator-beam override_factory_in_core

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1547

commit ad7aa03e12694bceb29906d2bb9df1ce009a1df2
Author: Thomas Groh <tg...@google.com>
Date: 2016-12-06T00:01:57Z

    Add PTransformOverrideFactory to the Core SDK

    This migrates PTransformOverrideFactory from the DirectRunner to the
    Core SDK, as part of BEAM-646.

    Add getOriginalToReplacements to provide a mapping from the original
    outputs to replaced outputs. This enables all replaced nodes to be
    rewired to output the original output.

    Migrate all DirectRunner Override Factories to the new
    PTransformOverrideFactory.

> Get runners out of the apply()
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as
> we go. This means that there is no "original" user graph. For portability and
> misc architectural benefits, we would like to build the original graph first,
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline
> object.
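The "build the original graph first, override later" idea from the BEAM-646 description can be sketched in a few lines. This is a toy model, not the Beam SDK: the `Pipeline` class, the `with_overrides` helper, and the use of plain strings as transforms are assumptions for illustration, and nothing here reflects the actual PTransformOverrideFactory interface.

```python
class Pipeline:
    """Toy pipeline that only records what the user applied."""

    def __init__(self):
        self.transforms = []  # the original user graph, in application order

    def apply(self, name, transform):
        # No runner involvement here: apply() just records the transform.
        self.transforms.append((name, transform))
        return self


def with_overrides(pipeline, overrides):
    """Return a new pipeline with runner-specific replacements applied.

    `overrides` maps an original transform to its replacement; transforms
    without an entry are kept unchanged. The input pipeline is not modified.
    """
    replaced = Pipeline()
    for name, transform in pipeline.transforms:
        replaced.apply(name, overrides.get(transform, transform))
    return replaced


user_graph = Pipeline().apply("ReadLines", "Read").apply("Count", "GroupByKey")
runner_graph = with_overrides(user_graph, {"GroupByKey": "DirectGroupByKey"})
print(user_graph.transforms)
print(runner_graph.transforms)
```

Because the replacement happens in a separate pass, the original user graph stays available for validation or for handing to a different runner.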
[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI
[ https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730313#comment-15730313 ]

Daniel Halperin commented on BEAM-1107:

Ack -- I guess I have this intuition there's opportunity for more cleanup, but
I may be wrong (or it may be a Flink-general, not Beam-on-Flink issue).
E.g., look at the attached screenshot:
* The name (grey) at the top is MapPartition -> Map -> GroupCombine -> Map
* The name of the steps (black) includes the same text as the grey name, plus
  the (step name)
* The Operation: text (small, grey) at the bottom includes the same (almost -
  logical vs physical?) information, although there appears to be some HTML
  error with inserting a break tag inside another break tag.

> Display user names for steps in the Flink Web UI
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Daniel Halperin
> Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI
> (but only for DataSource, and still too verbose:
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at
> Read(CompressedSource)
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1]
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2]
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
[jira] [Updated] (BEAM-1107) Display user names for steps in the Flink Web UI
[ https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Halperin updated BEAM-1107:

    Attachment: screenshot-1.png

> Display user names for steps in the Flink Web UI
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Daniel Halperin
> Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI
> (but only for DataSource, and still too verbose:
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at
> Read(CompressedSource)
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1]
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2]
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730289#comment-15730289 ]

ASF GitHub Bot commented on BEAM-551:

GitHub user sammcveety opened a pull request:

    https://github.com/apache/incubator-beam/pull/1545

    [BEAM-551] Fix handling of TextIO.Sink

    R: @dhalperi

    Directory needs to be parameterized.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sammcveety/incubator-beam sgmc/text_io_write

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1545

commit d017fde2d063765a73b290e1b1e1b849f147910f
Author: Sam McVeety <s...@google.com>
Date: 2016-12-07T21:27:53Z

    [BEAM-551] Fix handling of TextIO.Sink

commit 9309c9389d9e9fa2cae3f7378692d0484ddc54b2
Author: Sam McVeety <s...@google.com>
Date: 2016-12-07T22:09:41Z

    Fix test

> Support Dynamic PipelineOptions
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Sam McVeety
> Assignee: Sam McVeety
> Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program. At execution time, this graph is
> executed, either locally or by a service. Currently, Beam only supports
> parameterization at graph construction time. Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job. We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for the specific API proposal.
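The distinction between construction-time and execution-time parameterization in the BEAM-551 description can be illustrated with a minimal sketch. It does not follow the linked API proposal: `DeferredOption`, `build_write_step`, and the `outputDir` option name are hypothetical, chosen only to echo the "directory needs to be parameterized" note in the pull request.

```python
class DeferredOption:
    """Placeholder for an option whose value is only known at execution time."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def get(self, runtime_options):
        # Resolved when the job runs, not when the graph is built.
        return runtime_options.get(self.name, self.default)


def build_write_step(directory):
    """Graph construction: capture the option itself, not its value."""

    def write(line, runtime_options):
        return "{}/{}".format(directory.get(runtime_options), line)

    return write


# The step is built once, then run twice with different runtime parameters.
step = build_write_step(DeferredOption("outputDir", default="/tmp"))
print(step("part-0.txt", {"outputDir": "/data/run1"}))
print(step("part-0.txt", {}))
```

The same pre-built step yields different output paths depending on the options supplied at run time, which is the behavior a pre-compiled job would need.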
[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI
[ https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730212#comment-15730212 ]

Daniel Halperin commented on BEAM-1107:

Also copying [~aljoscha]'s response :)

{quote}
I think we can get it down to "Data Source (ReadLines/Read)" (and similarly
for other operators). The problem is that the String parameter is not the
correct way to set the name of the operator but some other (admittedly weird)
thing called "location name". To set the name we have to call .name(String) on
the created operator after creating it.
{quote}

> Display user names for steps in the Flink Web UI
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Daniel Halperin
> Assignee: Aljoscha Krettek
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI
> (but only for DataSource, and still too verbose:
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at
> Read(CompressedSource)
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1]
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2]
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
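The difference between the "location name" and the operator name discussed above can be modeled with a toy class. This is not Flink's API: `ToyDataSource` and `display_name` are hypothetical, and the display strings are copied from the examples quoted in this thread rather than from Flink's source, so the real Web UI may format them differently.

```python
class ToyDataSource:
    """Toy stand-in for an operator with a location name and an optional name."""

    def __init__(self, input_format, location_name):
        self.input_format = input_format
        self.location_name = location_name  # what the String parameter sets
        self._name = None                   # what .name(String) sets

    def name(self, new_name):
        # Mirrors the fluent style described in the comment above.
        self._name = new_name
        return self

    def display_name(self):
        if self._name is not None:
            return "DataSource ({})".format(self._name)
        return "DataSource (at {} ({}))".format(
            self.location_name, self.input_format)


fmt = "org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat"
unnamed = ToyDataSource(fmt, "ReadLines/Read")
named = ToyDataSource(fmt, "ReadLines/Read").name("ReadLines/Read")
print(unnamed.display_name())
print(named.display_name())
```

In this model, passing the user name only as the location name still yields the verbose "at ... (format class)" label, while setting the name explicitly yields the short form that the thread is aiming for.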
[jira] [Created] (BEAM-1107) Display user names for steps in the Flink Web UI
Daniel Halperin created BEAM-1107:

Summary: Display user names for steps in the Flink Web UI
Key: BEAM-1107
URL: https://issues.apache.org/jira/browse/BEAM-1107
Project: Beam
Issue Type: Improvement
Components: runner-flink
Reporter: Daniel Halperin
Assignee: Aljoscha Krettek

[copying in-person / email discussion at Strata Singapore to JIRA]

The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the
"SDK name" for the transform.

The "user name" for the transform is not available here, it is in fact on the
TransformHierarchy.Node as node.getFullName() [2].

getFullName() is used some in Flink, but not when setting step names.

I drafted a quick commit that sort of propagates the user names to the web UI
(but only for DataSource, and still too verbose:
https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)

Before this change, the "ReadLines" step showed up as: "DataSource (at
Read(CompressedSource)
(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"

With this change, it shows up as "DataSource (at ReadLines/Read
(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].

Thoughts?

[1] https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
[2] https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252
[jira] [Commented] (BEAM-905) Archetype pom needs to generalize dependencies
[ https://issues.apache.org/jira/browse/BEAM-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730204#comment-15730204 ]

ASF GitHub Bot commented on BEAM-905:

Github user asfgit closed the pull request at:
    https://github.com/apache/incubator-beam/pull/1533

> Archetype pom needs to generalize dependencies
>
> Key: BEAM-905
> URL: https://issues.apache.org/jira/browse/BEAM-905
> Project: Beam
> Issue Type: Bug
> Affects Versions: 0.4.0-incubating
> Environment: Currently the archetype pom includes the direct runner
> and the dataflow one, but not the others. It should do the same magic as the
> main examples.
> Reporter: Frances Perry
> Assignee: Daniel Halperin
> Fix For: Not applicable
[jira] [Commented] (BEAM-975) Issue with MongoDBIO
[ https://issues.apache.org/jira/browse/BEAM-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730202#comment-15730202 ]

Reza Nouri commented on BEAM-975:

Hey [~jbonofre],

Here is the mongo log from before the failure:

2016-12-08T09:42:36.346+1100 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2016-12-08T09:42:36.346+1100 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2016-12-08T09:42:37.743+1100 E STORAGE [thread1] WiredTiger (24) [1481150557:743579][1523:0x75b05000], log-server: data/db/journal: directory-list: opendir: Too many open files
2016-12-08T09:42:37.743+1100 E STORAGE [thread1] WiredTiger (24) [1481150557:743728][1523:0x75b05000], log-server: journal: directory-list, prefix "WiredTigerPreplog": Too many open files
2016-12-08T09:42:37.743+1100 E STORAGE [thread1] WiredTiger (24) [1481150557:743750][1523:0x75b05000], log-server: log pre-alloc server error: Too many open files
2016-12-08T09:42:37.743+1100 E STORAGE [thread1] WiredTiger (24) [1481150557:743768][1523:0x75b05000], log-server: log server error: Too many open files
2016-12-08T09:42:47.005+1100 W FTDC [ftdc] Uncaught exception in 'FileNotOpen: Failed to open interim file data/db/diagnostic.data/metrics.interim.temp' in full-time diagnostic data capture subsystem. Shutting down the full-time diagnostic data capture subsystem.
2016-12-08T09:43:27.758+1100 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2016-12-08T09:43:27.758+1100 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2016-12-08T09:43:28.635+1100 W NETWORK [HostnameCanonicalizationWorker] Failed to obtain address information for hostname dyn: nodename nor servname provided, or not known
2016-12-08T09:43:28.759+1100 I NETWORK [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2016-12-08T09:43:28.759+1100 E NETWORK [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2016-12-08T09:43:29.021+1100 E STORAGE [thread2] WiredTiger (24) [1481150609:20956][1523:0x75c8e000], file:WiredTiger.wt, WT_SESSION.checkpoint: data/db/WiredTiger.turtle: handle-open: open: Too many open files
2016-12-08T09:43:29.021+1100 E STORAGE [thread2] WiredTiger (24) [1481150609:21326][1523:0x75c8e000], checkpoint-server: checkpoint server error: Too many open files
2016-12-08T09:43:29.021+1100 E STORAGE [thread2] WiredTiger (-31804) [1481150609:21355][1523:0x75c8e000], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2016-12-08T09:43:29.021+1100 I -[thread2] Fatal Assertion 28558
2016-12-08T09:43:29.021+1100 I -[thread2] ***aborting after fassert() failure
2016-12-08T09:43:29.029+1100 F -[thread2] Got signal: 6 (Abort trap: 6).

And then it throws a connection timeout exception:

SEVERE: Servlet.service() for servlet [Curation] in context with path [] threw exception [org.apache.beam.sdk.Pipeline$PipelineExecutionException: com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]] with root cause
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=UNKNOWN, servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]
    at com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
    at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
    at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:75)
    at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:71)
    at com.mongodb.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:63)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:89)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:84)
    at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:55)
    at com.mongodb.Mongo.execute(Mongo.java:772)
    at com.mongodb.Mongo$2.execute(
[jira] [Commented] (BEAM-25) Add user-ready API for interacting with state
[ https://issues.apache.org/jira/browse/BEAM-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730142#comment-15730142 ]

ASF GitHub Bot commented on BEAM-25:

GitHub user kennknowles opened a pull request:

    https://github.com/apache/incubator-beam/pull/1543

    [BEAM-25] Move CopyOnAccessStateInternals to runners/direct

    Be sure to do all of the following to help us incorporate your
    contribution quickly and easily:

     - [ ] Make sure the PR title is formatted like:
           `[BEAM-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `` in the title with the actual Jira issue number, if
           there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

    ---

    R: @tgroh this is only actually used by the direct runner.

    Not necessarily the greatest JIRA for this, but I'm not sure of a
    blanket for cleaning out the SDK's excessive surface area, so I went
    with the state ticket.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kennknowles/incubator-beam CopyOnAccessStateInternals

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1543

commit 019612ba5e5a656e848d458617007d39be42b3e9
Author: Kenneth Knowles <k...@google.com>
Date: 2016-12-07T22:28:39Z

    Move CopyOnAccessStateInternals to runners/direct

> Add user-ready API for interacting with state
>
> Key: BEAM-25
> URL: https://issues.apache.org/jira/browse/BEAM-25
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Labels: State
>
> Our current state API is targeted at runner implementers, not pipeline
> authors. As such it has many capabilities that are not necessary nor
> desirable for simple use cases of stateful ParDo (such as dynamic state tag
> creation). Implement a simple state intended for user access.
> (Details of our current thoughts in forthcoming design doc)
[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn
[ https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730125#comment-15730125 ] ASF GitHub Bot commented on BEAM-498: - GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1541 [BEAM-498] Move PerKeyCombineFnRunner to runners-core --- R: @peihe seems there is no reason this needs to stay, right? `PerKeyCombineFnRunners` (the static util class) is already moved. R: @lukecwik this has some references to `OldDoFn` so at least this gets them out of the SDK. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam PerKeyCombineFnRunner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1541.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1541 commit 351f58567212e7a9d4664ca65a0bd4a10a77ed81 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-07T22:22:21Z Move PerKeyCombineFnRunner to runners-core > Make DoFnWithContext the new DoFn > - > > Key: BEAM-498 > URL: https://issues.apache.org/jira/browse/BEAM-498 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > Labels: backward-incompatible > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn
[ https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730120#comment-15730120 ] ASF GitHub Bot commented on BEAM-498: - GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1539 [BEAM-498] Remove misc occurrences of OldDoFn --- R: @lukecwik here's a few more You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam remove-OldDoFn Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1539 commit 4c178c4dbfbdbcb5e5e3100a14807be7dfda62be Author: Kenneth Knowles <k...@google.com> Date: 2016-12-07T22:17:01Z Remove misc occurrences of OldDoFn > Make DoFnWithContext the new DoFn > - > > Key: BEAM-498 > URL: https://issues.apache.org/jira/browse/BEAM-498 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > Labels: backward-incompatible > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-507) Fill in the documentation/runners/spark portion of the website
[ https://issues.apache.org/jira/browse/BEAM-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730054#comment-15730054 ] Amit Sela commented on BEAM-507: I'll have a PR by tomorrow. > Fill in the documentation/runners/spark portion of the website > -- > > Key: BEAM-507 > URL: https://issues.apache.org/jira/browse/BEAM-507 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Frances Perry >Assignee: Amit Sela > > As per > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. > Should be a landing page for Spark-specific information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (BEAM-438) Rename one of PTransform.apply and PInput.apply
[ https://issues.apache.org/jira/browse/BEAM-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-438: Assignee: Kenneth Knowles > Rename one of PTransform.apply and PInput.apply > --- > > Key: BEAM-438 > URL: https://issues.apache.org/jira/browse/BEAM-438 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Daniel Halperin >Assignee: Kenneth Knowles > Labels: backward-incompatible > > Before releasing Beam 1.0, we should do this. > Right now, it's legal to call: > {{ptransform.apply(input)}} > and > {{input.apply(ptransform)}} > when only the latter is correct. The former skips various validation steps > and loses the notion of composite transforms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-438) Rename one of PTransform.apply and PInput.apply
[ https://issues.apache.org/jira/browse/BEAM-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729977#comment-15729977 ] ASF GitHub Bot commented on BEAM-438: - GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1538 [BEAM-438] Rename PTransform.apply to PTransform.expand --- Opening this to have a code pointer in dev list discussion. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam PTransform-expand Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1538.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1538 commit ff50e4d470d954c17d73358aa3e0c4c8b4123b87 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-07T21:33:04Z Rename PTransform.apply to PTransform.expand > Rename one of PTransform.apply and PInput.apply > --- > > Key: BEAM-438 > URL: https://issues.apache.org/jira/browse/BEAM-438 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Daniel Halperin > Labels: backward-incompatible > > Before releasing Beam 1.0, we should do this. > Right now, it's legal to call: > {{ptransform.apply(input)}} > and > {{input.apply(ptransform)}} > when only the latter is correct. 
The former skips various validation steps > and loses the notion of composite transforms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
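To make the distinction in BEAM-438 concrete, here is a minimal sketch in Python. It is a toy model, not Beam's API: the class names echo the ticket, but the `applied` field, the `Upper` transform, and the validation check are assumptions made up for illustration. The point is that `input.apply(ptransform)` is the entry point that validates and records the transform, while calling the transform's own method directly (named `expand` here, following the rename proposed in the pull request) does neither.

```python
class PTransform:
    """Toy transform; `expand` holds only the transform's own logic."""

    def __init__(self, name):
        self.name = name

    def expand(self, pcoll):
        raise NotImplementedError


class PCollection:
    """Toy collection that remembers which transforms were applied to it."""

    def __init__(self, elements, applied=()):
        self.elements = list(elements)
        self.applied = tuple(applied)  # hypothetical stand-in for the pipeline graph

    def apply(self, transform):
        # The supported entry point: validate first, then expand, then record.
        if not isinstance(transform, PTransform):
            raise TypeError("apply() expects a PTransform")
        output = transform.expand(self)
        return PCollection(output.elements, self.applied + (transform.name,))


class Upper(PTransform):
    """Hypothetical example transform that upper-cases every element."""

    def expand(self, pcoll):
        return PCollection((s.upper() for s in pcoll.elements), pcoll.applied)


words = PCollection(["a", "b"])
via_apply = words.apply(Upper("Upper"))   # validated and recorded
direct = Upper("Upper").expand(words)     # same elements, but nothing recorded
```

In this model both calls produce the same elements, but only `via_apply.applied` contains the transform name. Giving the two methods different names makes it harder to call the wrong one by accident.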
[jira] [Commented] (BEAM-1090) High memory usage error
[ https://issues.apache.org/jira/browse/BEAM-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729950#comment-15729950 ] María GH commented on BEAM-1090: I had another occurrence: https://travis-ci.org/apache/incubator-beam/jobs/181976745 > High memory usage error > --- > > Key: BEAM-1090 > URL: https://issues.apache.org/jira/browse/BEAM-1090 > Project: Beam > Issue Type: Bug > Components: sdk-py >Affects Versions: 0.3.0-incubating >Reporter: María GH >Priority: Minor > > Non-reproducible high memory usage test failure. It goes away on its own. > RuntimeError: High memory usage: 201418866688 > 201008464768 [while running > 'oom:check'] > root: WARNING: A task failed with exception. > High memory usage: 201418866688 > 201008464768 [while running 'oom:check'] > --- > Complete results at https://travis-ci.org/apache/incubator-beam/jobs/181011669 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-597) Provide type information from Coders
[ https://issues.apache.org/jira/browse/BEAM-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729854#comment-15729854 ] ASF GitHub Bot commented on BEAM-597: - GitHub user jeremiele opened a pull request: https://github.com/apache/incubator-beam/pull/1537 [BEAM-597] Added a new method on Coder which returns a TypeDescriptor. --- The new method allows returning type information about the data being encoded and decoded by a Coder. Added a default implementation to StandardCoder which returns the TypeDescriptor for Object to ease the transition and avoid breaking implementations relying on StandardCoder or AtomicCoder. This will break classes implementing the Coder interface directly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jeremiele/incubator-beam add_method_to_coder_for_typedescriptor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1537.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1537 commit 4a892401689fd01a831ff8904e46f917f8f1 Author: Jeremie Lenfant-Engelmann <jeremi...@google.com> Date: 2016-12-07T19:29:29Z Added a new method on Coder which returns a TypeDescriptor. The new method allows returning type information about the data being encoded and decoded by a Coder. 
Added a default implementation to StandardCoder which returns the TypeDescriptor for Object to ease the transition and avoid breaking implementations relying on StandardCoder or AtomicCoder. This will break classes implementing the Coder interface directly. > Provide type information from Coders > > > Key: BEAM-597 > URL: https://issues.apache.org/jira/browse/BEAM-597 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Jeremie Lenfant-Engelmann >Assignee: Jeremie Lenfant-Engelmann >Priority: Minor > > The Coder interface should return a TypeDescriptor describing the type that > is currently encoded/decoded by the Coder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
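The compatibility trade-off described in BEAM-597 can be sketched in Python as follows. This is an illustration only, not the Java change in the pull request: the method name `get_encoded_type` and the three concrete coder classes are hypothetical, and a Python type object stands in for a TypeDescriptor. It shows why a default on the base class keeps existing subclasses working while direct implementers of the interface break.

```python
from abc import ABC, abstractmethod


class Coder(ABC):
    """Toy interface; get_encoded_type is a hypothetical name for the new method."""

    @abstractmethod
    def encode(self, value): ...

    @abstractmethod
    def decode(self, data): ...

    @abstractmethod
    def get_encoded_type(self): ...


class StandardCoder(Coder):
    """Base class supplying the default, so existing subclasses keep working."""

    def get_encoded_type(self):
        return object  # least specific answer, like a descriptor for Object


class LegacyIntCoder(StandardCoder):
    """Written before the new method existed; inherits the default."""

    def encode(self, value):
        return str(value).encode("ascii")

    def decode(self, data):
        return int(data.decode("ascii"))


class StringUtf8Coder(StandardCoder):
    """Updated coder that reports a precise type."""

    def encode(self, value):
        return value.encode("utf-8")

    def decode(self, data):
        return data.decode("utf-8")

    def get_encoded_type(self):
        return str


class DirectCoder(Coder):
    """Implements the interface directly and lacks the new method."""

    def encode(self, value):
        return bytes(value)

    def decode(self, data):
        return data
```

Here `LegacyIntCoder()` still instantiates and reports `object`, whereas `DirectCoder()` fails at instantiation because the abstract method is missing, which is the Python analogue of the compile break the pull request warns about.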
[jira] [Commented] (BEAM-884) Add Display Data to the Python SDK's PipelineOptions, Avro io and other transforms
[ https://issues.apache.org/jira/browse/BEAM-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729725#comment-15729725 ] Pablo Estrada commented on BEAM-884: This is done. > Add Display Data to the Python SDK's PipelineOptions, Avro io and other > transforms > -- > > Key: BEAM-884 > URL: https://issues.apache.org/jira/browse/BEAM-884 > Project: Beam > Issue Type: Improvement > Components: sdk-py >Reporter: Pablo Estrada > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1106) Remove no_pipeline_type_check flag from Python SDK
Pablo Estrada created BEAM-1106: --- Summary: Remove no_pipeline_type_check flag from Python SDK Key: BEAM-1106 URL: https://issues.apache.org/jira/browse/BEAM-1106 Project: Beam Issue Type: Bug Components: sdk-py Reporter: Pablo Estrada Assignee: Frances Perry It's already the default behavior. It should be possible to remove it without trouble. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-824) Misleading error message when sdk_location is missing in python
[ https://issues.apache.org/jira/browse/BEAM-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729724#comment-15729724 ] Pablo Estrada commented on BEAM-824: This was fixed. > Misleading error message when sdk_location is missing in python > --- > > Key: BEAM-824 > URL: https://issues.apache.org/jira/browse/BEAM-824 > Project: Beam > Issue Type: Improvement > Components: sdk-py >Reporter: Pablo Estrada > > When trying to submit jobs to the Cloud Dataflow service using the Python > SDK, the sdk_location should be provided or the service errors out saying that > package google-cloud-dataflow is missing. > We might want to prompt users to add the sdk_location parameter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1055) Display Data keys on Python are inconsistent
[ https://issues.apache.org/jira/browse/BEAM-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729720#comment-15729720 ] ASF GitHub Bot commented on BEAM-1055: -- Github user pabloem closed the pull request at: https://github.com/apache/incubator-beam/pull/1443 > Display Data keys on Python are inconsistent > > > Key: BEAM-1055 > URL: https://issues.apache.org/jira/browse/BEAM-1055 > Project: Beam > Issue Type: Bug > Components: sdk-py >Reporter: Pablo Estrada > > Some are in camelCase, some are in snake_case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-722) Add Display Data to the Python SDK
[ https://issues.apache.org/jira/browse/BEAM-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729723#comment-15729723 ] Pablo Estrada commented on BEAM-722: This is done. > Add Display Data to the Python SDK > -- > > Key: BEAM-722 > URL: https://issues.apache.org/jira/browse/BEAM-722 > Project: Beam > Issue Type: New Feature > Components: sdk-py >Reporter: Pablo Estrada >Assignee: Frances Perry > > The DisplayData feature has been added to the Java SDK (see blog post > announcing it: > https://cloud.google.com/blog/big-data/2016/06/dataflow-updates-see-more-details-about-your-pipelines). > We need now to add it to the Python SDK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1055) Display Data keys on Python are inconsistent
[ https://issues.apache.org/jira/browse/BEAM-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729721#comment-15729721 ] Pablo Estrada commented on BEAM-1055: - This is done. > Display Data keys on Python are inconsistent > > > Key: BEAM-1055 > URL: https://issues.apache.org/jira/browse/BEAM-1055 > Project: Beam > Issue Type: Bug > Components: sdk-py >Reporter: Pablo Estrada > > Some are in camelCase, some are in snake_case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.
[ https://issues.apache.org/jira/browse/BEAM-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729610#comment-15729610 ] Jesse Anderson commented on BEAM-1105: -- Sounds good to me. > Adding Beam's pico Wordcount to the existing examples. > --- > > Key: BEAM-1105 > URL: https://issues.apache.org/jira/browse/BEAM-1105 > Project: Beam > Issue Type: Wish >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Labels: examples > > http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/ > Is a good explanation for the WordCount that would encourage users. > Adding this to the examples and subsequently the docs is a good step to help > new users start from a good foundation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1065) FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)
[ https://issues.apache.org/jira/browse/BEAM-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729597#comment-15729597 ] ASF GitHub Bot commented on BEAM-1065: -- Github user peihe closed the pull request at: https://github.com/apache/incubator-beam/pull/1426 > FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition) > -- > > Key: BEAM-1065 > URL: https://issues.apache.org/jira/browse/BEAM-1065 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Pei He >Assignee: Pei He > > FileBasedReader should be able to open the file at > Source.getStartOffset(), and then read forward to find the first input > element. > The benefits are: > 1. It is easier to implement a ReadableByteChannel. > 2. Dynamic splitting won't require file systems to support seeking. > 3. It avoids seeking to the position twice, which is what the current API does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
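As a rough illustration of the "open at the start offset, then read forward" idea in BEAM-1065, here is a Python sketch for newline-delimited records. It is an assumption-laden toy, not the FileBasedSource code: `open_at` is a hypothetical stand-in for `open(spec, startingPosition)`, and the rule that a record belongs to the range in which it begins is one common convention, not something the ticket specifies.

```python
import io


def open_at(data, position):
    """Hypothetical stand-in for open(spec, startingPosition).

    Returns a stream that begins at `position`; the code below only
    reads forward from it and never seeks.
    """
    return io.BytesIO(data[position:])


def read_records(data, start, end):
    """Return the newline-terminated records that begin in [start, end)."""
    if start == 0:
        stream = open_at(data, 0)
        pos = 0
    else:
        # Open one byte early and discard through the next newline, so a
        # record that begins exactly at `start` is kept rather than skipped.
        stream = open_at(data, start - 1)
        pos = start - 1 + len(stream.readline())
    records = []
    while pos < end:
        line = stream.readline()
        if not line:
            break
        records.append(line.rstrip(b"\n"))
        pos += len(line)
    return records


data = b"alpha\nbeta\ngamma\n"
first_half = read_records(data, 0, 8)
second_half = read_records(data, 8, len(data))
```

With this convention, adjacent ranges such as [0, 8) and [8, 17) together yield every record exactly once, and the reader only ever needs a forward-only channel.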
[jira] [Commented] (BEAM-664) Port Dataflow SDK WordCount walkthrough to Beam site
[ https://issues.apache.org/jira/browse/BEAM-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729567#comment-15729567 ] ASF GitHub Bot commented on BEAM-664: - GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1536 [BEAM-664] Revise WindowedWordCount example to be more independent of runner and execution style Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- This removes the use of BigQuery from the WindowedWordCount example, replacing it with a somewhat hacky file-based write of the output, using the window as the idempotency key. In order to port the test and to benefit from recent improvements in `FileBasedCheckSumMatcher`, I've factored out the resiliency code from that into an internal-only minimal `ShardedFile` class with just enough API surface to write these tests. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam WindowedWordCount Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1536.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1536 commit bac7b192c9fefdf536e324f1fde73bac2cd903fa Author: Kenneth Knowles <k...@google.com> Date: 2016-11-04T03:44:45Z Add IntervalWindow coder to the standard registry commit aa09449379dfaedbdfb0b73bae85ffd334b838b1 Author: Kenneth Knowles <k...@google.com> Date: 2016-11-03T21:37:26Z Revise WindowedWordCount for runner and execution mode portability commit 357732efb866e4a24d76a9aaafa10e6bc964fe7e Author: Kenneth Knowles <k...@google.com> Date: 2016-11-08T06:06:00Z Check the onSuccessMatcher in DirectRunner if isBlockOnRun is set commit 76d19716829cb98c4f0d4b34244fde8866b6d6a9 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-05T22:32:12Z Factor out ShardedFile from FileChecksumMatcher > Port Dataflow SDK WordCount walkthrough to Beam site > > > Key: BEAM-664 > URL: https://issues.apache.org/jira/browse/BEAM-664 > Project: Beam > Issue Type: Task > Components: website >Reporter: Hadar Hod >Assignee: Hadar Hod > > Port the WordCount walkthrough from Dataflow docs to Beam website. > * Copy prose (translate from html to md, remove Dataflow references, etc) > * Add accurate "How to Run" instructions for each of the WC examples > * Include code snippets from real examples -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.
[ https://issues.apache.org/jira/browse/BEAM-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729486#comment-15729486 ] Neelesh Srinivas Salian commented on BEAM-1105: --- [~jbonofre] I agree. Makes sense to bundle them up together. I'll start a thread and we can think it over. I'm thinking a user-specific examples bundle / module would be best to gather all of them. Will see if others have ideas. > Adding Beam's pico Wordcount to the existing examples. > --- > > Key: BEAM-1105 > URL: https://issues.apache.org/jira/browse/BEAM-1105 > Project: Beam > Issue Type: Wish >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Labels: examples > > http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/ > Is a good explanation for the WordCount that would encourage users. > Adding this to the examples and subsequently the docs is a good step to help > new users start from a good foundation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.
[ https://issues.apache.org/jira/browse/BEAM-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729476#comment-15729476 ] Jean-Baptiste Onofré commented on BEAM-1105: FYI, I also have concrete samples here: https://github.com/jbonofre/beam-samples I think we should have a discussion about what we provide. Currently, the examples are also used to test the runners. IMHO, Jesse's example is a sample, end-user oriented. So, it's a good candidate to be gathered together with beam-samples. > Adding Beam's pico Wordcount to the existing examples. > --- > > Key: BEAM-1105 > URL: https://issues.apache.org/jira/browse/BEAM-1105 > Project: Beam > Issue Type: Wish >Reporter: Neelesh Srinivas Salian >Assignee: Neelesh Srinivas Salian >Priority: Minor > Labels: examples > > http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/ > Is a good explanation for the WordCount that would encourage users. > Adding this to the examples and subsequently the docs is a good step to help > new users start from a good foundation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.
Neelesh Srinivas Salian created BEAM-1105: - Summary: Adding Beam's pico Wordcount to the existing examples. Key: BEAM-1105 URL: https://issues.apache.org/jira/browse/BEAM-1105 Project: Beam Issue Type: Wish Reporter: Neelesh Srinivas Salian Assignee: Neelesh Srinivas Salian Priority: Minor http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/ Is a good explanation for the WordCount that would encourage users. Adding this to the examples and subsequently the docs is a good step to help new users start from a good foundation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-646) Get runners out of the apply()
[ https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729312#comment-15729312 ] ASF GitHub Bot commented on BEAM-646: - Github user tgroh closed the pull request at: https://github.com/apache/incubator-beam/pull/1442 > Get runners out of the apply() > -- > > Key: BEAM-646 > URL: https://issues.apache.org/jira/browse/BEAM-646 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Thomas Groh > > Right now, the runner intercepts calls to apply() and replaces transforms as > we go. This means that there is no "original" user graph. For portability and > misc architectural benefits, we would like to build the original graph first, > and have the runner override later. > Some runners already work in this manner, but we could integrate it more > smoothly, with more validation, via some handy APIs on e.g. the Pipeline > object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
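The ordering that BEAM-646 asks for (build the original user graph first, let the runner override afterwards) can be sketched in a few lines of Python. This is a deliberately tiny model under stated assumptions: the graph is a flat list of transform names, and `build_graph`, `apply_overrides`, and `RunnerGroupByKey` are hypothetical names, not Beam APIs.

```python
def build_graph(transform_names):
    """Record the user's transforms exactly as written, with no runner involved."""
    return list(transform_names)


def apply_overrides(graph, overrides):
    """Return a new graph with runner-specific replacements; the input is untouched."""
    return [overrides.get(name, name) for name in graph]


user_graph = build_graph(["Read", "GroupByKey", "Write"])
runner_graph = apply_overrides(user_graph, {"GroupByKey": "RunnerGroupByKey"})
```

Because the override step produces a separate graph, the original `user_graph` remains available for display, validation, or handing to a different runner, which is the portability benefit the ticket describes.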
[jira] [Commented] (BEAM-115) Beam Runner API
[ https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729306#comment-15729306 ] ASF GitHub Bot commented on BEAM-115: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1511 > Beam Runner API > --- > > Key: BEAM-115 > URL: https://issues.apache.org/jira/browse/BEAM-115 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > > The PipelineRunner API from the SDK is not ideal for the Beam technical > vision. > It has technical limitations: > - The user's DAG (even including library expansions) is never explicitly > represented, so it cannot be analyzed except incrementally, and cannot > necessarily be reconstructed (for example, to display it!). > - The flattened DAG of just primitive transforms isn't well-suited for > display or transform override. > - The TransformHierarchy isn't well-suited for optimizations. > - The user must realistically pre-commit to a runner, and its configuration > (batch vs streaming) prior to graph construction, since the runner will be > modifying the graph as it is built. > - It is fairly language- and SDK-specific. > It has usability issues (these are not from intuition, but derived from > actual cases of failure to use according to the design) > - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner > is confusing. > - The TransformHierarchy, accessible only via visitor traversals, is > cumbersome. > - The staging of construction-time vs run-time is not always obvious. > These are just examples. This ticket tracks designing, coming to consensus, > and building an API that more simply and directly supports the technical > vision. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn
[ https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729262#comment-15729262 ] ASF GitHub Bot commented on BEAM-498: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1529 > Make DoFnWithContext the new DoFn > - > > Key: BEAM-498 > URL: https://issues.apache.org/jira/browse/BEAM-498 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > Labels: backward-incompatible > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn
[ https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729237#comment-15729237 ] ASF GitHub Bot commented on BEAM-498: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1527 > Make DoFnWithContext the new DoFn > - > > Key: BEAM-498 > URL: https://issues.apache.org/jira/browse/BEAM-498 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > Labels: backward-incompatible > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts
[ https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729017#comment-15729017 ] Maximilian Michels commented on BEAM-1092: -- 2) prevents user classes from clashing with any SDK-related parts. 1) prevents user classes from clashing with other libraries added to the classpath which do not properly shade common libraries (e.g. Hadoop). The archetype does not become more complicated because of 1). It should be more or less transparent to the user who uses the archetype. > Shade commonly used libraries (e.g. Guava) to avoid class conflicts > --- > > Key: BEAM-1092 > URL: https://issues.apache.org/jira/browse/BEAM-1092 > Project: Beam > Issue Type: Bug > Components: examples-java, sdk-java-extensions >Affects Versions: 0.3.0-incubating >Reporter: Maximilian Michels >Assignee: Frances Perry > Fix For: 0.4.0-incubating > > > Beam shades away some of its dependencies like Guava to prevent user classes > from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, > do not shade any classes and directly depend on potentially conflicting > libraries (e.g. Guava). Also, users might manually add such libraries as > dependencies. > Runners that add classes to the classpath (e.g. Hadoop) can run into conflicts > with multiple versions of the same class. To prevent that, we should adjust > the Maven archetypes pom files used for the Quickstart to perform shading of > commonly used libraries (again, Guava is often the culprit). > To prevent the problem in the first place, we should expand the shading of > Guava and other libraries to all modules which make use of these. > To solve both dimensions of the issue, we need to address: > 1. Adding shading of commonly used libraries to the archetypes poms > 2. Properly shade all commonly used libraries in the SDK modules > 2) seems to be of highest priority since it affects users who simply use the > provided IO modules. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-1104) WordCount: Metrics error in the DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Halperin updated BEAM-1104: -- Assignee: Thomas Groh (was: Davor Bonaci) > WordCount: Metrics error in the DirectRunner > > > Key: BEAM-1104 > URL: https://issues.apache.org/jira/browse/BEAM-1104 > Project: Beam > Issue Type: Bug > Components: runner-direct >Reporter: Daniel Halperin >Assignee: Thomas Groh > > I'm following the Beam quickstart to analyze the pom.xml for the examples > archetype in the DirectRunner: > Generate the project: > {code} > mvn archetype:generate \ > > -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots > \ > -DarchetypeGroupId=org.apache.beam \ > -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ > -DarchetypeVersion=LATEST \ > -DgroupId=org.example \ > -DartifactId=word-count-beam \ > -Dversion="0.1" \ > -Dpackage=org.apache.beam.examples \ > -DinteractiveMode=false > {code} > Count words in the pom.xml: > {code} > mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \ > -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner > {code} > The logs: > {code} > INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam --- > Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource > expandFilePattern > INFO: Matched 1 files for pattern pom.xml > Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment > getCurrentContainer > SEVERE: Unable to update metrics on the current thread. Most likely caused by > using metrics outside the managed work-execution thread. 
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement > INFO: Initializing write operation > org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf > Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles > processElement > INFO: Opening writer for write operation > org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061 > Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles > processElement > INFO: Opening writer for write operation > org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061 > Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles > processElement > INFO: Opening writer for write operation > org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061 > Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles > processElement > INFO: Opening writer for write operation > org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061 > Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement > INFO: Finalizing write operation > org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a. > {code} > Presumably, this {{SEVERE}} warning is indicative of a bug (or should be > masked). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1104) WordCount: Metrics error in the DirectRunner
Daniel Halperin created BEAM-1104:
-------------------------------------

Summary: WordCount: Metrics error in the DirectRunner
Key: BEAM-1104
URL: https://issues.apache.org/jira/browse/BEAM-1104
Project: Beam
Issue Type: Bug
Components: runner-direct
Reporter: Daniel Halperin
Assignee: Davor Bonaci

I'm following the Beam quickstart to analyze the pom.xml for the examples archetype in the DirectRunner:

Generate the project:
{code}
mvn archetype:generate \
  -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \
  -DarchetypeGroupId=org.apache.beam \
  -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
  -DarchetypeVersion=LATEST \
  -DgroupId=org.example \
  -DartifactId=word-count-beam \
  -Dversion="0.1" \
  -Dpackage=org.apache.beam.examples \
  -DinteractiveMode=false
{code}

Count words in the pom.xml:
{code}
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
  -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
{code}

The logs:
{code}
INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
INFO: Matched 1 files for pattern pom.xml
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment getCurrentContainer
SEVERE: Unable to update metrics on the current thread. Most likely caused by using metrics outside the managed work-execution thread.
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
INFO: Initializing write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement
INFO: Finalizing write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a.
{code}

Presumably, this {{SEVERE}} warning is indicative of a bug (or should be masked).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
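The SEVERE message in the log above refers to updating metrics "on the current thread" and to a "managed work-execution thread", which suggests that the metrics container is tracked per thread. The following is a minimal sketch of that general pattern, assuming a thread-local slot. The class and method names (ThreadLocalMetricsSketch, runManaged, increment) are hypothetical and are not Beam's API; the sketch only illustrates how an update from an unmanaged thread can find no container to write to.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; none of these names come from Beam.
final class ThreadLocalMetricsSketch {
    // Per-thread slot for the active container; null on threads that are not "managed".
    private static final ThreadLocal<AtomicLong> CURRENT = new ThreadLocal<>();

    /** Installs a container on the calling thread, runs the work, and returns the count. */
    static long runManaged(Runnable work) {
        AtomicLong container = new AtomicLong();
        CURRENT.set(container);
        try {
            work.run();
        } finally {
            CURRENT.remove(); // the thread is no longer managed after this point
        }
        return container.get();
    }

    /** Returns true if the update was recorded, false if this thread has no container. */
    static boolean increment() {
        AtomicLong container = CURRENT.get();
        if (container == null) {
            // Assumption: a real SDK would report an error here instead of counting.
            return false;
        }
        container.incrementAndGet();
        return true;
    }

    /** Attempts an update from a freshly started thread that never had a container set. */
    static boolean incrementOnOtherThread() {
        final boolean[] recorded = new boolean[1];
        Thread t = new Thread(() -> recorded[0] = increment());
        t.start();
        try {
            t.join(); // join makes the write to recorded[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return recorded[0];
    }

    public static void main(String[] args) {
        System.out.println("managed count: " + runManaged(() -> increment()));
        System.out.println("recorded on other thread: " + incrementOnOtherThread());
    }
}
```

In this model an update is only counted when it happens on the thread that installed the container, so work handed to another thread would need the container passed along explicitly.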
[jira] [Resolved] (BEAM-329) Update Spark runner README.
[ https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Baptiste Onofré resolved BEAM-329. --- Resolution: Fixed Fix Version/s: 0.4.0-incubating > Update Spark runner README. > --- > > Key: BEAM-329 > URL: https://issues.apache.org/jira/browse/BEAM-329 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 0.2.0-incubating >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.4.0-incubating > > > The Spark runner should have a proper WordCount (or something more fancy) > example in the README. This is a bit problematic as Beam is lacking HDFS > support in TextIO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-329) Update Spark runner README.
[ https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728575#comment-15728575 ] ASF GitHub Bot commented on BEAM-329: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1532 > Update Spark runner README. > --- > > Key: BEAM-329 > URL: https://issues.apache.org/jira/browse/BEAM-329 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 0.2.0-incubating >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.4.0-incubating > > > The Spark runner should have a proper WordCount (or something more fancy) > example in the README. This is a bit problematic as Beam is lacking HDFS > support in TextIO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-329) Update Spark runner README.
[ https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Sela updated BEAM-329: --- Summary: Update Spark runner README. (was: Spark runner README should have a proper batch example.) > Update Spark runner README. > --- > > Key: BEAM-329 > URL: https://issues.apache.org/jira/browse/BEAM-329 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 0.2.0-incubating >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > > The Spark runner should have a proper WordCount (or something more fancy) > example in the README. This is a bit problematic as Beam is lacking HDFS > support in TextIO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-329) Spark runner README should have a proper batch example.
[ https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728336#comment-15728336 ] ASF GitHub Bot commented on BEAM-329: - GitHub user amitsela opened a pull request: https://github.com/apache/incubator-beam/pull/1532 [BEAM-329] Spark runner README should have a proper batch example. Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/amitsela/incubator-beam BEAM-329 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1532.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1532 commit c50fe89a071f59740b6f5bd90e1984ca3159162f Author: Sela <ans...@paypal.com> Date: 2016-12-07T09:20:07Z [BEAM-329] Spark runner README should have a proper batch example. > Spark runner README should have a proper batch example. > --- > > Key: BEAM-329 > URL: https://issues.apache.org/jira/browse/BEAM-329 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 0.2.0-incubating >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > > The Spark runner should have a proper WordCount (or something more fancy) > example in the README. This is a bit problematic as Beam is lacking HDFS > support in TextIO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-1094) Spark runner should define Kafka IO dependency with test scope
[ https://issues.apache.org/jira/browse/BEAM-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Sela resolved BEAM-1094. - Resolution: Fixed Fix Version/s: 0.4.0-incubating > Spark runner should define Kafka IO dependency with test scope > -- > > Key: BEAM-1094 > URL: https://issues.apache.org/jira/browse/BEAM-1094 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > Fix For: 0.4.0-incubating > > > Spark runner uses Kafka IO for testing purpose. However, the Kafka IO > dependency is with compile scope whereas it should be test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1094) Spark runner should define Kafka IO dependency with test scope
[ https://issues.apache.org/jira/browse/BEAM-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728169#comment-15728169 ] ASF GitHub Bot commented on BEAM-1094: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1531 > Spark runner should define Kafka IO dependency with test scope > -- > > Key: BEAM-1094 > URL: https://issues.apache.org/jira/browse/BEAM-1094 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Spark runner uses Kafka IO for testing purpose. However, the Kafka IO > dependency is with compile scope whereas it should be test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1094) Spark runner should define Kafka IO dependency with test scope
[ https://issues.apache.org/jira/browse/BEAM-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728014#comment-15728014 ] ASF GitHub Bot commented on BEAM-1094: -- GitHub user jbonofre opened a pull request: https://github.com/apache/incubator-beam/pull/1531 [BEAM-1094] Set test scope for Kafka IO and junit Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [X] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [X] Replace `` in the title with the actual Jira issue number, if there is one. - [X] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/jbonofre/incubator-beam BEAM-1094 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1531.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1531 commit 346c0b528297ab39bfa021ee052dcee48f56953d Author: Jean-Baptiste Onofré <jbono...@apache.org> Date: 2016-12-07T07:37:33Z [BEAM-1094] Set test scope for Kafka IO and junit > Spark runner should define Kafka IO dependency with test scope > -- > > Key: BEAM-1094 > URL: https://issues.apache.org/jira/browse/BEAM-1094 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Spark runner uses Kafka IO for testing purpose. However, the Kafka IO > dependency is with compile scope whereas it should be test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-1102) Flink Batch Runner does not populate aggregator values
[ https://issues.apache.org/jira/browse/BEAM-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed BEAM-1102. -- Resolution: Fixed Fix Version/s: 0.4.0-incubating > Flink Batch Runner does not populate aggregator values > -- > > Key: BEAM-1102 > URL: https://issues.apache.org/jira/browse/BEAM-1102 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 0.3.0-incubating >Reporter: Daniel Halperin >Assignee: Aljoscha Krettek >Priority: Minor > Fix For: 0.4.0-incubating > > > Running the quickstart gives 0 for emptyLines. > Running with {{--streaming=true}} gives the correct value (for my input file, > the default examples archetype {{pom.xml}}, the true value is 27 at the time > of writing). > Streaming output: > {code} > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: Final aggregator values: > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: DroppedDueToLateness : 0 > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: emptyLines : 27 > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: DroppedDueToClosedWindow : 0 > {code} > Non-streaming output: > {code} > Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: Final aggregator values: > Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: emptyLines : 0 > {code} > (Note also that the lateness etc. aggregators are missing entirely, may be > expected). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1103) Add Tests For Aggregators in Flink Runner
Aljoscha Krettek created BEAM-1103: -- Summary: Add Tests For Aggregators in Flink Runner Key: BEAM-1103 URL: https://issues.apache.org/jira/browse/BEAM-1103 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Aljoscha Krettek We currently don't have tests that verify that aggregator values are correctly forwarded to Flink. They didn't work correctly in the Batch Flink runner, as seen in BEAM-1102. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1102) Flink Batch Runner does not populate aggregator values
[ https://issues.apache.org/jira/browse/BEAM-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727742#comment-15727742 ] Aljoscha Krettek commented on BEAM-1102: The problem is this part in {{FlinkProcessContextBase}}: {code} @Override protected <AggInputT, AggOutputT> Aggregator<AggInputT, AggOutputT> createAggregatorInternal(String name, Combine.CombineFn<AggInputT, ?, AggOutputT> combiner) { SerializableFnAggregatorWrapper<AggInputT, AggOutputT> wrapper = new SerializableFnAggregatorWrapper<>(combiner); Accumulator existingAccum = (Accumulator<AggInputT, Serializable>) runtimeContext.getAccumulator(name); if (existingAccum != null) { return wrapper; } else { runtimeContext.addAccumulator(name, wrapper); } return wrapper; } {code} Notice how the newly created wrapper is returned if the accumulator already exists. > Flink Batch Runner does not populate aggregator values > -- > > Key: BEAM-1102 > URL: https://issues.apache.org/jira/browse/BEAM-1102 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 0.3.0-incubating >Reporter: Daniel Halperin >Assignee: Aljoscha Krettek >Priority: Minor > > Running the quickstart gives 0 for emptyLines. > Running with {{--streaming=true}} gives the correct value (for my input file, > the default examples archetype {{pom.xml}}, the true value is 27 at the time > of writing). 
> Streaming output: > {code} > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: Final aggregator values: > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: DroppedDueToLateness : 0 > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: emptyLines : 27 > Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: DroppedDueToClosedWindow : 0 > {code} > Non-streaming output: > {code} > Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: Final aggregator values: > Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run > INFO: emptyLines : 0 > {code} > (Note also that the lateness etc. aggregators are missing entirely, may be > expected). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
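The comment above points out that createAggregatorInternal returns the newly created wrapper even when an accumulator with the same name is already registered, so updates made through that wrapper would not reach the registered one. The following is a minimal sketch of that pattern and of one possible alternative, assuming a plain map as a hypothetical stand-in for Flink's runtime context. CountingAccumulator, createBuggy, and createReusing are illustrative names, not Beam or Flink APIs, and the fix actually merged for BEAM-1102 may differ. The simplified model also does not try to reproduce the exact value of 0 reported in the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Beam or Flink implementation.
final class AccumulatorRegistrySketch {
    /** Stand-in for an accumulator wrapper that sums long values. */
    static final class CountingAccumulator {
        private long value;
        void add(long delta) { value += delta; }
        long get() { return value; }
    }

    // Stand-in for the accumulators registered with the runtime context.
    private final Map<String, CountingAccumulator> registered = new HashMap<>();

    /** Mirrors the quoted logic: a fresh wrapper is returned even if one is registered. */
    CountingAccumulator createBuggy(String name) {
        CountingAccumulator wrapper = new CountingAccumulator();
        CountingAccumulator existing = registered.get(name);
        if (existing != null) {
            return wrapper; // updates to this object never reach the registry
        }
        registered.put(name, wrapper);
        return wrapper;
    }

    /** One possible alternative: hand back the registered accumulator when it exists. */
    CountingAccumulator createReusing(String name) {
        return registered.computeIfAbsent(name, n -> new CountingAccumulator());
    }

    long registeredValue(String name) {
        CountingAccumulator acc = registered.get(name);
        return acc == null ? 0L : acc.get();
    }

    /** Two creations of the same aggregator; the second one's updates are lost. */
    static long runBuggy() {
        AccumulatorRegistrySketch ctx = new AccumulatorRegistrySketch();
        ctx.createBuggy("emptyLines").add(20);
        ctx.createBuggy("emptyLines").add(7);
        return ctx.registeredValue("emptyLines");
    }

    /** Same sequence, but both creations share the registered accumulator. */
    static long runReusing() {
        AccumulatorRegistrySketch ctx = new AccumulatorRegistrySketch();
        ctx.createReusing("emptyLines").add(20);
        ctx.createReusing("emptyLines").add(7);
        return ctx.registeredValue("emptyLines");
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + runBuggy());
        System.out.println("reusing: " + runReusing());
    }
}
```

The design point is that whatever object the caller updates must be the same object the runtime later reads when it reports final aggregator values.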
[jira] [Commented] (BEAM-812) Shade guava in beam-sdks-java-io-google-cloud-platform
[ https://issues.apache.org/jira/browse/BEAM-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727720#comment-15727720 ] Daniel Halperin commented on BEAM-812: -- This is not likely to be fixed in the 0.4.0 release because Bigtable has not yet pushed a release of their jar with the API surface cleanup integrated. > Shade guava in beam-sdks-java-io-google-cloud-platform > -- > > Key: BEAM-812 > URL: https://issues.apache.org/jira/browse/BEAM-812 > Project: Beam > Issue Type: Improvement > Components: sdk-java-gcp >Reporter: Daniel Halperin >Assignee: Daniel Halperin > > Looking at 0.3.0-incubating RC1, we are not properly shading Guava. > https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-google-cloud-platform/0.3.0-incubating/beam-sdks-java-io-google-cloud-platform-0.3.0-incubating.pom > has > {code} > <dependency> > <groupId>com.google.guava</groupId> > <artifactId>guava</artifactId> > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-812) Shade guava in beam-sdks-java-io-google-cloud-platform
[ https://issues.apache.org/jira/browse/BEAM-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Halperin updated BEAM-812: - Fix Version/s: (was: 0.4.0-incubating) > Shade guava in beam-sdks-java-io-google-cloud-platform > -- > > Key: BEAM-812 > URL: https://issues.apache.org/jira/browse/BEAM-812 > Project: Beam > Issue Type: Improvement > Components: sdk-java-gcp >Reporter: Daniel Halperin >Assignee: Daniel Halperin > > Looking at 0.3.0-incubating RC1, we are not properly shading Guava. > https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-google-cloud-platform/0.3.0-incubating/beam-sdks-java-io-google-cloud-platform-0.3.0-incubating.pom > has > {code} > <dependency> > <groupId>com.google.guava</groupId> > <artifactId>guava</artifactId> > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers
[ https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727629#comment-15727629 ] ASF GitHub Bot commented on BEAM-27: GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1528 [BEAM-27] Add DoFn.OnTimerContext and support as a DoFn parameter Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam OnTimerContext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1528.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1528 commit 44a7b915d502c318ffabaa6fb808207bb3ea15e8 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-07T04:10:06Z Add DoFn.OnTimerContext commit 1934b704411fed76a58cbc657c7dc8be666c3885 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-07T04:01:18Z Add support for OnTimerContext parameter in DoFnSignature commit 14d9a3794c4f8c68626075343a1dec1d4c017686 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-07T04:10:21Z Access to OnTimerContext via DoFnInvokers.ArgumentProvider > Add user-ready API for interacting with timers > -- > > Key: BEAM-27 > URL: https://issues.apache.org/jira/browse/BEAM-27 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: 
Kenneth Knowles >Assignee: Kenneth Knowles > > Pipeline authors will benefit from a different factorization of interaction > with underlying timers. The current APIs are targeted at runner implementers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn
[ https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727614#comment-15727614 ] ASF GitHub Bot commented on BEAM-498: - GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1527 [BEAM-498] Port most of DoFnRunner Javadoc to new DoFn Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- R: @tgroh You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam DoFnRunner-javadoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1527.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1527 commit d26d9bbce073715d3c6b644b286574ce8c782597 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-06T23:20:28Z Port most of DoFnRunner Javadoc to new DoFn > Make DoFnWithContext the new DoFn > - > > Key: BEAM-498 > URL: https://issues.apache.org/jira/browse/BEAM-498 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > Labels: backward-incompatible > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts
[ https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated BEAM-1092: --- Comment: was deleted (was: Is 1) necessary if we do 2). I think shading is very necessary (unfortunately) so I like this issue but I would also like to keep the example/archetype POMs as simple as possible.) > Shade commonly used libraries (e.g. Guava) to avoid class conflicts > --- > > Key: BEAM-1092 > URL: https://issues.apache.org/jira/browse/BEAM-1092 > Project: Beam > Issue Type: Bug > Components: examples-java, sdk-java-extensions >Affects Versions: 0.3.0-incubating >Reporter: Maximilian Michels >Assignee: Frances Perry > Fix For: 0.4.0-incubating > > > Beam shades away some of its dependencies like Guava to avoid user classes > from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, > do not shade any classes and directly depend on potentially conflicting > libraries (e.g. Guava). Also, users might manually add such libraries as > dependencies. > Runners who add classes to the classpath (e.g. Hadoop) can run into conflict > with multiple versions of the same class. To prevent that, we should adjust > the Maven archetypes pom files used for the Quickstart to perform shading of > commonly used libraries (again, Guava is often the culprit). > To prevent the problem in the first place, we should expand the shading of > Guava and other libraries to all modules which make use of these. > To solve both dimensions of the issue, we need to address: > 1. Adding shading of commonly used libraries to the archetypes poms > 2. Properly shade all commonly used libraries in the SDK modules > 2) seems to be of highest priority since it affects users who simply use the > provided IO modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts
[ https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727530#comment-15727530 ] Aljoscha Krettek commented on BEAM-1092: Is 1) necessary if we do 2)? I think shading is very necessary (unfortunately) so I like this issue but I would also like to keep the example/archetype POMs as simple as possible. > Shade commonly used libraries (e.g. Guava) to avoid class conflicts > --- > > Key: BEAM-1092 > URL: https://issues.apache.org/jira/browse/BEAM-1092 > Project: Beam > Issue Type: Bug > Components: examples-java, sdk-java-extensions >Affects Versions: 0.3.0-incubating >Reporter: Maximilian Michels >Assignee: Frances Perry > Fix For: 0.4.0-incubating > > > Beam shades away some of its dependencies like Guava to avoid user classes > from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, > do not shade any classes and directly depend on potentially conflicting > libraries (e.g. Guava). Also, users might manually add such libraries as > dependencies. > Runners who add classes to the classpath (e.g. Hadoop) can run into conflict > with multiple versions of the same class. To prevent that, we should adjust > the Maven archetypes pom files used for the Quickstart to perform shading of > commonly used libraries (again, Guava is often the culprit). > To prevent the problem in the first place, we should expand the shading of > Guava and other libraries to all modules which make use of these. > To solve both dimensions of the issue, we need to address: > 1. Adding shading of commonly used libraries to the archetypes poms > 2. Properly shade all commonly used libraries in the SDK modules > 2) seems to be of highest priority since it affects users who simply use the > provided IO modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts
[ https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727529#comment-15727529 ] Aljoscha Krettek commented on BEAM-1092: Is 1) necessary if we do 2). I think shading is very necessary (unfortunately) so I like this issue but I would also like to keep the example/archetype POMs as simple as possible. > Shade commonly used libraries (e.g. Guava) to avoid class conflicts > --- > > Key: BEAM-1092 > URL: https://issues.apache.org/jira/browse/BEAM-1092 > Project: Beam > Issue Type: Bug > Components: examples-java, sdk-java-extensions >Affects Versions: 0.3.0-incubating >Reporter: Maximilian Michels >Assignee: Frances Perry > Fix For: 0.4.0-incubating > > > Beam shades away some of its dependencies like Guava to avoid user classes > from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, > do not shade any classes and directly depend on potentially conflicting > libraries (e.g. Guava). Also, users might manually add such libraries as > dependencies. > Runners who add classes to the classpath (e.g. Hadoop) can run into conflict > with multiple versions of the same class. To prevent that, we should adjust > the Maven archetypes pom files used for the Quickstart to perform shading of > commonly used libraries (again, Guava is often the culprit). > To prevent the problem in the first place, we should expand the shading of > Guava and other libraries to all modules which make use of these. > To solve both dimensions of the issue, we need to address: > 1. Adding shading of commonly used libraries to the archetypes poms > 2. Properly shade all commonly used libraries in the SDK modules > 2) seems to be of highest priority since it affects users who simply use the > provided IO modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1101) Remove inconsistencies in Python PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727416#comment-15727416 ] ASF GitHub Bot commented on BEAM-1101: -- GitHub user pabloem opened a pull request: https://github.com/apache/incubator-beam/pull/1526 [BEAM-1101] Remove inconsistencies in Python PipelineOptions Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/pabloem/incubator-beam poptions-inconsistencies Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1526.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1526 commit c6362aed50eae9cf8b9e5ec6d0885132f3278e32 Author: Pablo <pabl...@google.com> Date: 2016-12-07T02:01:54Z Fixing inconsistencies in PipelineOptions > Remove inconsistencies in Python PipelineOptions > > > Key: BEAM-1101 > URL: https://issues.apache.org/jira/browse/BEAM-1101 > Project: Beam > Issue Type: Bug > Components: sdk-py >Reporter: Pablo Estrada >Assignee: Frances Perry > > Some options have been removed from Java, and some have different default > behavior in Java. Gotta make this consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1101) Remove inconsistencies in Python PipelineOptions
Pablo Estrada created BEAM-1101: --- Summary: Remove inconsistencies in Python PipelineOptions Key: BEAM-1101 URL: https://issues.apache.org/jira/browse/BEAM-1101 Project: Beam Issue Type: Bug Components: sdk-py Reporter: Pablo Estrada Assignee: Frances Perry Some options have been removed from Java, and some have different default behavior in Java. Gotta make this consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-1085) Crunch Runner for Beam
[ https://issues.apache.org/jira/browse/BEAM-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neelesh Srinivas Salian updated BEAM-1085: -- Component/s: runner-ideas > Crunch Runner for Beam > -- > > Key: BEAM-1085 > URL: https://issues.apache.org/jira/browse/BEAM-1085 > Project: Beam > Issue Type: Wish > Components: runner-ideas >Reporter: Neelesh Srinivas Salian > > It came up during the BoF Beam talk earlier last month; opening this JIRA as > a placeholder in case there is interest in adding this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727280#comment-15727280 ] Davor Bonaci commented on BEAM-1099: Thanks [~jghoman], much appreciated. > Minor typos in KafkaIO > -- > > Key: BEAM-1099 > URL: https://issues.apache.org/jira/browse/BEAM-1099 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Jakob Homan >Assignee: Davor Bonaci >Priority: Trivial > Fix For: 0.4.0-incubating > > > While familiarizing myself with the KafkaIO support, I found and fixed a few > typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727277#comment-15727277 ] ASF GitHub Bot commented on BEAM-1099: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1524 > Minor typos in KafkaIO > -- > > Key: BEAM-1099 > URL: https://issues.apache.org/jira/browse/BEAM-1099 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Jakob Homan >Assignee: Davor Bonaci >Priority: Trivial > Fix For: 0.4.0-incubating > > > While familiarizing myself with the KafkaIO support, I found and fixed a few > typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-1099) Minor typos in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davor Bonaci resolved BEAM-1099. Resolution: Fixed Fix Version/s: 0.4.0-incubating > Minor typos in KafkaIO > -- > > Key: BEAM-1099 > URL: https://issues.apache.org/jira/browse/BEAM-1099 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Jakob Homan >Assignee: Davor Bonaci >Priority: Trivial > Fix For: 0.4.0-incubating > > > While familiarizing myself with the KafkaIO support, I found and fixed a few > typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727272#comment-15727272 ] Jakob Homan commented on BEAM-1099: --- https://github.com/apache/incubator-beam/pull/1524 > Minor typos in KafkaIO > -- > > Key: BEAM-1099 > URL: https://issues.apache.org/jira/browse/BEAM-1099 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Jakob Homan >Assignee: Davor Bonaci >Priority: Trivial > > While familiarizing myself with the KafkaIO support, I found and fixed a few > typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727271#comment-15727271 ] ASF GitHub Bot commented on BEAM-1099: -- GitHub user jghoman opened a pull request: https://github.com/apache/incubator-beam/pull/1524 [BEAM-1099] Minor typos in KafkaIO Fix various distracting typos in KafkaIO You can merge this pull request into a Git repository by running: $ git pull https://github.com/jghoman/incubator-beam BEAM-1099 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1524.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1524 commit a71e9e4b99c14fbe453fa3dca486fa268c9782f5 Author: Jakob Homan <jgho...@gmail.com> Date: 2016-12-07T00:59:50Z [BEAM-1099] Minor typos in KafkaIO > Minor typos in KafkaIO > -- > > Key: BEAM-1099 > URL: https://issues.apache.org/jira/browse/BEAM-1099 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Jakob Homan >Assignee: Davor Bonaci >Priority: Trivial > > While familiarizing myself with the KafkaIO support, I found and fixed a few > typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727268#comment-15727268 ] ASF GitHub Bot commented on BEAM-551: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1504 > Support Dynamic PipelineOptions > --- > > Key: BEAM-551 > URL: https://issues.apache.org/jira/browse/BEAM-551 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Sam McVeety >Assignee: Sam McVeety >Priority: Minor > > During the graph construction phase, the given SDK generates an initial > execution graph for the program. At execution time, this graph is > executed, either locally or by a service. Currently, Beam only supports > parameterization at graph construction time. Both Flink and Spark supply > functionality that allows a pre-compiled job to be run without SDK > interaction with updated runtime parameters. > In its current incarnation, Dataflow can read values of PipelineOptions at > job submission time, but this requires the presence of an SDK to properly > encode these values into the job. We would like to build a common layer > into the Beam model so that these dynamic options can be properly provided > to jobs. > Please see > https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit > for the high-level model, and > https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit > for > the specific API proposal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-1098) Minor typos in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davor Bonaci closed BEAM-1098. -- Resolution: Duplicate Fix Version/s: Not applicable > Minor typos in KafkaIO > -- > > Key: BEAM-1098 > URL: https://issues.apache.org/jira/browse/BEAM-1098 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Jakob Homan >Assignee: Davor Bonaci >Priority: Trivial > Fix For: Not applicable > > > While familiarizing myself with the KafkaIO support, I found and fixed a few > typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1099) Minor typos in KafkaIO
Jakob Homan created BEAM-1099: - Summary: Minor typos in KafkaIO Key: BEAM-1099 URL: https://issues.apache.org/jira/browse/BEAM-1099 Project: Beam Issue Type: Bug Components: sdk-java-core Reporter: Jakob Homan Assignee: Davor Bonaci Priority: Trivial While familiarizing myself with the KafkaIO support, I found and fixed a few typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1098) Minor typos in KafkaIO
Jakob Homan created BEAM-1098: - Summary: Minor typos in KafkaIO Key: BEAM-1098 URL: https://issues.apache.org/jira/browse/BEAM-1098 Project: Beam Issue Type: Bug Components: sdk-java-core Reporter: Jakob Homan Assignee: Davor Bonaci Priority: Trivial While familiarizing myself with the KafkaIO support, I found and fixed a few typos in the comments for that class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-9) Storm Runner
[ https://issues.apache.org/jira/browse/BEAM-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727260#comment-15727260 ] Davor Bonaci commented on BEAM-9: - Having a Storm runner would be great! Awesome! At some point (in the future), I think it is worth clarifying the pros and cons of where the runner lives. On the technical side, I think it comes down to the API stability between different pairs of components (but, of course, there are other considerations as well). > Storm Runner > > > Key: BEAM-9 > URL: https://issues.apache.org/jira/browse/BEAM-9 > Project: Beam > Issue Type: Wish > Components: runner-ideas >Reporter: Frances Perry >Assignee: Sriharsha Chintalapani > > Gathering place for interest in a Storm runner for Beam. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727142#comment-15727142 ] ASF GitHub Bot commented on BEAM-551: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1503 > Support Dynamic PipelineOptions > --- > > Key: BEAM-551 > URL: https://issues.apache.org/jira/browse/BEAM-551 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Sam McVeety >Assignee: Sam McVeety >Priority: Minor > > During the graph construction phase, the given SDK generates an initial > execution graph for the program. At execution time, this graph is > executed, either locally or by a service. Currently, Beam only supports > parameterization at graph construction time. Both Flink and Spark supply > functionality that allows a pre-compiled job to be run without SDK > interaction with updated runtime parameters. > In its current incarnation, Dataflow can read values of PipelineOptions at > job submission time, but this requires the presence of an SDK to properly > encode these values into the job. We would like to build a common layer > into the Beam model so that these dynamic options can be properly provided > to jobs. > Please see > https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit > for the high-level model, and > https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit > for > the specific API proposal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1038) Support for new State API in DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727019#comment-15727019 ] ASF GitHub Bot commented on BEAM-1038: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1523 > Support for new State API in DataflowRunner > --- > > Key: BEAM-1038 > URL: https://issues.apache.org/jira/browse/BEAM-1038 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1038) Support for new State API in DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726929#comment-15726929 ] ASF GitHub Bot commented on BEAM-1038: -- GitHub user kennknowles opened a pull request: https://github.com/apache/incubator-beam/pull/1523 [BEAM-1038] Allow stateful DoFn in DataflowRunner Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [x] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request` - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [x] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- Confirmed that the post commit should succeed, via the following command: ``` mvn verify \ --batch-mode \ --errors \ --projects runners/google-cloud-dataflow-java \ -DforkCount=0 \ -DfailIfNoTests=false \ -Dtest=org.apache.beam.sdk.transforms.ParDoTest \ -DrunnableOnServicePipelineOptions='[ "--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner", "--project=...", "--tempRoot=..." 
]' ``` with output ending in: ``` Running org.apache.beam.sdk.transforms.ParDoTest Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 273.736 sec - in org.apache.beam.sdk.transforms.ParDoTest ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/kennknowles/incubator-beam DataflowRunner-state Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1523.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1523 commit 9e4d40731ff53b2cd0f2a57d18c6f560db441483 Author: Kenneth Knowles <k...@google.com> Date: 2016-12-06T21:51:19Z Allow stateful DoFn in DataflowRunner > Support for new State API in DataflowRunner > --- > > Key: BEAM-1038 > URL: https://issues.apache.org/jira/browse/BEAM-1038 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1097) Dataflow error message for non-existing gcpTempLocation is misleading
[ https://issues.apache.org/jira/browse/BEAM-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726917#comment-15726917 ] ASF GitHub Bot commented on BEAM-1097: -- GitHub user swegner opened a pull request: https://github.com/apache/incubator-beam/pull/1522 [BEAM-1097] Provide a better error message for non-existing gcpTempLocation Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- gcpTempLocation will default to using the value for tmpLocation, as long as the value is a valid GCP path. Non-valid GCP paths are silently discarded. This change removes existence validation from the default value logic such that downstream validation can provide a better error message. You can merge this pull request into a Git repository by running: $ git pull https://github.com/swegner/incubator-beam gcp-temp-location-error Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1522.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1522 commit 9d768df4323a246baa705fd5fb75d08c78abb7f0 Author: Scott Wegner <sweg...@google.com> Date: 2016-12-06T22:19:12Z Provide a better error message for non-existing gcpTempLocation gcpTempLocation will default to using the value for tmpLocation, as long as the value is a valid GCP path. Non-valid GCP paths are silently discarded. 
This change removes existence validation from the default value logic such that downstream validation can provide a better error message. > Dataflow error message for non-existing gcpTempLocation is misleading > - > > Key: BEAM-1097 > URL: https://issues.apache.org/jira/browse/BEAM-1097 > Project: Beam > Issue Type: Bug > Components: examples-java, runner-dataflow >Reporter: Scott Wegner >Assignee: Scott Wegner >Priority: Minor > > The error message for specifying a GCP tempLocation which doesn't exist is > misleading. Rather than mentioning that the given path doesn't exist, it says none > was specified. > This is particularly frustrating because it's one of the few configuration > values necessary to get started with an example or starter archetype, and > it's easy to introduce a typo as it's specified on the command line. In my > case, I was specifying "gs://swegner-tmp" instead of "gs://swegner-test". > Repro: > 1. Clone the starter archetype: {noformat}mvn archetype:generate > -DarchetypeGroupId=org.apache.beam > -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter{noformat} > 2. Add beam-runners-google-cloud-dataflow-java as a dependency in the > generated pom.xml > 3. Build: {noformat}mvn install{noformat} > 4. Run: {noformat}mvn exec:java -DmainClass=swegner.StarterPipeline > -Dexec.args="--runner=DataflowRunner > --tempLocation=gs://swegner-tmp"{noformat} > Expected: An error message along the lines of: "The specified GCP temp > location 'gs://swegner-tmp' does not exist under project 'myGcpProject'" > bq. [ERROR] Failed to execute goal > org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project > counter-names-test: An exception occured while executing the Java class. > null: InvocationTargetException: Failed to construct instance from factory > method DataflowRunner#fromOptions(interface > org.apache.beam.sdk.options.PipelineOptions): DataflowRunner requires > gcpTempLocation, and it is missing in PipelineOptions. 
-> [Help 1] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
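The split described in the pull request above (format-only defaulting, with the existence check left to downstream validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: `TempLocationSketch`, `looksLikeGcsPath`, `defaultGcpTempLocation`, and `validate` are hypothetical names, and a plain set of strings stands in for the real existence lookup.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Beam SDK.
final class TempLocationSketch {
    // Format-only check: does the value look like a GCS path at all?
    static boolean looksLikeGcsPath(String path) {
        return path != null && path.startsWith("gs://") && path.length() > "gs://".length();
    }

    // Default-value logic: fall back to tempLocation when it is well-formed.
    // No existence check here, so a mistyped bucket is not silently discarded.
    static Optional<String> defaultGcpTempLocation(String tempLocation) {
        return looksLikeGcsPath(tempLocation) ? Optional.of(tempLocation) : Optional.empty();
    }

    // Downstream validation: it sees the concrete path, so it can name it in the error.
    // existingPaths is an assumed stand-in for a real storage lookup.
    static String validate(Optional<String> gcpTempLocation, Set<String> existingPaths) {
        if (!gcpTempLocation.isPresent()) {
            throw new IllegalArgumentException("gcpTempLocation is missing in PipelineOptions");
        }
        String path = gcpTempLocation.get();
        if (!existingPaths.contains(path)) {
            throw new IllegalArgumentException(
                "The specified GCP temp location '" + path + "' does not exist");
        }
        return path;
    }
}
```

With this shape, a typo such as "gs://swegner-tmp" survives defaulting and is reported by name, instead of being reported as a missing option.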
[jira] [Created] (BEAM-1097) Dataflow error message for non-existing gcpTempLocation is misleading
Scott Wegner created BEAM-1097: -- Summary: Dataflow error message for non-existing gcpTempLocation is misleading Key: BEAM-1097 URL: https://issues.apache.org/jira/browse/BEAM-1097 Project: Beam Issue Type: Bug Components: examples-java, runner-dataflow Reporter: Scott Wegner Assignee: Scott Wegner Priority: Minor The error message for specifying a GCP tempLocation which doesn't exist is misleading. Rather than mentioning that the given path doesn't exist, it says none was specified. This is particularly frustrating because it's one of the few configuration values necessary to get started with an example or starter archetype, and it's easy to introduce a typo as it's specified on the command line. In my case, I was specifying "gs://swegner-tmp" instead of "gs://swegner-test". Repro: 1. Clone the starter archetype: {noformat}mvn archetype:generate -DarchetypeGroupId=org.apache.beam -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter{noformat} 2. Add beam-runners-google-cloud-dataflow-java as a dependency in the generated pom.xml 3. Build: {noformat}mvn install{noformat} 4. Run: {noformat}mvn exec:java -DmainClass=swegner.StarterPipeline -Dexec.args="--runner=DataflowRunner --tempLocation=gs://swegner-tmp"{noformat} Expected: An error message along the lines of: "The specified GCP temp location 'gs://swegner-tmp' does not exist under project 'myGcpProject'" bq. [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project counter-names-test: An exception occured while executing the Java class. null: InvocationTargetException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions): DataflowRunner requires gcpTempLocation, and it is missing in PipelineOptions. -> [Help 1] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-986) ReduceFnRunner doesn't batch prefetching pane firings
[ https://issues.apache.org/jira/browse/BEAM-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726841#comment-15726841 ] ASF GitHub Bot commented on BEAM-986: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1519 > ReduceFnRunner doesn't batch prefetching pane firings > - > > Key: BEAM-986 > URL: https://issues.apache.org/jira/browse/BEAM-986 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Sam Whittle >Assignee: Sam Whittle > Original Estimate: 24h > Remaining Estimate: 24h > > Specifically > - in ProcessElements, if there are multiple windows to consider each is > processed sequentially with sequential state fetches instead of a bulk > prefetch > - onTimer method doesn't evaluate multiple timers at a time meaning that if > multiple timers are fired at once each is processed sequentially without > batched prefetching -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726722#comment-15726722 ] ASF GitHub Bot commented on BEAM-551: - Github user sammcveety closed the pull request at: https://github.com/apache/incubator-beam/pull/1238 > Support Dynamic PipelineOptions > --- > > Key: BEAM-551 > URL: https://issues.apache.org/jira/browse/BEAM-551 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Sam McVeety >Assignee: Sam McVeety >Priority: Minor > > During the graph construction phase, the given SDK generates an initial > execution graph for the program. At execution time, this graph is > executed, either locally or by a service. Currently, Beam only supports > parameterization at graph construction time. Both Flink and Spark supply > functionality that allows a pre-compiled job to be run without SDK > interaction with updated runtime parameters. > In its current incarnation, Dataflow can read values of PipelineOptions at > job submission time, but this requires the presence of an SDK to properly > encode these values into the job. We would like to build a common layer > into the Beam model so that these dynamic options can be properly provided > to jobs. > Please see > https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit > for the high-level model, and > https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit > for > the specific API proposal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726721#comment-15726721 ] ASF GitHub Bot commented on BEAM-551: - GitHub user sammcveety reopened a pull request: https://github.com/apache/incubator-beam/pull/1238 [BEAM-551] BigqueryIO.Read support for ValueProvider R: @dhalperi This is the serialization issue I was referencing, where I think the issue is that there is a deferred translation from String -> TableRef -> String, causing a serialization error. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sammcveety/incubator-beam sgmc/bq_vp Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1238.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1238 commit ced1073597e3fbcc826f10a417e810803c7317ce Author: Sam McVeety <s...@google.com> Date: 2016-10-30T18:58:44Z Add BQ VP code BQ VP Fix most tests Make serializable Fix BQ tests Fix tests Update API Fix validation case Fix one more query reference commit 2d05f107499ce94bb35f05a4c386625e66fe660c Author: Sam McVeety <s...@google.com> Date: 2016-12-06T05:24:20Z Minor fixes commit 1ca49af0659fd32ffa06e4e5778e3e4f90d85be6 Author: Sam McVeety <s...@google.com> Date: 2016-12-06T05:31:34Z Address serialization issue > Support Dynamic PipelineOptions > --- > > Key: BEAM-551 > URL: https://issues.apache.org/jira/browse/BEAM-551 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Sam McVeety >Assignee: Sam McVeety >Priority: Minor > > During the graph construction phase, the given SDK generates an initial > execution graph for the program. At execution time, this graph is > executed, either locally or by a service. Currently, Beam only supports > parameterization at graph construction time. 
Both Flink and Spark supply > functionality that allows a pre-compiled job to be run without SDK > interaction with updated runtime parameters. > In its current incarnation, Dataflow can read values of PipelineOptions at > job submission time, but this requires the presence of an SDK to properly > encode these values into the job. We would like to build a common layer > into the Beam model so that these dynamic options can be properly provided > to jobs. > Please see > https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit > for the high-level model, and > https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit > for > the specific API proposal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
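The difference between construction-time and run-time parameterization described in BEAM-551 can be illustrated with a small sketch. It assumes nothing about the API in the linked proposal: `DeferredOptionSketch`, `runtimeOption`, and `runStep` are hypothetical names, and a plain map stands in for whatever supplies parameters when a pre-built job is launched.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not the API from the linked design documents.
final class DeferredOptionSketch {
    // Graph construction captures *how* to obtain the value, not the value itself.
    static Supplier<String> runtimeOption(Map<String, String> runtimeParams, String name) {
        return () -> {
            String value = runtimeParams.get(name);
            if (value == null) {
                throw new IllegalStateException("Option '" + name + "' was not provided at run time");
            }
            return value;
        };
    }

    // Stand-in for a step of an already-built graph; it resolves the option only when executed.
    static String runStep(Supplier<String> inputTable) {
        return "reading from " + inputTable.get();
    }
}
```

The point of the design is that the graph can be built once while the value is still unknown; the lookup only happens when the step runs.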
[jira] [Resolved] (BEAM-641) Need to test the generated archetypes projects
[ https://issues.apache.org/jira/browse/BEAM-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davor Bonaci resolved BEAM-641. --- Resolution: Invalid Fix Version/s: (was: 0.4.0-incubating) Not applicable > Need to test the generated archetypes projects > -- > > Key: BEAM-641 > URL: https://issues.apache.org/jira/browse/BEAM-641 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Pei He >Assignee: Pei He > Fix For: Not applicable > > > Travis and Jenkins pre-submits don't test building the generated archetype > projects. > Currently, changes to archetypes have to be manually verified by running: > mvn archetype:generate \ > -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ > -DarchetypeGroupId=org.apache.beam \ > -DarchetypeVersion=0.3.0-incubating-SNAPSHOT \ > -DgroupId=com.example \ > -DartifactId=first-beam \ > -Dversion="0.3.0-incubating-SNAPSHOT" \ > -DinteractiveMode=false \ > -Dpackage=org.apache.beam.examples > and then running "mvn clean install" in the first-beam project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (BEAM-507) Fill in the documentation/runners/spark portion of the website
[ https://issues.apache.org/jira/browse/BEAM-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Sela reassigned BEAM-507: -- Assignee: Amit Sela (was: James Malone) > Fill in the documentation/runners/spark portion of the website > -- > > Key: BEAM-507 > URL: https://issues.apache.org/jira/browse/BEAM-507 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Frances Perry >Assignee: Amit Sela > > As per > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit. > Should be a landing page for Spark-specific information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1096) flink streaming side output optimization using SplitStream
[ https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726461#comment-15726461 ] ASF GitHub Bot commented on BEAM-1096: -- GitHub user xhumanoid opened a pull request: https://github.com/apache/incubator-beam/pull/1520 [BEAM-1096] flink streaming side output optimization using SplitStream Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- @aljoscha check please You can merge this pull request into a Git repository by running: $ git pull https://github.com/xhumanoid/incubator-beam stream_side_output_optimization Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1520 commit 568d73f74219a42cac4028b198f53f22a832990f Author: Alexey Diomin <diomi...@gmail.com> Date: 2016-12-06T19:30:54Z [BEAM-1096] flink streaming side output optimization using SplitStream > flink streaming side output optimization using SplitStream > -- > > Key: BEAM-1096 > URL: https://issues.apache.org/jira/browse/BEAM-1096 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Affects Versions: 0.4.0-incubating >Reporter: Alexey Diomin >Priority: Minor > > Current implementation: > 1) send all events to all output streams > 2) filter the streams for the necessary tags > Cons: increased CPU usage from serializing all events > 
Proposed implementation: > 1) route each event to the correct stream based on its tag -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-986) ReduceFnRunner doesn't batch prefetching pane firings
[ https://issues.apache.org/jira/browse/BEAM-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726441#comment-15726441 ] ASF GitHub Bot commented on BEAM-986: - GitHub user scwhittle opened a pull request: https://github.com/apache/incubator-beam/pull/1519 [BEAM-986] Improvements to ReduceFnRunner prefetching - add prefetch* methods for prefetching state matching existing methods - replace onTimer with batched onTimers method to allow prefetching across timers - remove deprecated TimerCallback usage - prefetch triggers in processElements You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwhittle/incubator-beam reduce_fn_prefetching_no_sdk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1519.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1519 commit 5085c77736a314519c6dcdb4c4c2c21b425b9668 Author: Sam Whittle <samu...@google.com> Date: 2016-11-10T20:59:49Z Improvements to ReduceFnRunner prefetching: - add prefetch* methods for prefetching state matching existing methods - replace onTimer with batched onTimers method to allow prefetching across timers - prefetch triggers in processElements Additionally remove deprecated TimerCallback usage > ReduceFnRunner doesn't batch prefetching pane firings > - > > Key: BEAM-986 > URL: https://issues.apache.org/jira/browse/BEAM-986 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Sam Whittle >Assignee: Sam Whittle > Original Estimate: 24h > Remaining Estimate: 24h > > Specifically > - in ProcessElements, if there are multiple windows to consider each is > processed sequentially with sequential state fetches instead of a bulk > prefetch > - onTimer method doesn't evaluate multiple timers at a time meaning that if > multiple timers are fired at once each is processed 
sequentially without > batched prefetching -- This message was sent by Atlassian JIRA (v6.3.4#6332)
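The cost difference between per-timer fetches and a single bulk prefetch, as described in BEAM-986, can be sketched with a fake state store that counts round trips. This is only an illustration under assumptions: `PrefetchSketch`, `CountingStore`, `onTimerSequential`, and `onTimersBatched` are hypothetical names and do not reflect the real ReduceFnRunner or state API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not ReduceFnRunner or any Beam state API.
final class PrefetchSketch {
    // A fake remote state store that counts round trips.
    static final class CountingStore {
        private final Map<String, String> remote = new HashMap<>();
        int roundTrips = 0;

        void put(String key, String value) {
            remote.put(key, value);
        }

        // One round trip per key.
        String fetch(String key) {
            roundTrips++;
            return remote.get(key);
        }

        // One round trip for the whole batch of keys.
        Map<String, String> fetchAll(List<String> keys) {
            roundTrips++;
            Map<String, String> out = new HashMap<>();
            for (String k : keys) {
                out.put(k, remote.get(k));
            }
            return out;
        }
    }

    // Each fired timer triggers its own state fetch.
    static List<String> onTimerSequential(CountingStore store, List<String> windows) {
        List<String> panes = new ArrayList<>();
        for (String w : windows) {
            panes.add(store.fetch(w));
        }
        return panes;
    }

    // All fired timers share one bulk prefetch, then read from the local result.
    static List<String> onTimersBatched(CountingStore store, List<String> windows) {
        Map<String, String> prefetched = store.fetchAll(windows);
        List<String> panes = new ArrayList<>();
        for (String w : windows) {
            panes.add(prefetched.get(w));
        }
        return panes;
    }
}
```

Both methods return the same panes; only the number of round trips to the store differs, growing with the number of timers in the sequential case and staying at one in the batched case.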
[jira] [Created] (BEAM-1096) flink streaming side output optimization using SplitStream
Alexey Diomin created BEAM-1096: --- Summary: flink streaming side output optimization using SplitStream Key: BEAM-1096 URL: https://issues.apache.org/jira/browse/BEAM-1096 Project: Beam Issue Type: Improvement Components: runner-flink Affects Versions: 0.4.0-incubating Reporter: Alexey Diomin Priority: Minor Current implementation: 1) send all events to all output streams 2) filter the streams for the necessary tags Cons: increased CPU usage from serializing all events Proposed implementation: 1) route each event to the correct stream based on its tag -- This message was sent by Atlassian JIRA (v6.3.4#6332)
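The two strategies in BEAM-1096 can be compared with a small sketch in which plain collections stand in for Flink streams. Everything here is assumed for illustration: `SideOutputSketch`, `Tagged`, `broadcastThenFilter`, and `routeByTag` are hypothetical names, and the `deliveries` counter is only a rough proxy for the per-element serialization work mentioned in the issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; plain collections stand in for Flink streams.
final class SideOutputSketch {
    static final class Tagged {
        final String tag;
        final String value;

        Tagged(String tag, String value) {
            this.tag = tag;
            this.value = value;
        }
    }

    // Current approach: every element is handed to every output, then filtered by tag.
    // deliveries[0] counts how many elements were handed to an output.
    static Map<String, List<String>> broadcastThenFilter(
            List<Tagged> input, List<String> tags, int[] deliveries) {
        Map<String, List<String>> outputs = new HashMap<>();
        for (String tag : tags) {
            List<String> kept = new ArrayList<>();
            for (Tagged element : input) {
                deliveries[0]++;
                if (element.tag.equals(tag)) {
                    kept.add(element.value);
                }
            }
            outputs.put(tag, kept);
        }
        return outputs;
    }

    // Proposed approach: each element goes only to the output matching its tag.
    static Map<String, List<String>> routeByTag(
            List<Tagged> input, List<String> tags, int[] deliveries) {
        Map<String, List<String>> outputs = new HashMap<>();
        for (String tag : tags) {
            outputs.put(tag, new ArrayList<>());
        }
        for (Tagged element : input) {
            List<String> target = outputs.get(element.tag);
            if (target != null) {
                deliveries[0]++;
                target.add(element.value);
            }
        }
        return outputs;
    }
}
```

In this model the broadcast variant performs one delivery per element per output, while routing performs at most one delivery per element, with identical resulting outputs.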
[jira] [Commented] (BEAM-646) Get runners out of the apply()
[ https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726352#comment-15726352 ] ASF GitHub Bot commented on BEAM-646: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-beam/pull/1495 > Get runners out of the apply() > -- > > Key: BEAM-646 > URL: https://issues.apache.org/jira/browse/BEAM-646 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Thomas Groh > > Right now, the runner intercepts calls to apply() and replaces transforms as > we go. This means that there is no "original" user graph. For portability and > misc architectural benefits, we would like to build the original graph first, > and have the runner override later. > Some runners already work in this manner, but we could integrate it more > smoothly, with more validation, via some handy APIs on e.g. the Pipeline > object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1095) Add support set config for reuse-object on flink
[ https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726285#comment-15726285 ] ASF GitHub Bot commented on BEAM-1095: -- GitHub user xhumanoid opened a pull request: https://github.com/apache/incubator-beam/pull/1518 [BEAM-1095] Add support set config for reuse-object on flink Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[BEAM-<Jira issue #>] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). --- @aljoscha check please You can merge this pull request into a Git repository by running: $ git pull https://github.com/xhumanoid/incubator-beam flink_object_reuse Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-beam/pull/1518.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1518 commit 8ae24419cd9125aa445b36d736fc4d8bfe2dcb7d Author: Alexey Diomin <diomi...@gmail.com> Date: 2016-12-06T18:20:02Z [BEAM-1095] Add support set config for reuse-object on flink > Add support set config for reuse-object on flink > > > Key: BEAM-1095 > URL: https://issues.apache.org/jira/browse/BEAM-1095 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Affects Versions: 0.4.0-incubating >Reporter: Alexey Diomin >Priority: Trivial > > Object reuse is a dangerous setting and is disabled by default, > but sometimes we need to use this option to avoid the performance overhead of > serializing and deserializing objects on every transformation -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
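Editor's note on BEAM-1095: the issue calls object reuse "dangerous" without saying why. The sketch below is a toy model of the hazard and does not use the Flink or Beam API. The names `Record`, `read_records`, and `buffer_all` are hypothetical, and creating a fresh object stands in for the copy a runner would otherwise make between steps. It shows that a downstream step which keeps references to its inputs sees corrupted data once the upstream step starts mutating and re-emitting a single instance.

```python
class Record:
    """Mutable element passed between pipeline steps (hypothetical)."""

    def __init__(self, value):
        self.value = value


def read_records(values, reuse_objects):
    """Yield one Record per value.

    With reuse_objects=True the same instance is mutated and re-emitted,
    which avoids allocation and copying but shares state between elements.
    """
    shared = Record(None)
    for v in values:
        if reuse_objects:
            shared.value = v
            yield shared
        else:
            yield Record(v)


def buffer_all(records):
    """A downstream step that holds references to every input it receives."""
    return [r for r in records]


safe = [r.value for r in buffer_all(read_records([1, 2, 3], reuse_objects=False))]
unsafe = [r.value for r in buffer_all(read_records([1, 2, 3], reuse_objects=True))]

print(safe)    # each element kept its own value
print(unsafe)  # every buffered reference points at the last value written
```

Under this model, reuse is only safe when no step retains or mutates an element after emitting it, which is presumably why the issue asks for an opt-in setting rather than a change of default.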
[jira] [Commented] (BEAM-1087) Pickling error in save main session
[ https://issues.apache.org/jira/browse/BEAM-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726281#comment-15726281 ]

ASF GitHub Bot commented on BEAM-1087:
--------------------------------------

Github user sb2nov closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1485

> Pickling error in save main session
> -----------------------------------
>
> Key: BEAM-1087
> URL: https://issues.apache.org/jira/browse/BEAM-1087
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Reporter: Sourabh Bajaj
> Assignee: Sourabh Bajaj
> Priority: Minor
>
> {code}
>   File "/usr/local/lib/python2.7/pickle.py", line 286, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 1231, in save_type
>     StockPickler.save_global(pickler, obj)
>   File "/usr/local/lib/python2.7/pickle.py", line 754, in save_global
>     (obj, module, name))
> pickle.PicklingError: Can't pickle <class 'apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum'>:
> it's not found as apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum
> {code}
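Editor's note on BEAM-1087: the "it's not found as <module>.<name>" message comes from pickle's by-reference handling of classes, which stores only a module and a name and requires that the name resolve back to the same class. The sketch below reproduces that class of failure with the standard library alone. It is not the Beam code or the fix from the pull request. `make_local_class` and `can_pickle_by_reference` are hypothetical helpers, and the exact exception type and message differ between Python versions, so both plausible exception types are caught.

```python
import collections
import pickle


def make_local_class():
    # Hypothetical stand-in for a class that is not reachable as a
    # module-level attribute under its own name.
    class TypeValueValuesEnum(object):
        pass

    return TypeValueValuesEnum


def can_pickle_by_reference(cls):
    """Return True if pickle can store cls as an importable name."""
    try:
        pickle.dumps(cls)
        return True
    except (pickle.PicklingError, AttributeError):
        # The lookup of the class by module and name failed.
        return False


importable = can_pickle_by_reference(collections.OrderedDict)
local = can_pickle_by_reference(make_local_class())
print(importable, local)
```

A class that lives at module level pickles as a plain name, while one that cannot be found under its recorded name fails before any instance data is written.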
[jira] [Commented] (BEAM-293) StreamingOptions should not extend GcpOptions
[ https://issues.apache.org/jira/browse/BEAM-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726230#comment-15726230 ]

ASF GitHub Bot commented on BEAM-293:
-------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1515

> StreamingOptions should not extend GcpOptions
> ---------------------------------------------
>
> Key: BEAM-293
> URL: https://issues.apache.org/jira/browse/BEAM-293
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Jean-Baptiste Onofré
> Assignee: Jean-Baptiste Onofré
> Labels: backward-incompatible
>
> Currently, the SDK {{StreamingOptions}} extends {{GcpOptions}}:
> {code}
> StreamingOptions extends ApplicationNameOptions, *GcpOptions*, PipelineOptions
> {code}
> Generally speaking, the core SDK should not depend on GCP.
[jira] [Commented] (BEAM-1093) Confusing Javadocs in StateInternals
[ https://issues.apache.org/jira/browse/BEAM-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726206#comment-15726206 ]

Davor Bonaci commented on BEAM-1093:
------------------------------------

Just an outdated comment. [~mauzhang], perhaps you can quickly fix it by swapping Dataflow and Beam?

> Confusing Javadocs in StateInternals
> ------------------------------------
>
> Key: BEAM-1093
> URL: https://issues.apache.org/jira/browse/BEAM-1093
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Manu Zhang
> Assignee: Ben Chambers
> Priority: Minor
>
> The second-to-last line of the StateInternals Javadoc says "This is a
> low-level API intended for use by the Dataflow SDK". It is not clear what
> "Dataflow SDK" refers to.
[jira] [Commented] (BEAM-651) Consider making TypedPValue.setTypeDescriptorInternal no longer Internal
[ https://issues.apache.org/jira/browse/BEAM-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725987#comment-15725987 ]

ASF GitHub Bot commented on BEAM-651:
-------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1516

> Consider making TypedPValue.setTypeDescriptorInternal no longer Internal
> ------------------------------------------------------------------------
>
> Key: BEAM-651
> URL: https://issues.apache.org/jira/browse/BEAM-651
> Project: Beam
> Issue Type: Wish
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Neelesh Srinivas Salian
> Priority: Minor
> Labels: easy, easyfix, starter
> Fix For: 0.4.0-incubating
>
> This would give fairly pithy answers to StackOverflow questions sometimes.
> When choosing between .getOutputCoder() and .getOutputTypeDescriptor() for a
> transform/DoFn we often choose the type, so the coder registry can do its
> thing.
> This would also give a similar choice between .setCoder(...) and
> .setTypeDescriptor(...).
> And anyhow we have the intention of removing our practice of the "*Internal"
> suffix, so this one might be most easily solved by making it public.
[jira] [Resolved] (BEAM-651) Consider making TypedPValue.setTypeDescriptorInternal no longer Internal
[ https://issues.apache.org/jira/browse/BEAM-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles resolved BEAM-651.
---------------------------------
Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Consider making TypedPValue.setTypeDescriptorInternal no longer Internal
> ------------------------------------------------------------------------
>
> Key: BEAM-651
> URL: https://issues.apache.org/jira/browse/BEAM-651
> Project: Beam
> Issue Type: Wish
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Neelesh Srinivas Salian
> Priority: Minor
> Labels: easy, easyfix, starter
> Fix For: 0.4.0-incubating
>
> This would give fairly pithy answers to StackOverflow questions sometimes.
> When choosing between .getOutputCoder() and .getOutputTypeDescriptor() for a
> transform/DoFn we often choose the type, so the coder registry can do its
> thing.
> This would also give a similar choice between .setCoder(...) and
> .setTypeDescriptor(...).
> And anyhow we have the intention of removing our practice of the "*Internal"
> suffix, so this one might be most easily solved by making it public.
[jira] [Updated] (BEAM-830) Launcher for ApexRunner execution on YARN cluster
[ https://issues.apache.org/jira/browse/BEAM-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise updated BEAM-830:
-----------------------------
Assignee: Thomas Weise

> Launcher for ApexRunner execution on YARN cluster
> --------------------------------------------------
>
> Key: BEAM-830
> URL: https://issues.apache.org/jira/browse/BEAM-830
> Project: Beam
> Issue Type: Improvement
> Components: runner-apex
> Reporter: Thomas Weise
> Assignee: Thomas Weise
>
> Currently the ApexRunner only supports execution in embedded mode. Add
> support for packaging the dependencies and running the Apex app on a YARN cluster.
[jira] [Commented] (BEAM-830) Launcher for ApexRunner execution on YARN cluster
[ https://issues.apache.org/jira/browse/BEAM-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725690#comment-15725690 ]

ASF GitHub Bot commented on BEAM-830:
-------------------------------------

GitHub user tweise opened a pull request:

    https://github.com/apache/incubator-beam/pull/1517

    [BEAM-830] ApexRunner launch on YARN cluster.

    R: @kennknowles @dhalperi

    This PR adds support for running a Beam pipeline through its main method
    on a YARN cluster. Example:

    ```
    $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner --embeddedExecution=false" -Papex-runner
    ```

    To make that happen it has to do some magic with the class path to
    determine the dependencies that should be used with the Hadoop client.

    Would like to get these changes into the upcoming release.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tweise/incubator-beam BEAM-830

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1517

commit 1b1336fd903d13a5874f1e6b1d03888f54f0fbff
Author: Thomas Weise <t...@apache.org>
Date:   2016-11-25T02:36:11Z

    BEAM-830 Support launch on YARN cluster.

> Launcher for ApexRunner execution on YARN cluster
> --------------------------------------------------
>
> Key: BEAM-830
> URL: https://issues.apache.org/jira/browse/BEAM-830
> Project: Beam
> Issue Type: Improvement
> Components: runner-apex
> Reporter: Thomas Weise
>
> Currently the ApexRunner only supports execution in embedded mode. Add
> support for packaging the dependencies and running the Apex app on a YARN cluster.
[jira] [Updated] (BEAM-1093) Confusing Javadocs in StateInternals
[ https://issues.apache.org/jira/browse/BEAM-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manu Zhang updated BEAM-1093:
----------------------------
Assignee: Ben Chambers (was: Davor Bonaci)

> Confusing Javadocs in StateInternals
> ------------------------------------
>
> Key: BEAM-1093
> URL: https://issues.apache.org/jira/browse/BEAM-1093
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Manu Zhang
> Assignee: Ben Chambers
> Priority: Minor
>
> The second-to-last line of the StateInternals Javadoc says "This is a
> low-level API intended for use by the Dataflow SDK". It is not clear what
> "Dataflow SDK" refers to.
[jira] [Created] (BEAM-1093) Confusing Javadocs in StateInternals
Manu Zhang created BEAM-1093:
----------------------------

Summary: Confusing Javadocs in StateInternals
Key: BEAM-1093
URL: https://issues.apache.org/jira/browse/BEAM-1093
Project: Beam
Issue Type: Improvement
Components: sdk-java-core
Reporter: Manu Zhang
Assignee: Davor Bonaci
Priority: Minor

The second-to-last line of the StateInternals Javadoc says "This is a
low-level API intended for use by the Dataflow SDK". It is not clear what
"Dataflow SDK" refers to.