[jira] [Commented] (BEAM-25) Add user-ready API for interacting with state

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731043#comment-15731043
 ] 

ASF GitHub Bot commented on BEAM-25:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1543


> Add user-ready API for interacting with state
> -
>
> Key: BEAM-25
> URL: https://issues.apache.org/jira/browse/BEAM-25
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: State
>
> Our current state API is targeted at runner implementers, not pipeline 
> authors. As such it has many capabilities that are not necessary nor 
> desirable for simple use cases of stateful ParDo (such as dynamic state tag 
> creation). Implement a simple state intended for user access.
> (Details of our current thoughts in forthcoming design doc)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730970#comment-15730970
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1528


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1109) Python ValidatesRunner Tests on Dataflow Service Timeout

2016-12-07 Thread Mark Liu (JIRA)
Mark Liu created BEAM-1109:
--

 Summary: Python ValidatesRunner Tests on Dataflow Service Timeout
 Key: BEAM-1109
 URL: https://issues.apache.org/jira/browse/BEAM-1109
 Project: Beam
  Issue Type: Bug
  Components: sdk-py, testing
Reporter: Mark Liu
Assignee: Mark Liu


ValidatesRunner tests timeout with following logs:
https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/839/console

Need to increase "--process-timeout" in execution command 
(https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/run_postcommit.sh#L77).





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1096) flink streaming side output optimization using SplitStream

2016-12-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1096:
---
Assignee: Alexey Diomin

> flink streaming side output optimization using SplitStream
> --
>
> Key: BEAM-1096
> URL: https://issues.apache.org/jira/browse/BEAM-1096
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> Current implementation:
> 1) send all events in all output streams
> 2) filtering streams for necessary tags
> Cons: increased cpu usage for serialization all events
> Proposed implementation:
> 1) route event in correct streams based on tag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1096) flink streaming side output optimization using SplitStream

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730793#comment-15730793
 ] 

ASF GitHub Bot commented on BEAM-1096:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1520


> flink streaming side output optimization using SplitStream
> --
>
> Key: BEAM-1096
> URL: https://issues.apache.org/jira/browse/BEAM-1096
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Priority: Minor
>
> Current implementation:
> 1) send all events in all output streams
> 2) filtering streams for necessary tags
> Cons: increased cpu usage for serialization all events
> Proposed implementation:
> 1) route event in correct streams based on tag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1095) Add support set config for reuse-object on flink

2016-12-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1095:
---
Assignee: Alexey Diomin

> Add support set config for reuse-object on flink
> 
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> Object-reuse is dangerous setting and disabled by default,
> but sometime we need use this option to omit performance overhead for 
> serialization-deserialization objects on every transformations 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1095) Add support set config for reuse-object on flink

2016-12-07 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-1095.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Add support set config for reuse-object on flink
> 
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Assignee: Alexey Diomin
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> Object-reuse is dangerous setting and disabled by default,
> but sometime we need use this option to omit performance overhead for 
> serialization-deserialization objects on every transformations 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1095) Add support set config for reuse-object on flink

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730767#comment-15730767
 ] 

ASF GitHub Bot commented on BEAM-1095:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1518


> Add support set config for reuse-object on flink
> 
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Priority: Trivial
>
> Object-reuse is dangerous setting and disabled by default,
> but sometime we need use this option to omit performance overhead for 
> serialization-deserialization objects on every transformations 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730723#comment-15730723
 ] 

ASF GitHub Bot commented on BEAM-1108:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1546


> Remove deprecated Dataflow Runner options and update documentation
> --
>
> Key: BEAM-1108
> URL: https://issues.apache.org/jira/browse/BEAM-1108
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: Not applicable
>
>
> Umbrella bug for removing deprecated {{DataflowPipelineXOptions}} 
> configurations, plus improving documentation. Will update bug description as 
> more tasks arise.
> 1. Remove the {{TEARDOWN_POLICY}} option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730375#comment-15730375
 ] 

Aljoscha Krettek commented on BEAM-1107:


Yep, you're right but even in the black text the operation names (MapPartition, 
GroupCombine and so on) are hardcoded in Flink right now so we cannot change 
that coming from Beam-on-Flink. Changing that would require changes to Flink 
(which I'm not opposed to).

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation

2016-12-07 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730373#comment-15730373
 ] 

Neelesh Srinivas Salian commented on BEAM-1108:
---

[~dhalp...@google.com] subtasks would be good to have to spread the work if 
there are multiple parts to this..

> Remove deprecated Dataflow Runner options and update documentation
> --
>
> Key: BEAM-1108
> URL: https://issues.apache.org/jira/browse/BEAM-1108
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: Not applicable
>
>
> Umbrella bug for removing deprecated {{DataflowPipelineXOptions}} 
> configurations, plus improving documentation. Will update bug description as 
> more tasks arise.
> 1. Remove the {{TEARDOWN_POLICY}} option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730359#comment-15730359
 ] 

ASF GitHub Bot commented on BEAM-646:
-

GitHub user tgroh opened a pull request:

https://github.com/apache/incubator-beam/pull/1547

[BEAM-646] Add PTransformOverrideFactory to the Core SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This migrates PTransformOverrideFactory from the DirectRunner to the
Core SDK, as part of BEAM-646.

Migrate all DirectRunner Override Factories to the new
PTransformOverrideFactory.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgroh/incubator-beam override_factory_in_core

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1547


commit ad7aa03e12694bceb29906d2bb9df1ce009a1df2
Author: Thomas Groh <tg...@google.com>
Date:   2016-12-06T00:01:57Z

Add PTransformOverrideFactory to the Core SDK

This migrates PTransformOverrideFactory from the DirectRunner to the
Core SDK, as part of BEAM-646.

Add getOriginalToReplacements to provide a mapping from the original
outputs to replaced outputs. This enables all replaced nodes to be
rewired to output the original output.

Migrate all DirectRunner Override Factories to the new
PTransformOverrideFactory.




> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730313#comment-15730313
 ] 

Daniel Halperin commented on BEAM-1107:
---

Ack -- I guess I have this intuition there's opportunity for more cleanup, but 
I may be wrong (or it may be a Flink-general, not Beam-on-Flink issue).

E.g., look at the attached screenshot:

* The name (grey) at the top is MapPartition -> Map -> GroupCombine -> Map
* The name of the steps (black) includes the identical as the grey, with 
additionally (step name)
* The Operation: text (small, grey) at the bottom includes the same (almost - 
logical vs physical?) information, although there appears to be some HTML error 
with inserting a break tag inside another break tag.


> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1107:
--
Attachment: screenshot-1.png

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
> Attachments: screenshot-1.png
>
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730289#comment-15730289
 ] 

ASF GitHub Bot commented on BEAM-551:
-

GitHub user sammcveety opened a pull request:

https://github.com/apache/incubator-beam/pull/1545

[BEAM-551] Fix handling of TextIO.Sink

R: @dhalperi 

Directory needs to be parameterized.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sammcveety/incubator-beam sgmc/text_io_write

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1545


commit d017fde2d063765a73b290e1b1e1b849f147910f
Author: Sam McVeety <s...@google.com>
Date:   2016-12-07T21:27:53Z

[BEAM-551] Fix handling of TextIO.Sink

commit 9309c9389d9e9fa2cae3f7378692d0484ddc54b2
Author: Sam McVeety <s...@google.com>
Date:   2016-12-07T22:09:41Z

Fix test




> Support Dynamic PipelineOptions
> ---
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Sam McVeety
>Assignee: Sam McVeety
>Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for
> the specific API proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730212#comment-15730212
 ] 

Daniel Halperin commented on BEAM-1107:
---

Also copying [~aljoscha]'s response :)

{quote}
I think we can get it down to "Data Source (ReadLines/Read)" (and similarly for 
other operators). The problem is that the String parameter is not the correct 
way to set the name of the operator but some other (admittedly weird) thing 
called "location name". To set the name we have to call .name(String) on the 
created operator after creating it.
{quote}

> Display user names for steps in the Flink Web UI
> 
>
> Key: BEAM-1107
> URL: https://issues.apache.org/jira/browse/BEAM-1107
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
>
> [copying in-person / email discussion at Strata Singapore to JIRA]
> The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
> "SDK name" for the transform.
> The "user name" for the transform is not available here, it is in fact on the 
> TransformHierarchy.Node as node.getFullName() [2].
> getFullName() is used some in Flink, but not when setting step names.
> I drafted a quick commit that sort of propagates the user names to the web UI 
> (but only for DataSource, and still too verbose: 
> https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)
> Before this change, the "ReadLines" step showed up as: "DataSource (at 
> Read(CompressedSource) 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> With this change, it shows up as "DataSource (at ReadLines/Read 
> (org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"
> which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].
> Thoughts?
> [1] 
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
> [2] 
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1107) Display user names for steps in the Flink Web UI

2016-12-07 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1107:
-

 Summary: Display user names for steps in the Flink Web UI
 Key: BEAM-1107
 URL: https://issues.apache.org/jira/browse/BEAM-1107
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Daniel Halperin
Assignee: Aljoscha Krettek


[copying in-person / email discussion at Strata Singapore to JIRA]


The FlinkBatchTransformTranslators use transform.getName() [1] -- this is the 
"SDK name" for the transform.

The "user name" for the transform is not available here, it is in fact on the 
TransformHierarchy.Node as node.getFullName() [2].

getFullName() is used some in Flink, but not when setting step names.

I drafted a quick commit that sort of propagates the user names to the web UI 
(but only for DataSource, and still too verbose: 
https://github.com/dhalperi/incubator-beam/commit/a2f1fb06b22a85ec738e4f2a604c9a129891916c)

Before this change, the "ReadLines" step showed up as: "DataSource (at 
Read(CompressedSource) 
(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"

With this change, it shows up as "DataSource (at ReadLines/Read 
(org.apache.beam.runners.flink.translation.wrappers.SourceInputFormat))"

which I think is closer. [I'd still like it to JUST be "ReadLines/Read" e.g.].

Thoughts?


[1] 
https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkBatchTransformTranslators.java#L129
[2] 
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/runners/TransformHierarchy.java#L252



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-905) Archetype pom needs to generalize dependencies

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730204#comment-15730204
 ] 

ASF GitHub Bot commented on BEAM-905:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1533


> Archetype pom needs to generalize dependencies
> --
>
> Key: BEAM-905
> URL: https://issues.apache.org/jira/browse/BEAM-905
> Project: Beam
>  Issue Type: Bug
>Affects Versions: 0.4.0-incubating
> Environment: Currently the archetype pom includes the direct runner 
> and the dataflow one, but not the others. It should do the same magic as the 
> main examples.
>Reporter: Frances Perry
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-975) Issue with MongoDBIO

2016-12-07 Thread Reza Nouri (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730202#comment-15730202
 ] 

Reza Nouri commented on BEAM-975:
-

Hey [~jbonofre],

Here is from mongo log before failure:

2016-12-08T09:42:36.346+1100 I NETWORK  [initandlisten] Listener: accept() 
returns -1 errno:24 Too many open files
2016-12-08T09:42:36.346+1100 E NETWORK  [initandlisten] Out of file 
descriptors. Waiting one second before trying to accept more connections.
2016-12-08T09:42:37.743+1100 E STORAGE  [thread1] WiredTiger (24) 
[1481150557:743579][1523:0x75b05000], log-server: data/db/journal: 
directory-list: opendir: Too many open files
2016-12-08T09:42:37.743+1100 E STORAGE  [thread1] WiredTiger (24) 
[1481150557:743728][1523:0x75b05000], log-server: journal: directory-list, 
prefix "WiredTigerPreplog": Too many open files
2016-12-08T09:42:37.743+1100 E STORAGE  [thread1] WiredTiger (24) 
[1481150557:743750][1523:0x75b05000], log-server: log pre-alloc server 
error: Too many open files
2016-12-08T09:42:37.743+1100 E STORAGE  [thread1] WiredTiger (24) 
[1481150557:743768][1523:0x75b05000], log-server: log server error: Too 
many open files
2016-12-08T09:42:47.005+1100 W FTDC [ftdc] Uncaught exception in 
'FileNotOpen: Failed to open interim file 
data/db/diagnostic.data/metrics.interim.temp' in full-time diagnostic data 
capture subsystem. Shutting down the full-time diagnostic data capture 
subsystem.
2016-12-08T09:43:27.758+1100 I NETWORK  [initandlisten] Listener: accept() 
returns -1 errno:24 Too many open files
2016-12-08T09:43:27.758+1100 E NETWORK  [initandlisten] Out of file 
descriptors. Waiting one second before trying to accept more connections.
2016-12-08T09:43:28.635+1100 W NETWORK  [HostnameCanonicalizationWorker] Failed 
to obtain address information for hostname dyn: nodename nor servname provided, 
or not known
2016-12-08T09:43:28.759+1100 I NETWORK  [initandlisten] Listener: accept() 
returns -1 errno:24 Too many open files
2016-12-08T09:43:28.759+1100 E NETWORK  [initandlisten] Out of file 
descriptors. Waiting one second before trying to accept more connections.
2016-12-08T09:43:29.021+1100 E STORAGE  [thread2] WiredTiger (24) 
[1481150609:20956][1523:0x75c8e000], file:WiredTiger.wt, 
WT_SESSION.checkpoint: data/db/WiredTiger.turtle: handle-open: open: Too many 
open files
2016-12-08T09:43:29.021+1100 E STORAGE  [thread2] WiredTiger (24) 
[1481150609:21326][1523:0x75c8e000], checkpoint-server: checkpoint server 
error: Too many open files
2016-12-08T09:43:29.021+1100 E STORAGE  [thread2] WiredTiger (-31804) 
[1481150609:21355][1523:0x75c8e000], checkpoint-server: the process must 
exit and restart: WT_PANIC: WiredTiger library panic
2016-12-08T09:43:29.021+1100 I -[thread2] Fatal Assertion 28558
2016-12-08T09:43:29.021+1100 I -[thread2] 

***aborting after fassert() failure


2016-12-08T09:43:29.029+1100 F -[thread2] Got signal: 6 (Abort trap: 6).

And then it throws connection timeout exception:

SEVERE: Servlet.service() for servlet [Curation] in context with path [] threw 
exception [org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a 
server that matches ReadPreferenceServerSelector{readPreference=primary}. 
Client view of cluster state is {type=UNKNOWN, 
servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING, 
exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, 
caused by {java.net.ConnectException: Connection refused}}]] with root cause
com.mongodb.MongoTimeoutException: Timed out after 3 ms while waiting for a 
server that matches ReadPreferenceServerSelector{readPreference=primary}. 
Client view of cluster state is {type=UNKNOWN, 
servers=[{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING, 
exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, 
caused by {java.net.ConnectException: Connection refused}}]
at 
com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:369)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:101)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:75)
at 
com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.(ClusterBinding.java:71)
at 
com.mongodb.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:63)
at 
com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:89)
at 
com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:84)
at 
com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:55)
at com.mongodb.Mongo.execute(Mongo.java:772)
at com.mongodb.Mongo$2.execute(

[jira] [Commented] (BEAM-25) Add user-ready API for interacting with state

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730142#comment-15730142
 ] 

ASF GitHub Bot commented on BEAM-25:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1543

[BEAM-25] Move CopyOnAccessStateInternals to runners/direct

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @tgroh this is only actually used by the direct runner. Not necessarily 
the greatest JIRA for this, but I'm not sure of a blanket for cleaning out the 
SDK's excessive surface area, so I went with the state ticket.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
CopyOnAccessStateInternals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1543


commit 019612ba5e5a656e848d458617007d39be42b3e9
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T22:28:39Z

Move CopyOnAccessStateInternals to runners/direct




> Add user-ready API for interacting with state
> -
>
> Key: BEAM-25
> URL: https://issues.apache.org/jira/browse/BEAM-25
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: State
>
> Our current state API is targeted at runner implementers, not pipeline 
> authors. As such it has many capabilities that are not necessary nor 
> desirable for simple use cases of stateful ParDo (such as dynamic state tag 
> creation). Implement a simple state intended for user access.
> (Details of our current thoughts in forthcoming design doc)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730125#comment-15730125
 ] 

ASF GitHub Bot commented on BEAM-498:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1541

[BEAM-498] Move PerKeyCombineFnRunner to runners-core

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @peihe seems there is no reason this needs to stay, right? 
`PerKeyCombineFnRunners` (the static util class) is already moved.

R: @lukecwik this has some references to `OldDoFn` so at least this gets 
them out of the SDK.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
PerKeyCombineFnRunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1541


commit 351f58567212e7a9d4664ca65a0bd4a10a77ed81
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T22:22:21Z

Move PerKeyCombineFnRunner to runners-core




> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730120#comment-15730120
 ] 

ASF GitHub Bot commented on BEAM-498:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1539

[BEAM-498] Remove misc occurrences of OldDoFn

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @lukecwik here's a few more

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam remove-OldDoFn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1539


commit 4c178c4dbfbdbcb5e5e3100a14807be7dfda62be
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T22:17:01Z

Remove misc occurrences of OldDoFn




> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-507) Fill in the documentation/runners/spark portion of the website

2016-12-07 Thread Amit Sela (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730054#comment-15730054
 ] 

Amit Sela commented on BEAM-507:


I'll have a PR by tomorrow.

> Fill in the documentation/runners/spark portion of the website
> --
>
> Key: BEAM-507
> URL: https://issues.apache.org/jira/browse/BEAM-507
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Amit Sela
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for Spark-specific information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-438) Rename one of PTransform.apply and PInput.apply

2016-12-07 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-438:


Assignee: Kenneth Knowles

> Rename one of PTransform.apply and PInput.apply
> ---
>
> Key: BEAM-438
> URL: https://issues.apache.org/jira/browse/BEAM-438
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>
> Before releasing Beam 1.0, we should do this.
> Right now, it's legal to call:
> {{ptransform.apply(input)}}
> and 
> {{input.apply(ptransform)}}
> when only the latter is correct. The former skips various validation steps 
> and loses the notion of composite transforms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-438) Rename one of PTransform.apply and PInput.apply

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729977#comment-15729977
 ] 

ASF GitHub Bot commented on BEAM-438:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1538

[BEAM-438] Rename PTransform.apply to PTransform.expand

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Opening this to have a code pointer in dev list discussion.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam PTransform-expand

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1538.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1538


commit ff50e4d470d954c17d73358aa3e0c4c8b4123b87
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T21:33:04Z

Rename PTransform.apply to PTransform.expand




> Rename one of PTransform.apply and PInput.apply
> ---
>
> Key: BEAM-438
> URL: https://issues.apache.org/jira/browse/BEAM-438
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>  Labels: backward-incompatible
>
> Before releasing Beam 1.0, we should do this.
> Right now, it's legal to call:
> {{ptransform.apply(input)}}
> and 
> {{input.apply(ptransform)}}
> when only the latter is correct. The former skips various validation steps 
> and loses the notion of composite transforms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1090) High memory usage error

2016-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729950#comment-15729950
 ] 

María GH commented on BEAM-1090:


I had another occurrence: 
https://travis-ci.org/apache/incubator-beam/jobs/181976745

> High memory usage error
> ---
>
> Key: BEAM-1090
> URL: https://issues.apache.org/jira/browse/BEAM-1090
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Affects Versions: 0.3.0-incubating
>Reporter: María GH
>Priority: Minor
>
> Non-reproducible high memory usage test failure. It goes away on its own.
> RuntimeError: High memory usage: 201418866688 > 201008464768 [while running 
> 'oom:check']
> root: WARNING: A task failed with exception.
>  High memory usage: 201418866688 > 201008464768 [while running 'oom:check']
> ---
> Complete results at https://travis-ci.org/apache/incubator-beam/jobs/181011669



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-597) Provide type information from Coders

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729854#comment-15729854
 ] 

ASF GitHub Bot commented on BEAM-597:
-

GitHub user jeremiele opened a pull request:

https://github.com/apache/incubator-beam/pull/1537

[BEAM-597] Added a new method on Coder which returns a TypeDescriptor.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The new method allows returning type information about the data being
encoded and decoded by a Coder.
Added a default implementation to StandardCoder which returns the
TypeDescriptor for Object to ease the transition and avoid breaking
implementations relying on StandardCoder or AtomicCoder.

This will break classes implementing the Coder interface directly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeremiele/incubator-beam 
add_method_to_coder_for_typedescriptor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1537


commit 4a892401689fd01a831ff8904e46f917f8f1
Author: Jeremie Lenfant-Engelmann <jeremi...@google.com>
Date:   2016-12-07T19:29:29Z

Added a new method on Coder which returns a TypeDescriptor.

The new method allows returning type information about the data being
encoded and decoded by a Coder.
Added a default implementation to StandardCoder which returns the
TypeDescriptor for Object to ease the transition and avoid breaking
implementations relying on StandardCoder or AtomicCoder.

This will break classes implementing the Coder interface directly.




> Provide type information from Coders
> 
>
> Key: BEAM-597
> URL: https://issues.apache.org/jira/browse/BEAM-597
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jeremie Lenfant-Engelmann
>Assignee: Jeremie Lenfant-Engelmann
>Priority: Minor
>
> The Coder interface should return a TypeDescriptor describing the type that 
> is currently encoded/decoded by the Coder.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-884) Add Display Data to the Python SDK's PipelineOptions, Avro io and other transforms

2016-12-07 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729725#comment-15729725
 ] 

Pablo Estrada commented on BEAM-884:


This is done.

> Add Display Data to the Python SDK's PipelineOptions, Avro io and other 
> transforms
> --
>
> Key: BEAM-884
> URL: https://issues.apache.org/jira/browse/BEAM-884
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Pablo Estrada
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1106) Remove no_pipeline_type_check flag from Python SDK

2016-12-07 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-1106:
---

 Summary: Remove no_pipeline_type_check flag from Python SDK
 Key: BEAM-1106
 URL: https://issues.apache.org/jira/browse/BEAM-1106
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Pablo Estrada
Assignee: Frances Perry


It's already the default behavior. It should be possible to remove it without 
trouble.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-824) Misleading error message when sdk_location is missing in python

2016-12-07 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729724#comment-15729724
 ] 

Pablo Estrada commented on BEAM-824:


This was fixed.

> Misleading error message when sdk_location is missing in python
> ---
>
> Key: BEAM-824
> URL: https://issues.apache.org/jira/browse/BEAM-824
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Pablo Estrada
>
> When trying to submit jobs to the Cloud Dataflow service using the Python 
> SDK, the sdk_location should be provided or the serive errors out saying that 
> package google-cloud-dataflow is missing.
> We might want to prompt users to add sdk_location parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1055) Display Data keys on Python are inconsistent

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729720#comment-15729720
 ] 

ASF GitHub Bot commented on BEAM-1055:
--

Github user pabloem closed the pull request at:

https://github.com/apache/incubator-beam/pull/1443


> Display Data keys on Python are inconsistent
> 
>
> Key: BEAM-1055
> URL: https://issues.apache.org/jira/browse/BEAM-1055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>
> Some are in camelCase, some are in snake_case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-722) Add Display Data to the Python SDK

2016-12-07 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729723#comment-15729723
 ] 

Pablo Estrada commented on BEAM-722:


This is done.

> Add Display Data to the Python SDK
> --
>
> Key: BEAM-722
> URL: https://issues.apache.org/jira/browse/BEAM-722
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Frances Perry
>
> The DisplayData feature has been added to the Java SDK (see blog post 
> announcing it: 
> https://cloud.google.com/blog/big-data/2016/06/dataflow-updates-see-more-details-about-your-pipelines).
>  We need now to add it to the Python SDK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1055) Display Data keys on Python are inconsistent

2016-12-07 Thread Pablo Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729721#comment-15729721
 ] 

Pablo Estrada commented on BEAM-1055:
-

This is done.

> Display Data keys on Python are inconsistent
> 
>
> Key: BEAM-1055
> URL: https://issues.apache.org/jira/browse/BEAM-1055
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>
> Some are in camelCase, some are in snake_case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.

2016-12-07 Thread Jesse Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729610#comment-15729610
 ] 

Jesse Anderson commented on BEAM-1105:
--

Sounds good to me.

> Adding Beam's pico Wordcount to the existing examples. 
> ---
>
> Key: BEAM-1105
> URL: https://issues.apache.org/jira/browse/BEAM-1105
> Project: Beam
>  Issue Type: Wish
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: examples
>
> http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> Is a good explanation for the WordCount that would encourage users.
> Adding this to the examples and subsequently the docs is a good step to help 
> new users start from a good foundation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1065) FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729597#comment-15729597
 ] 

ASF GitHub Bot commented on BEAM-1065:
--

Github user peihe closed the pull request at:

https://github.com/apache/incubator-beam/pull/1426


> FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)
> --
>
> Key: BEAM-1065
> URL: https://issues.apache.org/jira/browse/BEAM-1065
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>
> FileBasedReader should be able to open the file with the 
> Source.getStartOffset(), and then read forward to find the first input 
> element.
> The benefits are:
> 1. It is easier to implement a ReadableByteChannel.
> 2. Dynamically splitting won't require file systems to support seeking.
> 3. Doesn't need to seek to position twice, which is what current API does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-664) Port Dataflow SDK WordCount walkthrough to Beam site

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729567#comment-15729567
 ] 

ASF GitHub Bot commented on BEAM-664:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1536

[BEAM-664] Revise WindowedWordCount example to be more independent of 
runner and execution style

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This removes the use of BigQuery from the WindowedWordCount example, 
replacing it with a somewhat hacky file-based write of the output, using the 
window as the idempotency key. In order to port the test and to benefit from 
recent improvements in `FileBasedCheckSumMatcher`, I've factored out the 
resiliency code from that into an internal-only minimal `ShardedFile` class 
with just enough API surface to write these tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam WindowedWordCount

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1536


commit bac7b192c9fefdf536e324f1fde73bac2cd903fa
Author: Kenneth Knowles <k...@google.com>
Date:   2016-11-04T03:44:45Z

Add IntervalWindow coder to the standard registry

commit aa09449379dfaedbdfb0b73bae85ffd334b838b1
Author: Kenneth Knowles <k...@google.com>
Date:   2016-11-03T21:37:26Z

Revise WindowedWordCount for runner and execution mode portability

commit 357732efb866e4a24d76a9aaafa10e6bc964fe7e
Author: Kenneth Knowles <k...@google.com>
Date:   2016-11-08T06:06:00Z

Check the onSuccessMatcher in DirectRunner if isBlockOnRun is set

commit 76d19716829cb98c4f0d4b34244fde8866b6d6a9
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-05T22:32:12Z

Factor out ShardedFile from FileChecksumMatcher




> Port Dataflow SDK WordCount walkthrough to Beam site
> 
>
> Key: BEAM-664
> URL: https://issues.apache.org/jira/browse/BEAM-664
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Hadar Hod
>Assignee: Hadar Hod
>
> Port the WordCount walkthrough from Dataflow docs to Beam website. 
> * Copy prose (translate from html to md, remove Dataflow references, etc)
> * Add accurate "How to Run" instructions for each of the WC examples
> * Include code snippets from real examples



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.

2016-12-07 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729486#comment-15729486
 ] 

Neelesh Srinivas Salian commented on BEAM-1105:
---

[~jbonofre] I agree. Makes sense to bundle them up together.
I'll start a thread and we can think it over. I'm thinking a user - specific 
examples bundle / module would be best to gather all of them.

Will see if others have ideas.

> Adding Beam's pico Wordcount to the existing examples. 
> ---
>
> Key: BEAM-1105
> URL: https://issues.apache.org/jira/browse/BEAM-1105
> Project: Beam
>  Issue Type: Wish
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: examples
>
> http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> Is a good explanation for the WordCount that would encourage users.
> Adding this to the examples and subsequently the docs is a good step to help 
> new users start from a good foundation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.

2016-12-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729476#comment-15729476
 ] 

Jean-Baptiste Onofré commented on BEAM-1105:


FYI, I also have concrete samples here: https://github.com/jbonofre/beam-samples

I think we should have a discussion about that we provide.

Currently, the examples are also used to test the runners.

IMHO, Jesse's example is a sample, end-user oriented. So, it's a good candidate 
to be gather with beam-samples all together.

> Adding Beam's pico Wordcount to the existing examples. 
> ---
>
> Key: BEAM-1105
> URL: https://issues.apache.org/jira/browse/BEAM-1105
> Project: Beam
>  Issue Type: Wish
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: examples
>
> http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
> Is a good explanation for the WordCount that would encourage users.
> Adding this to the examples and subsequently the docs is a good step to help 
> new users start from a good foundation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1105) Adding Beam's pico Wordcount to the existing examples.

2016-12-07 Thread Neelesh Srinivas Salian (JIRA)
Neelesh Srinivas Salian created BEAM-1105:
-

 Summary: Adding Beam's pico Wordcount to the existing examples. 
 Key: BEAM-1105
 URL: https://issues.apache.org/jira/browse/BEAM-1105
 Project: Beam
  Issue Type: Wish
Reporter: Neelesh Srinivas Salian
Assignee: Neelesh Srinivas Salian
Priority: Minor


http://www.jesse-anderson.com/2016/12/beams-pico-wordcount/
Is a good explanation for the WordCount that would encourage users.

Adding this to the examples and subsequently the docs is a good step to help 
new users start from a good foundation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729312#comment-15729312
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user tgroh closed the pull request at:

https://github.com/apache/incubator-beam/pull/1442


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-115) Beam Runner API

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729306#comment-15729306
 ] 

ASF GitHub Bot commented on BEAM-115:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1511


> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729262#comment-15729262
 ] 

ASF GitHub Bot commented on BEAM-498:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1529


> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729237#comment-15729237
 ] 

ASF GitHub Bot commented on BEAM-498:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1527


> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-07 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729017#comment-15729017
 ] 

Maximilian Michels commented on BEAM-1092:
--

2) is for avoiding user classes to clash with all SDK-related parts.

1) is for avoiding user classes to clash with other libraries added to the 
classpath which do not properly shade common libraries (e.g. Hadoop)

The archetype does not become more complicated because of 1). It should be more 
or less be transparent to the user who uses the archetype. 

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies like Guava to avoid user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners who add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetypes pom files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1104) WordCount: Metrics error in the DirectRunner

2016-12-07 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-1104:
--
Assignee: Thomas Groh  (was: Davor Bonaci)

> WordCount: Metrics error in the DirectRunner
> 
>
> Key: BEAM-1104
> URL: https://issues.apache.org/jira/browse/BEAM-1104
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Daniel Halperin
>Assignee: Thomas Groh
>
> I'm following the Beam quickstart to analyze the pom.xml for the examples 
> archetype in the DirectRunner:
> Generate the project:
> {code}
> mvn archetype:generate \
>   
> -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots 
> \  
>   -DarchetypeGroupId=org.apache.beam \
>   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>   -DarchetypeVersion=LATEST \
>   -DgroupId=org.example \
>   -DartifactId=word-count-beam \
>   -Dversion="0.1" \
>   -Dpackage=org.apache.beam.examples \
>   -DinteractiveMode=false
> {code}
> Count words in the pom.xml:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
> {code}
> The logs:
> {code}
> INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource 
> expandFilePattern
> INFO: Matched 1 files for pattern pom.xml
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment 
> getCurrentContainer
> SEVERE: Unable to update metrics on the current thread. Most likely caused by 
> using metrics outside the managed work-execution thread.
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
> INFO: Initializing write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
> processElement
> INFO: Opening writer for write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement
> INFO: Finalizing write operation 
> org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a.
> {code}
> Presumably, this {{SEVERE}} warning is indicative of a bug (or should be 
> masked).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1104) WordCount: Metrics error in the DirectRunner

2016-12-07 Thread Daniel Halperin (JIRA)
Daniel Halperin created BEAM-1104:
-

 Summary: WordCount: Metrics error in the DirectRunner
 Key: BEAM-1104
 URL: https://issues.apache.org/jira/browse/BEAM-1104
 Project: Beam
  Issue Type: Bug
  Components: runner-direct
Reporter: Daniel Halperin
Assignee: Davor Bonaci


I'm following the Beam quickstart to analyze the pom.xml for the examples 
archetype in the DirectRunner:

Generate the project:

{code}
mvn archetype:generate \
  
-DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \  

  -DarchetypeGroupId=org.apache.beam \
  -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
  -DarchetypeVersion=LATEST \
  -DgroupId=org.example \
  -DartifactId=word-count-beam \
  -Dversion="0.1" \
  -Dpackage=org.apache.beam.examples \
  -DinteractiveMode=false
{code}

Count words in the pom.xml:

{code}
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
{code}

The logs:

{code}
INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
INFO: Matched 1 files for pattern pom.xml
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment 
getCurrentContainer
SEVERE: Unable to update metrics on the current thread. Most likely caused by 
using metrics outside the managed work-execution thread.
Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
INFO: Initializing write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles 
processElement
INFO: Opening writer for write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement
INFO: Finalizing write operation 
org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a.
{code}

Presumably, this {{SEVERE}} warning is indicative of a bug (or should be 
masked).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-329) Update Spark runner README.

2016-12-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré resolved BEAM-329.
---
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Update Spark runner README.
> ---
>
> Key: BEAM-329
> URL: https://issues.apache.org/jira/browse/BEAM-329
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.2.0-incubating
>Reporter: Amit Sela
>Assignee: Amit Sela
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> The Spark runner should have a proper WordCount (or something more fancy) 
> example in the README. This is a bit problematic as Beam is lacking HDFS 
> support in TextIO. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-329) Update Spark runner README.

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728575#comment-15728575
 ] 

ASF GitHub Bot commented on BEAM-329:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1532


> Update Spark runner README.
> ---
>
> Key: BEAM-329
> URL: https://issues.apache.org/jira/browse/BEAM-329
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.2.0-incubating
>Reporter: Amit Sela
>Assignee: Amit Sela
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> The Spark runner should have a proper WordCount (or something more fancy) 
> example in the README. This is a bit problematic as Beam is lacking HDFS 
> support in TextIO. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-329) Update Spark runner README.

2016-12-07 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela updated BEAM-329:
---
Summary: Update Spark runner README.  (was: Spark runner README should have 
a proper batch example.)

> Update Spark runner README.
> ---
>
> Key: BEAM-329
> URL: https://issues.apache.org/jira/browse/BEAM-329
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.2.0-incubating
>Reporter: Amit Sela
>Assignee: Amit Sela
>Priority: Minor
>
> The Spark runner should have a proper WordCount (or something more fancy) 
> example in the README. This is a bit problematic as Beam is lacking HDFS 
> support in TextIO. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-329) Spark runner README should have a proper batch example.

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728336#comment-15728336
 ] 

ASF GitHub Bot commented on BEAM-329:
-

GitHub user amitsela opened a pull request:

https://github.com/apache/incubator-beam/pull/1532

[BEAM-329] Spark runner README should have a proper batch example.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amitsela/incubator-beam BEAM-329

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1532


commit c50fe89a071f59740b6f5bd90e1984ca3159162f
Author: Sela <ans...@paypal.com>
Date:   2016-12-07T09:20:07Z

[BEAM-329] Spark runner README should have a proper batch example.




> Spark runner README should have a proper batch example.
> ---
>
> Key: BEAM-329
> URL: https://issues.apache.org/jira/browse/BEAM-329
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 0.2.0-incubating
>Reporter: Amit Sela
>Assignee: Amit Sela
>Priority: Minor
>
> The Spark runner should have a proper WordCount (or something more fancy) 
> example in the README. This is a bit problematic as Beam is lacking HDFS 
> support in TextIO. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1094) Spark runner should define Kafka IO dependency with test scope

2016-12-07 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela resolved BEAM-1094.
-
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Spark runner should define Kafka IO dependency with test scope
> --
>
> Key: BEAM-1094
> URL: https://issues.apache.org/jira/browse/BEAM-1094
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
> Fix For: 0.4.0-incubating
>
>
> Spark runner uses Kafka IO for testing purpose. However, the Kafka IO 
> dependency is with compile scope whereas it should be test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1094) Spark runner should define Kafka IO dependency with test scope

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728169#comment-15728169
 ] 

ASF GitHub Bot commented on BEAM-1094:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1531


> Spark runner should define Kafka IO dependency with test scope
> --
>
> Key: BEAM-1094
> URL: https://issues.apache.org/jira/browse/BEAM-1094
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Spark runner uses Kafka IO for testing purpose. However, the Kafka IO 
> dependency is with compile scope whereas it should be test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1094) Spark runner should define Kafka IO dependency with test scope

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728014#comment-15728014
 ] 

ASF GitHub Bot commented on BEAM-1094:
--

GitHub user jbonofre opened a pull request:

https://github.com/apache/incubator-beam/pull/1531

[BEAM-1094] Set test scope for Kafka IO and junit

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [X] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jbonofre/incubator-beam BEAM-1094

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1531


commit 346c0b528297ab39bfa021ee052dcee48f56953d
Author: Jean-Baptiste Onofré <jbono...@apache.org>
Date:   2016-12-07T07:37:33Z

[BEAM-1094] Set test scope for Kafka IO and junit




> Spark runner should define Kafka IO dependency with test scope
> --
>
> Key: BEAM-1094
> URL: https://issues.apache.org/jira/browse/BEAM-1094
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Spark runner uses Kafka IO for testing purpose. However, the Kafka IO 
> dependency is with compile scope whereas it should be test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1102) Flink Batch Runner does not populate aggregator values

2016-12-06 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed BEAM-1102.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Flink Batch Runner does not populate aggregator values
> --
>
> Key: BEAM-1102
> URL: https://issues.apache.org/jira/browse/BEAM-1102
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.3.0-incubating
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.4.0-incubating
>
>
> Running the quickstart gives 0 for emptyLines.
> Running with {{--streaming=true}} gives the correct value (for my input file, 
> the default examples archetype {{pom.xml}}, the true value is 27 at the time 
> of writing).
> Streaming output:
> {code}
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToLateness : 0
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 27
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToClosedWindow : 0
> {code}
> Non-streaming output:
> {code}
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 0
> {code}
> (Note also that the lateness etc. aggregators are missing entirely, may be 
> expected).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1103) Add Tests For Aggregators in Flink Runner

2016-12-06 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created BEAM-1103:
--

 Summary: Add Tests For Aggregators in Flink Runner
 Key: BEAM-1103
 URL: https://issues.apache.org/jira/browse/BEAM-1103
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Aljoscha Krettek


We currently don't have tests that verify that aggregator values are correctly 
forwarded to Flink.

They didn't work correctly in the Batch Flink runner, as seen in BEAM-1102.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1102) Flink Batch Runner does not populate aggregator values

2016-12-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727742#comment-15727742
 ] 

Aljoscha Krettek commented on BEAM-1102:


The problem is this part in {{FlinkProcessContextBase}}:

{code}
  @Override
  protected <AggInputT, AggOutputT> Aggregator<AggInputT, AggOutputT>
  createAggregatorInternal(String name, Combine.CombineFn<AggInputT, ?, 
AggOutputT> combiner) {
SerializableFnAggregatorWrapper<AggInputT, AggOutputT> wrapper =
new SerializableFnAggregatorWrapper<>(combiner);
Accumulator existingAccum =
(Accumulator<AggInputT, Serializable>) 
runtimeContext.getAccumulator(name);
if (existingAccum != null) {
  return wrapper;
} else {
  runtimeContext.addAccumulator(name, wrapper);
}
return wrapper;
  }
{code}

Notice how the newly created wrapper is returned if the accumulator already 
exists.

> Flink Batch Runner does not populate aggregator values
> --
>
> Key: BEAM-1102
> URL: https://issues.apache.org/jira/browse/BEAM-1102
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.3.0-incubating
>Reporter: Daniel Halperin
>Assignee: Aljoscha Krettek
>Priority: Minor
>
> Running the quickstart gives 0 for emptyLines.
> Running with {{--streaming=true}} gives the correct value (for my input file, 
> the default examples archetype {{pom.xml}}, the true value is 27 at the time 
> of writing).
> Streaming output:
> {code}
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToLateness : 0
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 27
> Dec 07, 2016 1:00:53 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: DroppedDueToClosedWindow : 0
> {code}
> Non-streaming output:
> {code}
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: Final aggregator values:
> Dec 07, 2016 12:58:07 PM org.apache.beam.runners.flink.FlinkRunner run
> INFO: emptyLines : 0
> {code}
> (Note also that the lateness etc. aggregators are missing entirely, may be 
> expected).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-812) Shade guava in beam-sdks-java-io-google-cloud-platform

2016-12-06 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727720#comment-15727720
 ] 

Daniel Halperin commented on BEAM-812:
--

This is not likely to be fixed in the 0.4.0 release because Bigtable has not 
yet pushed a release of their jar with the API surface cleanup integrated.

> Shade guava in beam-sdks-java-io-google-cloud-platform
> --
>
> Key: BEAM-812
> URL: https://issues.apache.org/jira/browse/BEAM-812
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> Looking at 0.3.0-incubating RC1, we are not properly shading Guava.
> https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-google-cloud-platform/0.3.0-incubating/beam-sdks-java-io-google-cloud-platform-0.3.0-incubating.pom
> has 
> {code}
> 
>   com.google.guava
>   guava
>  {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-812) Shade guava in beam-sdks-java-io-google-cloud-platform

2016-12-06 Thread Daniel Halperin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Halperin updated BEAM-812:
-
Fix Version/s: (was: 0.4.0-incubating)

> Shade guava in beam-sdks-java-io-google-cloud-platform
> --
>
> Key: BEAM-812
> URL: https://issues.apache.org/jira/browse/BEAM-812
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>
> Looking at 0.3.0-incubating RC1, we are not properly shading Guava.
> https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-google-cloud-platform/0.3.0-incubating/beam-sdks-java-io-google-cloud-platform-0.3.0-incubating.pom
> has 
> {code}
> 
>   com.google.guava
>   guava
>  {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727629#comment-15727629
 ] 

ASF GitHub Bot commented on BEAM-27:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1528

[BEAM-27] Add DoFn.OnTimerContext and support as a DoFn parameter

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam OnTimerContext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1528


commit 44a7b915d502c318ffabaa6fb808207bb3ea15e8
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T04:10:06Z

Add DoFn.OnTimerContext

commit 1934b704411fed76a58cbc657c7dc8be666c3885
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T04:01:18Z

Add support for OnTimerContext parameter in DoFnSignature

commit 14d9a3794c4f8c68626075343a1dec1d4c017686
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-07T04:10:21Z

Access to OnTimerContext via DoFnInvokers.ArgumentProvider




> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
>     URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727614#comment-15727614
 ] 

ASF GitHub Bot commented on BEAM-498:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1527

[BEAM-498] Port most of DoFnRunner Javadoc to new DoFn

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @tgroh 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam DoFnRunner-javadoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1527


commit d26d9bbce073715d3c6b644b286574ce8c782597
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-06T23:20:28Z

Port most of DoFnRunner Javadoc to new DoFn




> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-06 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated BEAM-1092:
---
Comment: was deleted

(was: Is 1) necessary if we do 2).

I think shading is very necessary (unfortunately) so I like this issue but I 
would also like to keep the example/archetype POMs as simple as possible.)

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies like Guava to avoid user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners who add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetypes pom files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727530#comment-15727530
 ] 

Aljoscha Krettek commented on BEAM-1092:


Is 1) necessary if we do 2)?

I think shading is very necessary (unfortunately) so I like this issue but I 
would also like to keep the example/archetype POMs as simple as possible.

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies like Guava to avoid user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners who add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetypes pom files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1092) Shade commonly used libraries (e.g. Guava) to avoid class conflicts

2016-12-06 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727529#comment-15727529
 ] 

Aljoscha Krettek commented on BEAM-1092:


Is 1) necessary if we do 2).

I think shading is very necessary (unfortunately) so I like this issue but I 
would also like to keep the example/archetype POMs as simple as possible.

> Shade commonly used libraries (e.g. Guava) to avoid class conflicts
> ---
>
> Key: BEAM-1092
> URL: https://issues.apache.org/jira/browse/BEAM-1092
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, sdk-java-extensions
>Affects Versions: 0.3.0-incubating
>Reporter: Maximilian Michels
>Assignee: Frances Perry
> Fix For: 0.4.0-incubating
>
>
> Beam shades away some of its dependencies like Guava to avoid user classes 
> from clashing with these dependencies. Some of the artifacts, e.g. KafkaIO, 
> do not shade any classes and directly depend on potentially conflicting 
> libraries (e.g. Guava). Also, users might manually add such libraries as 
> dependencies.
> Runners who add classes to the classpath (e.g. Hadoop) can run into conflict 
> with multiple versions of the same class. To prevent that, we should adjust 
> the Maven archetypes pom files used for the Quickstart to perform shading of 
> commonly used libraries (again, Guava is often the culprit).
> To prevent the problem in the first place, we should expand the shading of 
> Guava and other libraries to all modules which make use of these. 
> To solve both dimensions of the issue, we need to address:
> 1. Adding shading of commonly used libraries to the archetypes poms
> 2. Properly shade all commonly used libraries in the SDK modules
> 2) seems to be of highest priority since it affects users who simply use the 
> provided IO modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1101) Remove inconsistencies in Python PipelineOptions

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727416#comment-15727416
 ] 

ASF GitHub Bot commented on BEAM-1101:
--

GitHub user pabloem opened a pull request:

https://github.com/apache/incubator-beam/pull/1526

[BEAM-1101] Remove inconsistencies in Python PipelineOptions

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam 
poptions-inconsistencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1526


commit c6362aed50eae9cf8b9e5ec6d0885132f3278e32
Author: Pablo <pabl...@google.com>
Date:   2016-12-07T02:01:54Z

Fixing inconsistencies in PipelineOptions




> Remove inconsistencies in Python PipelineOptions
> 
>
> Key: BEAM-1101
> URL: https://issues.apache.org/jira/browse/BEAM-1101
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Frances Perry
>
> Some options have been removed from Java, and some have different default 
> behavior in Java. Gotta make this consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1101) Remove inconsistencies in Python PipelineOptions

2016-12-06 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-1101:
---

 Summary: Remove inconsistencies in Python PipelineOptions
 Key: BEAM-1101
 URL: https://issues.apache.org/jira/browse/BEAM-1101
 Project: Beam
  Issue Type: Bug
  Components: sdk-py
Reporter: Pablo Estrada
Assignee: Frances Perry


Some options have been removed from Java, and some have different default 
behavior in Java. Gotta make this consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1085) Crunch Runner for Beam

2016-12-06 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated BEAM-1085:
--
Component/s: runner-ideas

> Crunch Runner for Beam
> --
>
> Key: BEAM-1085
> URL: https://issues.apache.org/jira/browse/BEAM-1085
> Project: Beam
>  Issue Type: Wish
>  Components: runner-ideas
>Reporter: Neelesh Srinivas Salian
>
> It came up during the BoF Beam talk earlier last month; opening this JIRA as 
> a placeholder for if there is interest/ desire to add this feature. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO

2016-12-06 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727280#comment-15727280
 ] 

Davor Bonaci commented on BEAM-1099:


Thanks [~jghoman], much appreciated.

> Minor typos in KafkaIO
> --
>
> Key: BEAM-1099
> URL: https://issues.apache.org/jira/browse/BEAM-1099
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jakob Homan
>Assignee: Davor Bonaci
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> While familiarizing myself with the KafkaIO support, I found and fixed a few 
> typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727277#comment-15727277
 ] 

ASF GitHub Bot commented on BEAM-1099:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1524


> Minor typos in KafkaIO
> --
>
> Key: BEAM-1099
> URL: https://issues.apache.org/jira/browse/BEAM-1099
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jakob Homan
>Assignee: Davor Bonaci
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> While familiarizing myself with the KafkaIO support, I found and fixed a few 
> typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-1099) Minor typos in KafkaIO

2016-12-06 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci resolved BEAM-1099.

   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Minor typos in KafkaIO
> --
>
> Key: BEAM-1099
> URL: https://issues.apache.org/jira/browse/BEAM-1099
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jakob Homan
>Assignee: Davor Bonaci
>Priority: Trivial
> Fix For: 0.4.0-incubating
>
>
> While familiarizing myself with the KafkaIO support, I found and fixed a few 
> typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO

2016-12-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727272#comment-15727272
 ] 

Jakob Homan commented on BEAM-1099:
---

https://github.com/apache/incubator-beam/pull/1524

> Minor typos in KafkaIO
> --
>
> Key: BEAM-1099
> URL: https://issues.apache.org/jira/browse/BEAM-1099
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jakob Homan
>Assignee: Davor Bonaci
>Priority: Trivial
>
> While familiarizing myself with the KafkaIO support, I found and fixed a few 
> typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1099) Minor typos in KafkaIO

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727271#comment-15727271
 ] 

ASF GitHub Bot commented on BEAM-1099:
--

GitHub user jghoman opened a pull request:

https://github.com/apache/incubator-beam/pull/1524

[BEAM-1099] Minor typos in KafkaIO

Fix various distracting typos in KafkaIO


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jghoman/incubator-beam BEAM-1099

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1524.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1524


commit a71e9e4b99c14fbe453fa3dca486fa268c9782f5
Author: Jakob Homan <jgho...@gmail.com>
Date:   2016-12-07T00:59:50Z

[BEAM-1099] Minor typos in KafkaIO




> Minor typos in KafkaIO
> --
>
> Key: BEAM-1099
> URL: https://issues.apache.org/jira/browse/BEAM-1099
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jakob Homan
>Assignee: Davor Bonaci
>Priority: Trivial
>
> While familiarizing myself with the KafkaIO support, I found and fixed a few 
> typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727268#comment-15727268
 ] 

ASF GitHub Bot commented on BEAM-551:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1504


> Support Dynamic PipelineOptions
> ---
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Sam McVeety
>Assignee: Sam McVeety
>Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for
> the specific API proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (BEAM-1098) Minor typos in KafkaIO

2016-12-06 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci closed BEAM-1098.
--
   Resolution: Duplicate
Fix Version/s: Not applicable

> Minor typos in KafkaIO
> --
>
> Key: BEAM-1098
> URL: https://issues.apache.org/jira/browse/BEAM-1098
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jakob Homan
>Assignee: Davor Bonaci
>Priority: Trivial
> Fix For: Not applicable
>
>
> While familiarizing myself with the KafkaIO support, I found and fixed a few 
> typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1099) Minor typos in KafkaIO

2016-12-06 Thread Jakob Homan (JIRA)
Jakob Homan created BEAM-1099:
-

 Summary: Minor typos in KafkaIO
 Key: BEAM-1099
 URL: https://issues.apache.org/jira/browse/BEAM-1099
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Jakob Homan
Assignee: Davor Bonaci
Priority: Trivial


While familiarizing myself with the KafkaIO support, I found and fixed a few 
typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1098) Minor typos in KafkaIO

2016-12-06 Thread Jakob Homan (JIRA)
Jakob Homan created BEAM-1098:
-

 Summary: Minor typos in KafkaIO
 Key: BEAM-1098
 URL: https://issues.apache.org/jira/browse/BEAM-1098
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Jakob Homan
Assignee: Davor Bonaci
Priority: Trivial


While familiarizing myself with the KafkaIO support, I found and fixed a few 
typos in the comments for that class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-9) Storm Runner

2016-12-06 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727260#comment-15727260
 ] 

Davor Bonaci commented on BEAM-9:
-

Having a Storm runner would be great! Awesome!

At some point (in the future), I think it is worth clarifying the pros and cons 
of where the runner lives. On the technical side, I think it comes down to the 
API stability between different pairs of components (but, of course, there are 
other considerations as well).

> Storm Runner
> 
>
> Key: BEAM-9
> URL: https://issues.apache.org/jira/browse/BEAM-9
> Project: Beam
>  Issue Type: Wish
>  Components: runner-ideas
>Reporter: Frances Perry
>Assignee: Sriharsha Chintalapani
>
> Gathering place for interest in a Storm runner for Beam.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727142#comment-15727142
 ] 

ASF GitHub Bot commented on BEAM-551:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1503


> Support Dynamic PipelineOptions
> ---
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Sam McVeety
>Assignee: Sam McVeety
>Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for
> the specific API proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1038) Support for new State API in DataflowRunner

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727019#comment-15727019
 ] 

ASF GitHub Bot commented on BEAM-1038:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1523


> Support for new State API in DataflowRunner
> ---
>
> Key: BEAM-1038
> URL: https://issues.apache.org/jira/browse/BEAM-1038
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1038) Support for new State API in DataflowRunner

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726929#comment-15726929
 ] 

ASF GitHub Bot commented on BEAM-1038:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1523

[BEAM-1038] Allow stateful DoFn in DataflowRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Confirmed that the post commit should succeed, via the following command:

```
mvn verify \
--batch-mode \
--errors \
--projects runners/google-cloud-dataflow-java \
-DforkCount=0 \
-DfailIfNoTests=false \
-Dtest=org.apache.beam.sdk.transforms.ParDoTest \
-DrunnableOnServicePipelineOptions='[

"--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner",
"--project=...",
"--tempRoot=..." ]'
```

with output ending in:

```
Running org.apache.beam.sdk.transforms.ParDoTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 273.736 
sec - in org.apache.beam.sdk.transforms.ParDoTest

```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
DataflowRunner-state

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1523


commit 9e4d40731ff53b2cd0f2a57d18c6f560db441483
Author: Kenneth Knowles <k...@google.com>
Date:   2016-12-06T21:51:19Z

Allow stateful DoFn in DataflowRunner




> Support for new State API in DataflowRunner
> ---
>
> Key: BEAM-1038
> URL: https://issues.apache.org/jira/browse/BEAM-1038
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1097) Dataflow error message for non-existing gcpTempLocation is misleading

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726917#comment-15726917
 ] 

ASF GitHub Bot commented on BEAM-1097:
--

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/1522

[BEAM-1097] Provide a better error message for non-existing gcpTempLocation

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

gcpTempLocation will default to using the value for tmpLocation, as long
as the value is a valid GCP path. Non-valid GCP paths are silently
discarded.

This change removes existence validation from the default value logic
such that downstream validation can provide a better error message.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam gcp-temp-location-error

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1522.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1522


commit 9d768df4323a246baa705fd5fb75d08c78abb7f0
Author: Scott Wegner <sweg...@google.com>
Date:   2016-12-06T22:19:12Z

Provide a better error message for non-existing gcpTempLocation

gcpTempLocation will default to using the value for tmpLocation, as long
as the value is a valid GCP path. Non-valid GCP paths are silently
discarded.

This change removes existence validation from the default value logic
such that downstream validation can provide a better error message.




> Dataflow error message for non-existing gcpTempLocation is misleading
> -
>
> Key: BEAM-1097
> URL: https://issues.apache.org/jira/browse/BEAM-1097
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, runner-dataflow
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>
> The error message for specifying a GCP tempLocation which doesn't exist is 
> misleading. Rather than mentioning the given path doesn't exist, it says none 
> ways specified.
> This is particularly frustrating because it's one of the few configuration 
> values necessary to get started with an example or starter archetype, and 
> it's easy to introduce a typo as it's specified on the commandline. In my 
> case, I was specifying "gs://swegner-tmp" instead of "gs://swegner-test".
> Repro:
> 1. Clone the starter archetype: {noformat}mvn archetype:generate 
> -DarchetypeGroupId=org.apache.beam 
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter{noformat}
> 2. Add beam-runners-google-cloud-dataflow-java as a dependency in the 
> generated pom.xml
> 3. Build: {noformat}mvn install{noformat}
> 4. Run: {noformat}mvn exec:java -DmainClass=swegner.StarterPipeline 
> -Dexec.args="--runner=DataflowRunner 
> --tempLocation=gs://swegner-tmp"{noformat}
> Expected: An error message along the lines of: "The specified GCP temp 
> location 'gs://swegner-tmp' does not exist under project 'myGcpProject'"
> bq. [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project 
> counter-names-test: An exception occured while executing the Java class. 
> null: InvocationTargetException: Failed to construct instance from factory 
> method DataflowRunner#fromOptions(interface 
> org.apache.beam.sdk.options.PipelineOptions): DataflowRunner requires 
> gcpTempLocation, and it is missing in PipelineOptions. -> [Help 1]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1097) Dataflow error message for non-existing gcpTempLocation is misleading

2016-12-06 Thread Scott Wegner (JIRA)
Scott Wegner created BEAM-1097:
--

 Summary: Dataflow error message for non-existing gcpTempLocation 
is misleading
 Key: BEAM-1097
 URL: https://issues.apache.org/jira/browse/BEAM-1097
 Project: Beam
  Issue Type: Bug
  Components: examples-java, runner-dataflow
Reporter: Scott Wegner
Assignee: Scott Wegner
Priority: Minor


The error message for specifying a GCP tempLocation which doesn't exist is 
misleading. Rather than mentioning the given path doesn't exist, it says none 
ways specified.

This is particularly frustrating because it's one of the few configuration 
values necessary to get started with an example or starter archetype, and it's 
easy to introduce a typo as it's specified on the commandline. In my case, I 
was specifying "gs://swegner-tmp" instead of "gs://swegner-test".

Repro:

1. Clone the starter archetype: {noformat}mvn archetype:generate 
-DarchetypeGroupId=org.apache.beam 
-DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter{noformat}
2. Add beam-runners-google-cloud-dataflow-java as a dependency in the generated 
pom.xml
3. Build: {noformat}mvn install{noformat}
4. Run: {noformat}mvn exec:java -DmainClass=swegner.StarterPipeline 
-Dexec.args="--runner=DataflowRunner --tempLocation=gs://swegner-tmp"{noformat}

Expected: An error message along the lines of: "The specified GCP temp location 
'gs://swegner-tmp' does not exist under project 'myGcpProject'"

bq. [ERROR] Failed to execute goal 
org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project 
counter-names-test: An exception occured while executing the Java class. null: 
InvocationTargetException: Failed to construct instance from factory method 
DataflowRunner#fromOptions(interface 
org.apache.beam.sdk.options.PipelineOptions): DataflowRunner requires 
gcpTempLocation, and it is missing in PipelineOptions. -> [Help 1]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-986) ReduceFnRunner doesn't batch prefetching pane firings

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726841#comment-15726841
 ] 

ASF GitHub Bot commented on BEAM-986:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1519


> ReduceFnRunner doesn't batch prefetching pane firings
> -
>
> Key: BEAM-986
> URL: https://issues.apache.org/jira/browse/BEAM-986
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Specifically
> - in ProcessElements, if there are multiple windows to consider each is 
> processed sequentially with sequential state fetches instead of a bulk 
> prefetch
> - onTimer method doesn't evaluate multiple timers at a time meaning that if 
> multiple timers are fired at once each is processed sequentially without 
> batched prefetching



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726722#comment-15726722
 ] 

ASF GitHub Bot commented on BEAM-551:
-

Github user sammcveety closed the pull request at:

https://github.com/apache/incubator-beam/pull/1238


> Support Dynamic PipelineOptions
> ---
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Sam McVeety
>Assignee: Sam McVeety
>Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for
> the specific API proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-551) Support Dynamic PipelineOptions

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726721#comment-15726721
 ] 

ASF GitHub Bot commented on BEAM-551:
-

GitHub user sammcveety reopened a pull request:

https://github.com/apache/incubator-beam/pull/1238

[BEAM-551] BigqueryIO.Read support for ValueProvider 

R: @dhalperi 

This is the serialization issue I was referencing, where I think the issue 
is that there is a deferred translation from String -> TableRef -> String, 
causing a serialization error.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sammcveety/incubator-beam sgmc/bq_vp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1238


commit ced1073597e3fbcc826f10a417e810803c7317ce
Author: Sam McVeety <s...@google.com>
Date:   2016-10-30T18:58:44Z

Add BQ VP code

BQ VP

Fix most tests

Make serializable

Fix BQ tests

Fix tests

Update API

Fix validation case

Fix one more query reference

commit 2d05f107499ce94bb35f05a4c386625e66fe660c
Author: Sam McVeety <s...@google.com>
Date:   2016-12-06T05:24:20Z

Minor fixes

commit 1ca49af0659fd32ffa06e4e5778e3e4f90d85be6
Author: Sam McVeety <s...@google.com>
Date:   2016-12-06T05:31:34Z

Address serialization issue




> Support Dynamic PipelineOptions
> ---
>
> Key: BEAM-551
> URL: https://issues.apache.org/jira/browse/BEAM-551
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Sam McVeety
>Assignee: Sam McVeety
>Priority: Minor
>
> During the graph construction phase, the given SDK generates an initial
> execution graph for the program.  At execution time, this graph is
> executed, either locally or by a service.  Currently, Beam only supports
> parameterization at graph construction time.  Both Flink and Spark supply
> functionality that allows a pre-compiled job to be run without SDK
> interaction with updated runtime parameters.
> In its current incarnation, Dataflow can read values of PipelineOptions at
> job submission time, but this requires the presence of an SDK to properly
> encode these values into the job.  We would like to build a common layer
> into the Beam model so that these dynamic options can be properly provided
> to jobs.
> Please see
> https://docs.google.com/document/d/1I-iIgWDYasb7ZmXbGBHdok_IK1r1YAJ90JG5Fz0_28o/edit
> for the high-level model, and
> https://docs.google.com/document/d/17I7HeNQmiIfOJi0aI70tgGMMkOSgGi8ZUH-MOnFatZ8/edit
> for
> the specific API proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-641) Need to test the generated archetypes projects

2016-12-06 Thread Davor Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davor Bonaci resolved BEAM-641.
---
   Resolution: Invalid
Fix Version/s: (was: 0.4.0-incubating)
   Not applicable

> Need to test the generated archetypes projects
> --
>
> Key: BEAM-641
> URL: https://issues.apache.org/jira/browse/BEAM-641
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Pei He
>Assignee: Pei He
> Fix For: Not applicable
>
>
> Travis and Jenkins pre-submits don't test building the generated archetypes 
> projects.
> Currently, changes to archetypes have to be manually verified by:
> mvn archetype:generate \
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
> -DarchetypeGroupId=org.apache.beam \
> -DarchetypeVersion=0.3.0-incubating-SNAPSHOT \
> -DgroupId=com.example \
> -DartifactId=first-beam \
> -Dversion="0.3.0-incubating-SNAPSHOT" \
> -DinteractiveMode=false \
> -Dpackage=org.apache.beam.examples
> and did "mvn clean install" in first-beam project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (BEAM-507) Fill in the documentation/runners/spark portion of the website

2016-12-06 Thread Amit Sela (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Sela reassigned BEAM-507:
--

Assignee: Amit Sela  (was: James Malone)

> Fill in the documentation/runners/spark portion of the website
> --
>
> Key: BEAM-507
> URL: https://issues.apache.org/jira/browse/BEAM-507
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Frances Perry
>Assignee: Amit Sela
>
> As per 
> https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit.
> Should be a landing page for Spark-specific information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1096) flink streaming side output optimization using SplitStream

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726461#comment-15726461
 ] 

ASF GitHub Bot commented on BEAM-1096:
--

GitHub user xhumanoid opened a pull request:

https://github.com/apache/incubator-beam/pull/1520

[BEAM-1096] flink streaming side output optimization using SplitStream

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

@aljoscha check please

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xhumanoid/incubator-beam 
stream_side_output_optimization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1520


commit 568d73f74219a42cac4028b198f53f22a832990f
Author: Alexey Diomin <diomi...@gmail.com>
Date:   2016-12-06T19:30:54Z

[BEAM-1096] flink streaming side output optimization using SplitStream




> flink streaming side output optimization using SplitStream
> --
>
> Key: BEAM-1096
> URL: https://issues.apache.org/jira/browse/BEAM-1096
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Priority: Minor
>
> Current implementation:
> 1) send all events in all output streams
> 2) filtering streams for necessary tags
> Cons: increased cpu usage for serialization all events
> Proposed implementation:
> 1) route event in correct streams based on tag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-986) ReduceFnRunner doesn't batch prefetching pane firings

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726441#comment-15726441
 ] 

ASF GitHub Bot commented on BEAM-986:
-

GitHub user scwhittle opened a pull request:

https://github.com/apache/incubator-beam/pull/1519

[BEAM-986] Improvements to ReduceFnRunner prefetching

- add prefetch* methods for prefetching state matching existing methods
- replace onTimer with batched onTimers method to allow prefetching across 
timers
- remove deprecated TimerCallback usage
- prefetch triggers in processElements


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwhittle/incubator-beam 
reduce_fn_prefetching_no_sdk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1519


commit 5085c77736a314519c6dcdb4c4c2c21b425b9668
Author: Sam Whittle <samu...@google.com>
Date:   2016-11-10T20:59:49Z

Improvements to ReduceFnRunner prefetching:
- add prefetch* methods for prefetching state matching existing methods
- replace onTimer with batched onTimers method to allow prefetching
  across timers
- prefetch triggers in processElements
Additionally remove deprecated TimerCallback usage




> ReduceFnRunner doesn't batch prefetching pane firings
> -
>
> Key: BEAM-986
> URL: https://issues.apache.org/jira/browse/BEAM-986
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Specifically
> - in ProcessElements, if there are multiple windows to consider each is 
> processed sequentially with sequential state fetches instead of a bulk 
> prefetch
> - onTimer method doesn't evaluate multiple timers at a time meaning that if 
> multiple timers are fired at once each is processed sequentially without 
> batched prefetching



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1096) flink streaming side output optimization using SplitStream

2016-12-06 Thread Alexey Diomin (JIRA)
Alexey Diomin created BEAM-1096:
---

 Summary: flink streaming side output optimization using SplitStream
 Key: BEAM-1096
 URL: https://issues.apache.org/jira/browse/BEAM-1096
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Affects Versions: 0.4.0-incubating
Reporter: Alexey Diomin
Priority: Minor


Current implementation:
1) send all events in all output streams
2) filtering streams for necessary tags

Cons: increased cpu usage for serialization all events

Proposed implementation:
1) route event in correct streams based on tag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726352#comment-15726352
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1495


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1095) Add support set config for reuse-object on flink

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726285#comment-15726285
 ] 

ASF GitHub Bot commented on BEAM-1095:
--

GitHub user xhumanoid opened a pull request:

https://github.com/apache/incubator-beam/pull/1518

[BEAM-1095] Add support set config for reuse-object on flink

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

@aljoscha check please

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xhumanoid/incubator-beam flink_object_reuse

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1518


commit 8ae24419cd9125aa445b36d736fc4d8bfe2dcb7d
Author: Alexey Diomin <diomi...@gmail.com>
Date:   2016-12-06T18:20:02Z

[BEAM-1095] Add support set config for reuse-object on flink




> Add support set config for reuse-object on flink
> 
>
> Key: BEAM-1095
> URL: https://issues.apache.org/jira/browse/BEAM-1095
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Affects Versions: 0.4.0-incubating
>Reporter: Alexey Diomin
>Priority: Trivial
>
> Object-reuse is dangerous setting and disabled by default,
> but sometime we need use this option to omit performance overhead for 
> serialization-deserialization objects on every transformations 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1087) Pickling error in save main session

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726281#comment-15726281
 ] 

ASF GitHub Bot commented on BEAM-1087:
--

Github user sb2nov closed the pull request at:

https://github.com/apache/incubator-beam/pull/1485


> Pickling error in save main session
> ---
>
> Key: BEAM-1087
> URL: https://issues.apache.org/jira/browse/BEAM-1087
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>Priority: Minor
>
> {code}
>   File "/usr/local/lib/python2.7/pickle.py", line 286, in save
> f(self, obj) # Call unbound method with explicit self
>   File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 1231, in 
> save_type
> StockPickler.save_global(pickler, obj)
>   File "/usr/local/lib/python2.7/pickle.py", line 754, in save_global
> (obj, module, name))
> pickle.PicklingError: Can't pickle  'apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum'>:
>  it's not found as 
> apache_beam.internal.clients.dataflow.dataflow_v1b3_messages.TypeValueValuesEnum
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-293) StreamingOptions should not extend GcpOptions

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726230#comment-15726230
 ] 

ASF GitHub Bot commented on BEAM-293:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1515


> StreamingOptions should not extend GcpOptions
> -
>
> Key: BEAM-293
> URL: https://issues.apache.org/jira/browse/BEAM-293
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>  Labels: backward-incompatible
>
> Now, the SDK {{StreamingOptions}} extends {{GcpOptions}}:
> {code}
> StreamingOptions extends ApplicationNameOptions, *GcpOptions*, PipelineOptions
> {code}
> The core SDK should not depend to GCP generally speaking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1093) Confusing Javadocs in StateInternals

2016-12-06 Thread Davor Bonaci (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726206#comment-15726206
 ] 

Davor Bonaci commented on BEAM-1093:


Just an outdated comment. [~mauzhang], perhaps you can quickly fix it by 
swapping Dataflow and Beam?

> Confusing Javadocs in StateInternals
> 
>
> Key: BEAM-1093
> URL: https://issues.apache.org/jira/browse/BEAM-1093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Manu Zhang
>Assignee: Ben Chambers
>Priority: Minor
>
> At last but one line of  StateInternals' Javadocs, it says "This is a 
> low-level API intended for use by the Dataflow SDK". Not sure what is 
> "Dataflow SDK".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-651) Consider making TypedPValue.setTypeDescriptorInternal no longer Internal

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725987#comment-15725987
 ] 

ASF GitHub Bot commented on BEAM-651:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1516


> Consider making TypedPValue.setTypeDescriptorInternal no longer Internal
> 
>
> Key: BEAM-651
> URL: https://issues.apache.org/jira/browse/BEAM-651
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: easy, easyfix, starter
> Fix For: 0.4.0-incubating
>
>
> This would give fairly pithy answers to StackOverflow questions sometimes.
> When choosing between .getOutputCoder() and .getOutputTypeDescriptor() for a 
> transform/DoFn we often choose the type, so the coder registry can do its 
> thing.
> This would also give a similar choice between .setCoder(...) and 
> .setTypeDescriptor(...).
> And anyhow we have the intention of removing our practice of the "*Internal" 
> suffix, so this one might be most easily solved by making it public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (BEAM-651) Consider making TypedPValue.setTypeDescriptorInternal no longer Internal

2016-12-06 Thread Kenneth Knowles (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-651.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

> Consider making TypedPValue.setTypeDescriptorInternal no longer Internal
> 
>
> Key: BEAM-651
> URL: https://issues.apache.org/jira/browse/BEAM-651
> Project: Beam
>  Issue Type: Wish
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: easy, easyfix, starter
> Fix For: 0.4.0-incubating
>
>
> This would give fairly pithy answers to StackOverflow questions sometimes.
> When choosing between .getOutputCoder() and .getOutputTypeDescriptor() for a 
> transform/DoFn we often choose the type, so the coder registry can do its 
> thing.
> This would also give a similar choice between .setCoder(...) and 
> .setTypeDescriptor(...).
> And anyhow we have the intention of removing our practice of the "*Internal" 
> suffix, so this one might be most easily solved by making it public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-830) Launcher for ApexRunner execution on YARN cluster

2016-12-06 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated BEAM-830:
--
Assignee: Thomas Weise

> Launcher for ApexRunner execution on YARN cluster 
> --
>
> Key: BEAM-830
> URL: https://issues.apache.org/jira/browse/BEAM-830
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>
> Currently the ApexRunner only support execution in embedded mode. Add the 
> support to package the dependencies and run the Apex app on a YARN cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-830) Launcher for ApexRunner execution on YARN cluster

2016-12-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15725690#comment-15725690
 ] 

ASF GitHub Bot commented on BEAM-830:
-

GitHub user tweise opened a pull request:

https://github.com/apache/incubator-beam/pull/1517

[BEAM-830] ApexRunner launch on YARN cluster.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
R: @kennknowles @dhalperi 

This PR provides the support to run a Beam pipeline through the main method 
on the YARN cluster. Example:

```
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount 
-Dexec.args="--inputFile=pom.xml --output=counts --runner=ApexRunner 
--embeddedExecution=false" -Papex-runner
```

To make that happen it has to do some magic with the class path to 
determine the dependencies that should be used with the Hadoop client. 

Would like to get these changes into the upcoming release.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tweise/incubator-beam BEAM-830

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1517


commit 1b1336fd903d13a5874f1e6b1d03888f54f0fbff
Author: Thomas Weise <t...@apache.org>
Date:   2016-11-25T02:36:11Z

BEAM-830 Support launch on YARN cluster.




> Launcher for ApexRunner execution on YARN cluster 
> --
>
> Key: BEAM-830
> URL: https://issues.apache.org/jira/browse/BEAM-830
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-apex
>Reporter: Thomas Weise
>
> Currently the ApexRunner only support execution in embedded mode. Add the 
> support to package the dependencies and run the Apex app on a YARN cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1093) Confusing Javadocs in StateInternals

2016-12-06 Thread Manu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang updated BEAM-1093:
-
Assignee: Ben Chambers  (was: Davor Bonaci)

> Confusing Javadocs in StateInternals
> 
>
> Key: BEAM-1093
> URL: https://issues.apache.org/jira/browse/BEAM-1093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Manu Zhang
>Assignee: Ben Chambers
>Priority: Minor
>
> At last but one line of  StateInternals' Javadocs, it says "This is a 
> low-level API intended for use by the Dataflow SDK". Not sure what is 
> "Dataflow SDK".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-1093) Confusing Javadocs in StateInternals

2016-12-06 Thread Manu Zhang (JIRA)
Manu Zhang created BEAM-1093:


 Summary: Confusing Javadocs in StateInternals
 Key: BEAM-1093
 URL: https://issues.apache.org/jira/browse/BEAM-1093
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Manu Zhang
Assignee: Davor Bonaci
Priority: Minor


At last but one line of  StateInternals' Javadocs, it says "This is a low-level 
API intended for use by the Dataflow SDK". Not sure what is "Dataflow SDK".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


<    2   3   4   5   6   7   8   9   10   11   >