[jira] [Commented] (BEAM-469) NullableCoder optimized encoding via passthrough context

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768525#comment-15768525
 ] 

ASF GitHub Bot commented on BEAM-469:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1680

[BEAM-XXX] Make KVCoder more efficient by removing unnecessary nesting

See [BEAM-469] for more information about why this is
correct.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam efficient-nested-coders

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1680


commit 621e8250c9535d773c4f4440a34ea0833912b51f
Author: Dan Halperin 
Date:   2016-12-21T23:37:49Z

[BEAM-XXX] Make KVCoder more efficient by removing unnecessary nesting

See [BEAM-469] for more information about why this is
correct.




> NullableCoder optimized encoding via passthrough context
> 
>
> Key: BEAM-469
> URL: https://issues.apache.org/jira/browse/BEAM-469
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Thomas Groh
>Priority: Trivial
>  Labels: backward-incompatible
> Fix For: 0.3.0-incubating
>
>
> NullableCoder should encode using the context given and not always use the 
> nested context. For coders which can efficiently encode in the outer context 
> such as StringUtf8Coder or ByteArrayCoder, we are forcing them to prefix 
> themselves with their length.
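
The cost described above can be illustrated with a minimal sketch in Python. This is not the Beam Java implementation: the function names `encode_utf8` and `encode_nullable` are hypothetical, and the fixed 4-byte length prefix is an assumption chosen for simplicity rather than Beam's wire format. It only shows why passing the caller's context through saves the length prefix.

```python
import struct
from typing import Optional


def encode_utf8(value: str, nested: bool) -> bytes:
    """Encode a string; only the nested context needs a length prefix."""
    data = value.encode("utf-8")
    if nested:
        # More bytes may follow, so a decoder must know where the string ends.
        return struct.pack(">I", len(data)) + data
    # Outer context: the value runs to the end of the stream.
    return data


def encode_nullable(value: Optional[str], nested: bool) -> bytes:
    """Write a one-byte presence marker, then pass the caller's context through."""
    if value is None:
        return b"\x00"
    return b"\x01" + encode_utf8(value, nested)


# Behaviour the issue complains about: the inner coder is always nested.
always_nested = b"\x01" + encode_utf8("beam", nested=True)
# Proposed behaviour: the outer context is handed to the inner coder unchanged.
passthrough = encode_nullable("beam", nested=False)
print(len(always_nested), len(passthrough))
```

In this sketch the passthrough encoding is shorter by exactly the size of the length prefix, which is the saving the issue asks for.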



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1201) Remove producesSortedKeys from Source

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768483#comment-15768483
 ] 

ASF GitHub Bot commented on BEAM-1201:
--

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1679

[BEAM-1201] Remove BoundedSource.producesSortedKeys

R: @jkff

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam remove-produces-sorted-keys

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1679


commit ee15138543f8b9926466cf4e4dc6857b3173345e
Author: Dan Halperin 
Date:   2016-12-21T23:32:38Z

[BEAM-1201] Remove BoundedSource.producesSortedKeys

Unused and unclear; for more information see the linked JIRA.




> Remove producesSortedKeys from Source
> -
>
> Key: BEAM-1201
> URL: https://issues.apache.org/jira/browse/BEAM-1201
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>  Labels: backward-incompatible
>
> This is a holdover from a precursor of the old Dataflow SDK that we just 
> failed to delete before releasing Dataflow 1.0, but we can delete before the 
> first stable release of Beam.
> This function has never been used by any runner. It does not mean anything 
> obvious to implementors, as many sources produce {{T}}, not {{KV}} -- 
> what does it mean in the former case? (And how do you get the latter case 
> correct?)





[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768470#comment-15768470
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1582


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.
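
A rough sketch of the "build the original graph first, override later" idea, in Python. Everything here is hypothetical: the graph is modelled as a plain list of transform names and `apply_overrides` is an illustrative helper, not an API on the Pipeline object. The transform names in the override are assumptions as well.

```python
from typing import Callable, Dict, List

# Hypothetical model: a pipeline graph is an ordered list of transform names.
Graph = List[str]
Override = Callable[[str], List[str]]


def apply_overrides(original: Graph, overrides: Dict[str, Override]) -> Graph:
    """Return a runner-specific graph; the user's original graph is not mutated."""
    replaced: Graph = []
    for transform in original:
        expand = overrides.get(transform)
        replaced.extend(expand(transform) if expand else [transform])
    return replaced


# The user graph is fully built before any runner looks at it.
user_graph = ["Read", "GroupByKey", "Write"]
runner_overrides = {"GroupByKey": lambda name: ["GroupByKeyOnly", "GroupAlsoByWindow"]}
runner_graph = apply_overrides(user_graph, runner_overrides)
```

Because the override step produces a new list, the "original" user graph remains available for display or validation after the runner has made its replacements.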





[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768461#comment-15768461
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

Github user markflyhigh closed the pull request at:

https://github.com/apache/incubator-beam/pull/1639


> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build an e2e integration test framework that can configure and run a batch 
> pipeline with a specified test runner, wait for pipeline execution, and verify 
> results with the given verifiers at the end.





[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768462#comment-15768462
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

GitHub user markflyhigh reopened a pull request:

https://github.com/apache/incubator-beam/pull/1639

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

 - E2E test framework that lets a TestRunner start a pipeline job and verify it.
   - Add `TestOptions`, which defines `on_success_matcher`, used to verify the
     state/output of the pipeline job.
   - Validate `on_success_matcher` before pipeline execution to make sure it
     can be unpickled to a subclass of BaseMatcher.
   - Create a `TestDataflowRunner`, which provides the functionality of
     `DataflowRunner` plus result verification.
   - Provide a test verifier, `PipelineStateMatcher`, that can verify whether
     the pipeline job finished in the DONE state.
 - Add wordcount_it (it = integration test), which builds an e2e test on the
   existing wordcount pipeline.
   - Include wordcount_it in the nose collector, so that it can be collected
     and run by nose.
   - Skip ITs when running unit tests from tox in precommit and postcommit.
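
The run-then-verify flow in the list above can be sketched roughly as follows. This is not the code in the PR: `BaseMatcher` here is a local stand-in (not PyHamcrest's class), and `FakeResult`, `StateMatcher`, and `run_and_verify` are hypothetical names used only to show the order of validation, execution, and verification.

```python
class BaseMatcher:
    """Stand-in for a matcher base class (hypothetical, not PyHamcrest's)."""

    def matches(self, result) -> bool:
        raise NotImplementedError


class FakeResult:
    """Hypothetical pipeline result exposing only a terminal state."""

    def __init__(self, state: str):
        self.state = state


class StateMatcher(BaseMatcher):
    """Checks that the job finished in the expected state."""

    def __init__(self, expected: str = "DONE"):
        self.expected = expected

    def matches(self, result) -> bool:
        return result.state == self.expected


def run_and_verify(run_pipeline, on_success_matcher):
    # Validate the matcher before spending time on pipeline execution.
    if not isinstance(on_success_matcher, BaseMatcher):
        raise TypeError("on_success_matcher must be a BaseMatcher")
    result = run_pipeline()  # assumed to block until the job finishes
    if not on_success_matcher.matches(result):
        raise AssertionError("pipeline ended in state %s" % result.state)
    return result
```

For example, `run_and_verify(lambda: FakeResult("DONE"), StateMatcher())` returns the result, while a result in any other state raises an `AssertionError`.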

These changes do not alter the behavior of the existing pre/postcommit.
Tested by running `tox -e py27 -c sdks/python/tox.ini` for unit tests 
and by running wordcount_it with `TestDataflowRunner` on the service 
([link](https://pantheon.corp.google.com/dataflow/job/2016-12-15_17_36_16-3857167705491723621?project=google.com:clouddfe)).

TODO:
 - Output data verifier that verifies pipeline output stored in the 
filesystem.
 - Add wordcount_it to precommit and replace the existing wordcount execution 
command in postcommit with a better-structured nose command.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markflyhigh/incubator-beam e2e-testrunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1639






> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build an e2e integration test framework that can configure and run a batch 
> pipeline with a specified test runner, wait for pipeline execution, and verify 
> results with the given verifiers at the end.





[jira] [Commented] (BEAM-1198) ViewFn: explicitly decouple runner materialization of side inputs from SDK-specific mapping

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768387#comment-15768387
 ] 

ASF GitHub Bot commented on BEAM-1198:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1678

[BEAM-1198, BEAM-846, BEAM-260] Refactor Dataflow translator to decouple 
input and output graphs more easily

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

This is preparatory work to make it possible for the translator to have a 
more loosely coupled relationship between its input and output.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam Dataflow-Translator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1678


commit 8ed4bb68660c537e4a12c1077ecfa104f9a82eaa
Author: Kenneth Knowles 
Date:   2016-12-21T22:21:50Z

Inline needless interface DataflowTranslator.TranslationContext

The only implementation was DataflowTranslator.Translator. This class
needs some updating and the extra layer of the interface simply
obscures that work.

commit 272d06d7507ad7162616dd1b613efa7c8f5f4069
Author: Kenneth Knowles 
Date:   2016-12-21T22:34:27Z

Explicitly pass Step to mutate in Dataflow translator

Previously, there was always a "current" step that was the most recent
step created. This makes it cumbersome or impossible to do things like
translate one primitive transform into a small subgraph of steps. Thus
we added hacks like CreatePCollectionView which are not actually part
of the model at all - in fact, we should be able to add the needed
CollectionToSingleton steps simply by looking at the side inputs of a
ParDo node.
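
A rough Python sketch of the idea in this commit message, under stated assumptions: `Step`, `Translator`, and `translate_par_do` are hypothetical names, and the step kind `"ParallelDo"` and the property keys are illustrative rather than taken from the Dataflow translator. The point is only that when `add_step` returns the step and `add_input` takes it explicitly, one primitive can be translated into several steps without relying on a "current" step.

```python
from typing import Dict, List


class Step:
    """Hypothetical translated step: a kind, a name, and a property map."""

    def __init__(self, kind: str, name: str):
        self.kind = kind
        self.name = name
        self.properties: Dict[str, object] = {}


class Translator:
    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add_step(self, kind: str, name: str) -> Step:
        # Return the new step instead of remembering it as "current".
        step = Step(kind, name)
        self.steps.append(step)
        return step

    @staticmethod
    def add_input(step: Step, key: str, value: object) -> None:
        # The caller names the step to mutate, so creation order is irrelevant.
        step.properties[key] = value


def translate_par_do(translator: Translator, name: str, side_inputs: List[str]) -> Step:
    """Translate one primitive into a small subgraph of steps."""
    par_do = translator.add_step("ParallelDo", name)
    for view in side_inputs:
        singleton = translator.add_step("CollectionToSingleton", name + "/" + view)
        translator.add_input(singleton, "source", view)
    # par_do is no longer the most recently created step, yet it is still addressable.
    translator.add_input(par_do, "side_inputs", list(side_inputs))
    return par_do
```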




> ViewFn: explicitly decouple runner materialization of side inputs from 
> SDK-specific mapping
> ---
>
> Key: BEAM-1198
> URL: https://issues.apache.org/jira/browse/BEAM-1198
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model-fn-api, beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> For side inputs, the field {{PCollectionView.fromIterableInternal}} implies 
> an "iterable" materialization of the contents of a PCollection, which is 
> adapted to the desired user-facing type within a UDF (today the 
> PCollectionView "is" the UDF)
> In practice, runners get adequate performance by special casing just a couple 
> of types of PCollectionView in a rather cumbersome manner.
> The design to improve this is https://s.apache.org/beam-side-inputs-1-pager 
> and this makes the de facto standard approach the actual model.
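
The split described above can be sketched in Python as follows. This is an illustration of the idea only, not the design in the linked one-pager: the runner hands over an iterable materialization, and an SDK-side function adapts it to the user-facing type. All function names (`singleton_view`, `list_view`, `map_view`, `read_side_input`) are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K")
V = TypeVar("V")


def singleton_view(contents: Iterable[T]) -> T:
    """Adapt an iterable materialization to a single value."""
    items = list(contents)
    if len(items) != 1:
        raise ValueError("expected exactly one element, got %d" % len(items))
    return items[0]


def list_view(contents: Iterable[T]) -> List[T]:
    """Adapt an iterable materialization to a list."""
    return list(contents)


def map_view(contents: Iterable[Tuple[K, V]]) -> Dict[K, V]:
    """Adapt an iterable of key/value pairs to a dictionary."""
    return dict(contents)


def read_side_input(materialized: Iterable, view_fn: Callable):
    """The runner supplies the iterable; the SDK-side view_fn adapts it."""
    return view_fn(materialized)
```

With this separation the runner only needs to know how to materialize an iterable, while each view type is an ordinary function over that iterable.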





[jira] [Commented] (BEAM-1117) Support for new Timer API in Direct runner

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768294#comment-15768294
 ] 

ASF GitHub Bot commented on BEAM-1117:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1669


> Support for new Timer API in Direct runner
> --
>
> Key: BEAM-1117
> URL: https://issues.apache.org/jira/browse/BEAM-1117
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-1194) DataflowRunner should test a variety of valid tempLocation/stagingLocation/etc formats.

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768284#comment-15768284
 ] 

ASF GitHub Bot commented on BEAM-1194:
--

Github user dhalperi closed the pull request at:

https://github.com/apache/incubator-beam/pull/1671


> DataflowRunner should test a variety of valid 
> tempLocation/stagingLocation/etc formats.
> ---
>
> Key: BEAM-1194
> URL: https://issues.apache.org/jira/browse/BEAM-1194
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
>
> Cloud Dataflow has a minor history of small bugs related to various code 
> paths expecting there to be or not be a trailing forward-slash in these 
> location fields. The way that Beam's integration tests are set up, we are 
> likely to only have one of these two cases tested (there is a single set of 
> integration test pipeline options).
> We should add a dedicated DataflowRunner integration test to handle this case.
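
As an illustration of the two cases such a test would need to cover, here is a small Python sketch. The `resolve` helper and the bucket and file names are hypothetical and are not part of DataflowRunner; the sketch only shows that both forms of the location should yield the same staged path.

```python
def resolve(location: str, file_name: str) -> str:
    """Join a directory-like location and a file name with exactly one slash."""
    return location.rstrip("/") + "/" + file_name


# Hypothetical bucket and file names, used only for illustration.
for temp_location in ("gs://example-bucket/tmp", "gs://example-bucket/tmp/"):
    staged = resolve(temp_location, "pipeline.jar")
    assert staged == "gs://example-bucket/tmp/pipeline.jar", staged
```

A dedicated integration test would run the same pipeline once per location format, so that neither the trailing-slash nor the no-slash code path goes untested.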





[jira] [Commented] (BEAM-25) Add user-ready API for interacting with state

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768175#comment-15768175
 ] 

ASF GitHub Bot commented on BEAM-25:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1670


> Add user-ready API for interacting with state
> -
>
> Key: BEAM-25
> URL: https://issues.apache.org/jira/browse/BEAM-25
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: State
>
> Our current state API is targeted at runner implementers, not pipeline 
> authors. As such it has many capabilities that are neither necessary nor 
> desirable for simple use cases of stateful ParDo (such as dynamic state tag 
> creation). Implement a simple state intended for user access.
> (Details of our current thoughts in forthcoming design doc)





[jira] [Commented] (BEAM-79) Gearpump runner

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768076#comment-15768076
 ] 

ASF GitHub Bot commented on BEAM-79:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1663


> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-gearpump
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.





[jira] [Commented] (BEAM-115) Beam Runner API

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767972#comment-15767972
 ] 

ASF GitHub Bot commented on BEAM-115:
-

GitHub user kennknowles reopened a pull request:

https://github.com/apache/incubator-beam/pull/662

[BEAM-115] WIP: JSON Schema definition of pipeline

This is a json-schema sketch of the concrete schema from the [Pipeline 
Runner API proposal document](https://s.apache.org/beam-runner-api). Because 
our [serialization tech 
discussion](http://mail-archives.apache.org/mod_mbox/beam-dev/201606.mbox/%3CCAN_Ypr2ZPQG3OgPWu==kf-zztg06k0v5i0ay3dabchjyver...@mail.gmail.com%3E)
 seemed to favor JSON on the front end and Proto on the backend, I made this 
quick port. The original Avro IDL definition is also on [a branch with a 
test](https://github.com/kennknowles/incubator-beam/blob/pipeline-model/model/pipeline/src/main/avro/org/apache/beam/model/pipeline/pipeline.avdl).

Notes & Caveats:
- I did not try to flesh out any more details; this was a straight port. 
There's plenty to add, but a PR seems like a place that will attract a desired 
kind of concrete discussion even in the current state.
- Typing this makes my hands hurt. Luckily, it should change exceedingly 
rarely. There are many libraries that can generate json-schema in various ways, 
including Jackson itself, but I'm not so sure any of them are applicable.
- Reading this makes my eyes hurt. This is a real problem. We need a 
readable spec, not just a test suite for validation.
- I am not so sure that [the schema 
library](https://github.com/daveclayton/json-schema-validator) I've used to 
build my smoke test is a good long term choice. I chose it because it was 
Jackson-based.
- I've left comments in the JSON even though that is frowned upon, and 
taken advantage of Jackson's feature to allow them. They can also go into 
`"description"` fields.
- Perhaps we could write YAML and convert to json-schema with no loss of 
precision?

Feel free to leave comments here about the schema or meta issues of e.g. 
where the schema should live and what libraries we might want to use.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam pipeline-json-schema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/662.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #662


commit c5843ce10e782056c76157169eb5516bf18ed9e4
Author: Kenneth Knowles 
Date:   2016-06-10T15:51:02Z

WIP: add JSON Schema definition of pipeline




> Beam Runner API
> ---
>
> Key: BEAM-115
> URL: https://issues.apache.org/jira/browse/BEAM-115
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> The PipelineRunner API from the SDK is not ideal for the Beam technical 
> vision.
> It has technical limitations:
>  - The user's DAG (even including library expansions) is never explicitly 
> represented, so it cannot be analyzed except incrementally, and cannot 
> necessarily be reconstructed (for example, to display it!).
>  - The flattened DAG of just primitive transforms isn't well-suited for 
> display or transform override.
>  - The TransformHierarchy isn't well-suited for optimizations.
>  - The user must realistically pre-commit to a runner, and its configuration 
> (batch vs streaming) prior to graph construction, since the runner will be 
> modifying the graph as it is built.
>  - It is fairly language- and SDK-specific.
> It has usability issues (these are not from intuition, but derived from 
> actual cases of failure to use according to the design)
>  - The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner 
> is confusing.
>  - The TransformHierarchy, accessible only via visitor traversals, is 
> cumbersome.
>  - The staging of construction-time vs run-time is not always obvious.
> These are just examples. This ticket tracks designing, coming to consensus, 
> and building an API that more simply and directly supports the technical 
> vision.





[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767897#comment-15767897
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1660


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767806#comment-15767806
 ] 

ASF GitHub Bot commented on BEAM-362:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1666


> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767732#comment-15767732
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1673


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767452#comment-15767452
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1668


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-1146) Decrease spark runner startup overhead

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767378#comment-15767378
 ] 

ASF GitHub Bot commented on BEAM-1146:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/incubator-beam/pull/1674

[BEAM-1146] Decrease spark runner startup overhead

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---
Replace finding all `Source` and `Coder` implementations for serialization 
registration with wrapper classes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/incubator-beam decrease-spark-runner-startup-overhead

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1674


commit 8501cdc88ee9c89f643120e34381ec9bc2562965
Author: Aviem Zur 
Date:   2016-12-21T15:49:34Z

[BEAM-1146] Decrease spark runner startup overhead

Replace finding all `Source` and `Coder` implementations for serialization 
registration with wrapper classes.




> Decrease spark runner startup overhead
> --
>
> Key: BEAM-1146
> URL: https://issues.apache.org/jira/browse/BEAM-1146
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Aviem Zur
>Assignee: Aviem Zur
>
> BEAM-921 introduced a lazy singleton, instantiated once on each machine 
> (driver & executors), which uses reflection to find all subclasses of 
> Source and Coder.
> While this is beneficial in its own right, the change added about one minute 
> of overhead to Spark runner startup time (which causes the first job/stage to 
> take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}.
> The reason reflection (specifically the reflections library) was used here is 
> that there is currently no way of knowing all the source and coder classes 
> at runtime.




[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766056#comment-15766056
 ] 

ASF GitHub Bot commented on BEAM-27:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1673

[BEAM-27] Require TimeDomain to delete a timer

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @aljoscha 

A bit of an oversight: I neglected the fact that runners generally store 
different sorts of timers in rather different ways. When a user sets a timer, 
the `DoFnSignature` is available, so the time domain comes for free. And when 
system code deletes a timer, the domain will always be known.

This will require a Dataflow update, so don't worry if Dataflow-specific 
integration tests don't pass.
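
To illustrate why the domain matters on deletion, here is a minimal Python sketch of a runner-side store that keeps each domain's timers in a separate collection. `TimerStore` and its methods are hypothetical and do not mirror any Beam class; only the three `TimeDomain` names follow the Java SDK's enum.

```python
from enum import Enum
from typing import Dict, Set


class TimeDomain(Enum):
    EVENT_TIME = "event_time"
    PROCESSING_TIME = "processing_time"
    SYNCHRONIZED_PROCESSING_TIME = "synchronized_processing_time"


class TimerStore:
    """Hypothetical runner-side store that keeps each domain's timers separately."""

    def __init__(self) -> None:
        self._timers: Dict[TimeDomain, Set[str]] = {d: set() for d in TimeDomain}

    def set_timer(self, timer_id: str, domain: TimeDomain) -> None:
        self._timers[domain].add(timer_id)

    def delete_timer(self, timer_id: str, domain: TimeDomain) -> None:
        # With the domain supplied, only one collection has to be touched.
        self._timers[domain].discard(timer_id)

    def has_timer(self, timer_id: str, domain: TimeDomain) -> bool:
        return timer_id in self._timers[domain]
```

Without the domain argument, `delete_timer` would have to search every per-domain collection, which is the awkwardness this change avoids.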

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam delete-by-domain

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1673


commit 46dfd0fb4d2a1533d3ed053983faee6537d3ccf0
Author: Kenneth Knowles 
Date:   2016-12-21T04:09:25Z

Require TimeDomain to delete a timer




> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-79) Gearpump runner

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765799#comment-15765799
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user manuzhang reopened a pull request:

https://github.com/apache/incubator-beam/pull/1663

[BEAM-79] merge master into gearpump-runner branch

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam gearpump-runner-sync

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1663.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1663


commit e9f254ef2769a082c7fbb500c1c28c6224ac5a7f
Author: Jakob Homan 
Date:   2016-12-07T00:59:50Z

[BEAM-1099] Minor typos in KafkaIO

commit afedd68e806830549724dfc0f2565d756cc6b46d
Author: Davor Bonaci 
Date:   2016-12-07T01:03:54Z

This closes #1524

commit e8c9686a2e898d38afd692328eb171c542084747
Author: Pei He 
Date:   2016-11-23T23:59:56Z

[BEAM-1047] Add DataflowClient wrapper on top of JSON library.

commit ded58832ceaef487f4590d9396f09744288c955d
Author: Pei He 
Date:   2016-11-24T00:14:27Z

[Code Health] Remove redundant projectId from DataflowPipelineJob.

commit ce03f30c1ee0b84ad2e7f10a6272ffb25548244a
Author: Pei He 
Date:   2016-11-28T19:47:42Z

[BEAM-1047] Update dataflow runner code to use DataflowClient wrapper.

commit b2b570f27808b1671bf6cd0fc60f874da671d4ca
Author: bchambers 
Date:   2016-12-07T01:08:13Z

Closes #1434

commit 0a2ed832ce5af7556db605e99b985ed4ffc1b152
Author: Sam McVeety 
Date:   2016-10-30T18:58:44Z

BigQueryIO.Read: support runtime options

commit 869b2710efdb90bc3ce5b6e9d4f3b49a3a804a63
Author: Aljoscha Krettek 
Date:   2016-12-07T05:28:13Z

[FLINK-1102] Fix Aggregator Registration in Flink Batch Runner

commit b41a46e86fd38c4a887f31bdf6cb75969f4750d3
Author: Aljoscha Krettek 
Date:   2016-12-07T07:26:02Z

This closes #1530

commit baf5e6bd9b1011f4c5c3974aa46393471b340c15
Author: Jean-Baptiste Onofré 
Date:   2016-12-07T07:37:33Z

[BEAM-1094] Set test scope for Kafka IO and junit

commit 9ccf6dbea0d3807fef6a7c0432906fffd2b8ec3f
Author: Sela 
Date:   2016-12-07T08:31:38Z

This closes #1531

commit dce3a196a3a26fdd42225520faf3d9084ee48123
Author: Sela 
Date:   2016-12-07T09:20:07Z

[BEAM-329] Update Spark runner README.

commit 02bb8c375c48847b1686d70184fb194500a62e8c
Author: Jean-Baptiste Onofré 
Date:   2016-12-07T11:51:09Z

[BEAM-329] This closes #1532

commit b2d72237b592e1dcb5cca30f5cbc9a11d2890c0f
Author: Kenneth Knowles 
Date:   2016-12-06T23:20:28Z

Port most of DoFnRunner Javadoc to new DoFn

commit 1526184ae8be1f8ae6863f830653204157a584cd
Author: Thomas Groh 
Date:   2016-12-07T16:51:02Z

This closes #1527

commit 8e1e46e73edf9cce376ed7bd194db00edc3e60b4
Author: Kenneth Knowles 
Date:   2016-12-07T05:01:37Z

Port ParDoTest from OldDoFn to new DoFn

commit ae52ec1bc3f3120e9f8e150e50c18564055a467c
Author: Kenneth Knowles 
Date:   2016-12-07T17:00:18Z

This closes #1529

commit 55d333bff68809ff1a9154491ace02d2d16e3b85
Author: Thomas Groh 
Date:   2016-12-05T22:29:05Z

Only provide expanded Inputs and Outputs

This removes PInput and POutput from the immediate API Surface of
TransformHierarchy.Node, and forces Pipeline Visitors to access only
the expanded version of the output.

This is part of the move towards the runner-agnostic representation of a
graph.

commit 5b31a369962907e257de8019fbf6cde4c615b1c0
Author: Thomas Groh 
Date:   2016-12-07T17:14:38Z

This closes #1511

commit 43fef2775145f67def3ab8a246ecca192a7d650b
Author: Dan Halperin 
Date:   2016-12-07T12:06:57Z

[BEAM-905] Add shading config to examples archetype and enable it for 

[jira] [Commented] (BEAM-25) Add user-ready API for interacting with state

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765628#comment-15765628
 ] 

ASF GitHub Bot commented on BEAM-25:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1670

[BEAM-25, BEAM-1117] Fixes for direct runner expansion and evaluation of 
stateful ParDo

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @tgroh. Also peeled off from the timers PR; these are fixes for the whole 
setup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
DirectRunner-Stateful

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1670


commit 0615fc9749c3fd0012f4d5524ea8486413778636
Author: Kenneth Knowles 
Date:   2016-12-20T21:58:29Z

Fix windowing in direct runner Stateful ParDo

commit 7bc23d6b53ed29ae565121df49180ad8d4aac653
Author: Kenneth Knowles 
Date:   2016-12-20T23:59:45Z

Actually propagate and commit state in direct runner




> Add user-ready API for interacting with state
> -
>
> Key: BEAM-25
> URL: https://issues.apache.org/jira/browse/BEAM-25
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: State
>
> Our current state API is targeted at runner implementers, not pipeline 
> authors. As such, it has many capabilities that are neither necessary nor 
> desirable for simple use cases of stateful ParDo (such as dynamic state tag 
> creation). Implement a simple state API intended for user access.
> (Details of our current thoughts in forthcoming design doc)
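
To make the idea in the description above concrete, here is a minimal sketch of what "simple state for user access" could look like: one value cell per key, handed to the user's processing function. It is plain Python, not the Beam API; `ValueState`, `run_stateful`, and `running_count` are hypothetical names invented for this illustration.

```python
from collections import defaultdict


class ValueState:
    """Hypothetical single-value state cell (an illustration, not the Beam API)."""

    def __init__(self, default=None):
        self._value = default

    def read(self):
        return self._value

    def write(self, value):
        self._value = value


def run_stateful(elements, process, default=0):
    """Call process(key, value, state) with one state cell per key."""
    cells = defaultdict(lambda: ValueState(default))
    outputs = []
    for key, value in elements:
        outputs.extend(process(key, value, cells[key]))
    return outputs


def running_count(key, value, state):
    """User code: count how many elements have been seen for this key."""
    count = state.read() + 1
    state.write(count)
    yield (key, count)


result = run_stateful([("a", 1), ("b", 2), ("a", 3)], running_count)
```

The user function only ever sees a fixed, pre-declared cell; nothing here lets it create new state tags dynamically, which is the simplification the issue asks for.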





[jira] [Commented] (BEAM-1117) Support for new Timer API in Direct runner

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765551#comment-15765551
 ] 

ASF GitHub Bot commented on BEAM-1117:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1669

[BEAM-1117] Direct runner timers prereqs

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

Per request, here are some commits from #1667 broken out. I am happy to 
trim off more, etc, whatever is easiest for review.

R: @tgroh 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
DirectRunner-timers-prereqs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1669.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1669


commit f64816e0cf2e4fcc9525f40ede01c2f8e4ecf28d
Author: Kenneth Knowles 
Date:   2016-12-20T04:40:11Z

Add informative Instant formatter to BoundedWindow

commit 46c6a4f613629f09b48e3630aa344760b0ad46d4
Author: Kenneth Knowles 
Date:   2016-12-20T04:40:47Z

Use informative Instant formatter in WatermarkHold

commit 92baa418fbe53c0e7c7afc81db31fc02ab7f3915
Author: Kenneth Knowles 
Date:   2016-12-20T21:57:55Z

Add static Window.withOutputTimeFn to match build method

commit 7118c4ff85636a65431be54fa2e2f18fb52914cf
Author: Kenneth Knowles 
Date:   2016-12-20T22:20:07Z

Add UsesTestStream for use with JUnit @Category

commit f667a3e8abcd95be7a235132219c936178ab6bc8
Author: Kenneth Knowles 
Date:   2016-12-08T04:18:44Z

Allow setting timer by ID in DirectTimerInternals

commit 217e5245e59800d57aa36551fbbdb642a5b447a0
Author: Kenneth Knowles 
Date:   2016-12-20T21:37:40Z

Hold output watermark according to pending timers




> Support for new Timer API in Direct runner
> --
>
> Key: BEAM-1117
> URL: https://issues.apache.org/jira/browse/BEAM-1117
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765542#comment-15765542
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1569


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model-runner-api, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.
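
The ordering described above (record the user's graph first, let the runner substitute transforms afterwards) can be sketched as follows. This is a toy model in plain Python under stated assumptions: the `Pipeline` class, `with_overrides`, and the transform names are hypothetical and do not reflect the actual Beam classes.

```python
class Pipeline:
    """Toy pipeline that only records the transforms the user applied."""

    def __init__(self):
        self.transforms = []  # the "original" user graph, in application order

    def apply(self, name):
        # No runner involvement here: the transform is recorded as written.
        self.transforms.append(name)
        return self

    def with_overrides(self, overrides):
        """Return the graph a runner would execute; the user graph is untouched."""
        return [overrides.get(t, t) for t in self.transforms]


p = Pipeline().apply("Read").apply("GroupByKey").apply("Write")

# A runner decides on its replacements only after the whole graph exists.
runner_overrides = {"GroupByKey": "RunnerSpecificGroupByKey"}
executed = p.with_overrides(runner_overrides)
```

Because the replacement happens in a separate pass, the original graph stays available for validation or for handing to a different runner.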





[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765531#comment-15765531
 ] 

ASF GitHub Bot commented on BEAM-362:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1665


> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765387#comment-15765387
 ] 

ASF GitHub Bot commented on BEAM-27:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1668

[BEAM-27, BEAM-362] Remove deprecated InMemoryTimerInternals from SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
InMemoryTimerInternals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1668


commit bbea8469912b23383a9ae5cf084b5801706e
Author: Kenneth Knowles 
Date:   2016-12-20T22:07:00Z

Remove deprecated InMemoryTimerInternals from SDK




> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-1097) Dataflow error message for non-existing gcpTempLocation is misleading

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765244#comment-15765244
 ] 

ASF GitHub Bot commented on BEAM-1097:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1522


> Dataflow error message for non-existing gcpTempLocation is misleading
> -
>
> Key: BEAM-1097
> URL: https://issues.apache.org/jira/browse/BEAM-1097
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, runner-dataflow
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>Priority: Minor
>
> The error message for specifying a GCP tempLocation which doesn't exist is 
> misleading. Rather than mentioning that the given path doesn't exist, it says 
> none was specified.
> This is particularly frustrating because it's one of the few configuration 
> values necessary to get started with an example or starter archetype, and 
> it's easy to introduce a typo as it's specified on the commandline. In my 
> case, I was specifying "gs://swegner-tmp" instead of "gs://swegner-test".
> Repro:
> 1. Clone the starter archetype: {noformat}mvn archetype:generate 
> -DarchetypeGroupId=org.apache.beam 
> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-starter{noformat}
> 2. Add beam-runners-google-cloud-dataflow-java as a dependency in the 
> generated pom.xml
> 3. Build: {noformat}mvn install{noformat}
> 4. Run: {noformat}mvn exec:java -DmainClass=swegner.StarterPipeline 
> -Dexec.args="--runner=DataflowRunner 
> --tempLocation=gs://swegner-tmp"{noformat}
> Expected: An error message along the lines of: "The specified GCP temp 
> location 'gs://swegner-tmp' does not exist under project 'myGcpProject'"
> bq. [ERROR] Failed to execute goal 
> org.codehaus.mojo:exec-maven-plugin:1.4.0:java (default-cli) on project 
> counter-names-test: An exception occured while executing the Java class. 
> null: InvocationTargetException: Failed to construct instance from factory 
> method DataflowRunner#fromOptions(interface 
> org.apache.beam.sdk.options.PipelineOptions): DataflowRunner requires 
> gcpTempLocation, and it is missing in PipelineOptions. -> [Help 1]
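
The distinction the report asks for (a location that is missing versus one that was given but does not exist) can be sketched as below. This is an illustration in plain Python, not the DataflowRunner code: `validate_temp_location` is a hypothetical helper, and the `existing_buckets` set is an assumed stand-in for a real existence check against cloud storage. The two message texts are taken from the report above.

```python
def validate_temp_location(temp_location, existing_buckets, project):
    """Raise ValueError with a message that names the actual problem."""
    if not temp_location:
        # Nothing was specified at all.
        raise ValueError(
            "DataflowRunner requires gcpTempLocation, "
            "and it is missing in PipelineOptions.")
    if not temp_location.startswith("gs://"):
        raise ValueError(
            "GCP temp location '%s' is not a gs:// path" % temp_location)
    # "gs://bucket/some/path" -> "bucket"
    bucket = temp_location[len("gs://"):].split("/", 1)[0]
    if bucket not in existing_buckets:
        # Something was specified, but it does not exist.
        raise ValueError(
            "The specified GCP temp location '%s' does not exist "
            "under project '%s'" % (temp_location, project))
    return temp_location


def error_for(temp_location):
    """Return the error message for a location, or None if it is valid."""
    try:
        validate_temp_location(
            temp_location, {"swegner-test"}, "myGcpProject")
    except ValueError as e:
        return str(e)
    return None
```

With this split, the typo from the report (`gs://swegner-tmp`) yields the "does not exist" message, and only an absent value yields the "missing" message.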





[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765220#comment-15765220
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

GitHub user markflyhigh reopened a pull request:

https://github.com/apache/incubator-beam/pull/1639

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

 - E2E test framework that lets a TestRunner start a pipeline job and verify 
it.
   - Add `TestOptions`, which defines `on_success_matcher`, used to verify the 
state/output of the pipeline job.
   - Validate `on_success_matcher` before pipeline execution to make sure 
it can be unpickled to a subclass of BaseMatcher.
   - Create a `TestDataflowRunner` that provides the functionality of 
`DataflowRunner` plus result verification.
   - Provide a test verifier, `PipelineStateMatcher`, that can verify whether 
the pipeline job finished in the DONE state.
 - Add wordcount_it (it = integration test), which builds an e2e test on the 
existing wordcount pipeline.
   - Include wordcount_it in the nose collector, so that wordcount_it can be 
collected and run by nose.
   - Skip ITs when running unit tests from tox in precommit and postcommit.
These changes do not alter the behavior of the existing pre/postcommit.
Tested by running `tox -e py27 -c sdks/python/tox.ini` for unit tests 
and by running wordcount_it with `TestDataflowRunner` on the service 
([link](https://pantheon.corp.google.com/dataflow/job/2016-12-15_17_36_16-3857167705491723621?project=google.com:clouddfe)).

TODO:
 - Output data verifier that verifies pipeline output stored in the 
filesystem.
 - Add wordcount_it to precommit and replace the existing wordcount execution 
command in postcommit with a better-structured nose command.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markflyhigh/incubator-beam e2e-testrunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1639


commit e1e1fa3a60e1fe234829432d144d6689e240b6f0
Author: Mark Liu 
Date:   2016-12-16T01:41:20Z

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

commit 0e7007879ee082e3afe5db36107f51c03274f3f5
Author: Mark Liu 
Date:   2016-12-16T02:55:53Z

fixup! Fix Code Style

commit d6d71a717e8ed7ab32ffa02621c837c797f66cd7
Author: Mark Liu 
Date:   2016-12-20T19:15:59Z

fixup! Address Ahmet comments




> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build e2e integration test framework that can configure and run batch 
> pipeline with specified test runner, wait for pipeline execution and verify 
> results with given verifiers in the end.





[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765219#comment-15765219
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

Github user markflyhigh closed the pull request at:

https://github.com/apache/incubator-beam/pull/1639


> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build e2e integration test framework that can configure and run batch 
> pipeline with specified test runner, wait for pipeline execution and verify 
> results with given verifiers in the end.





[jira] [Commented] (BEAM-1117) Support for new Timer API in Direct runner

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765203#comment-15765203
 ] 

ASF GitHub Bot commented on BEAM-1117:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1667

[BEAM-1117] Support user timers for ParDo in the direct runner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @tgroh 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam DirectRunner-timers

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1667


commit a3ac176cd7edb18d4f633682ee0e6ff30ab76f64
Author: Kenneth Knowles 
Date:   2016-12-08T04:18:44Z

Allow setting timer by ID in DirectTimerInternals

commit 445750d6cf36f1eda1094531541788260c3fe229
Author: Kenneth Knowles 
Date:   2016-12-08T18:27:23Z

No longer reject timers for ParDo in direct runner

commit d428abe9e12ddd2609773512a180589ff960d954
Author: Kenneth Knowles 
Date:   2016-12-08T23:18:44Z

Deliver timers in the direct runner

commit 6915bbc550ad692656e8eeb1ba7161213c9a6ce6
Author: Kenneth Knowles 
Date:   2016-12-20T04:40:11Z

Add informative Instant formatter to BoundedWindow

commit 2af3f93602b5299cc33c876310a784fc82ff4941
Author: Kenneth Knowles 
Date:   2016-12-20T04:40:47Z

Use informative Instant formatter in WatermarkHold




> Support for new Timer API in Direct runner
> --
>
> Key: BEAM-1117
> URL: https://issues.apache.org/jira/browse/BEAM-1117
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-1117) Support for new Timer API in Direct runner

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765191#comment-15765191
 ] 

ASF GitHub Bot commented on BEAM-1117:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1581


> Support for new Timer API in Direct runner
> --
>
> Key: BEAM-1117
> URL: https://issues.apache.org/jira/browse/BEAM-1117
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-direct
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765055#comment-15765055
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1652


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-1176) Make our test suites use @Rule TestPipeline

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764973#comment-15764973
 ] 

ASF GitHub Bot commented on BEAM-1176:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1664


> Make our test suites use @Rule TestPipeline
> ---
>
> Key: BEAM-1176
> URL: https://issues.apache.org/jira/browse/BEAM-1176
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Stas Levin
>Priority: Minor
>
> Now that [~staslev] has made {{TestPipeline}} a JUnit rule that performs 
> useful sanity checks, we should port all of our tests to it so that they set 
> a good example for users. Maybe we'll even catch some straggling tests with 
> errors :-)





[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764802#comment-15764802
 ] 

ASF GitHub Bot commented on BEAM-362:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1666

[BEAM-362] Move ExecutionContext and related classes to runners-core

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @lukecwik 

This is built on top of #1665 and will require a new worker image.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam ExecutionContext

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1666


commit 03a85e82ca5f1dff5aae184907508d7c5309a404
Author: Kenneth Knowles 
Date:   2016-12-16T04:13:25Z

Remove deprecated AggregatorFactory from SDK

commit 6d7a4b10ba74b1fc08d0ad6a759ca5e0ebffdbba
Author: Kenneth Knowles 
Date:   2016-12-16T04:20:34Z

Move ExecutionContext and related classes to runners-core




> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764746#comment-15764746
 ] 

ASF GitHub Bot commented on BEAM-362:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1665

[BEAM-362] Remove deprecated AggregatorFactory from SDK

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---

The reason this change actually adds real value is that it is the last 
reference to `ExecutionContext` in the SDK. After this PR, we can move all that 
to runners-core, which in turn starts to unblock moving `StateInternals` and 
`TimerInternals` to runners-core. I haven't included that here since it will 
require a Dataflow worker process while this will go right in.

R: @lukecwik (randomly selected committer)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam AggregatorFactory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1665


commit 7484ed9903b9b41dbb47c08c2c0eb8c45d3371ac
Author: Kenneth Knowles 
Date:   2016-12-16T04:13:25Z

Remove deprecated AggregatorFactory from SDK




> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-1176) Make our test suites use @Rule TestPipeline

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764731#comment-15764731
 ] 

ASF GitHub Bot commented on BEAM-1176:
--

GitHub user staslev opened a pull request:

https://github.com/apache/incubator-beam/pull/1664

[BEAM-1176] Migrating test to use TestPipeline as a JUnit rule

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/staslev/incubator-beam 
BEAM-1176-migrating-to-TestPipeline-rule

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1664


commit 83068990d803edabee7aefa32de9542f84d087ac
Author: Stas Levin 
Date:   2016-12-18T16:25:33Z

Migrated the beam-examples-java module to TestPipeline as a JUnit rule.

commit ef9cabfc8dbc95d6f7748404740751fec5c9a17c
Author: Stas Levin 
Date:   2016-12-18T16:38:11Z

Migrated the beam-examples-java8 module to TestPipeline as a JUnit rule.

commit 6b68d7989d0a4fd591a10b49d40f9d77d29d3ac2
Author: Stas Levin 
Date:   2016-12-18T16:51:31Z

Migrated the beam-runners-core module to TestPipeline as a JUnit rule.

commit d95030560baa00e765f481fed256b9ad7ab00e53
Author: Stas Levin 
Date:   2016-12-19T08:20:16Z

Migrated the beam-runners-direct-java module to TestPipeline as a JUnit 
rule.

commit 2ea56f452f3fc19920e2b53f7effdb77e5774e76
Author: Stas Levin 
Date:   2016-12-19T21:54:47Z

Migrated the beam-sdks-java-core module to TestPipeline as a JUnit rule.
Plus, fixed some checkstyle errors from previous modules' migration.

commit 5e23bee64d0f186071cb90f95293abcfcbfb5250
Author: Stas Levin 
Date:   2016-12-19T22:01:31Z

Migrated the beam-sdks-java-extensions-join-library module to TestPipeline 
as a JUnit rule.

commit 9ae205550f222501fdc7de8d89666b98ab0f5620
Author: Stas Levin 
Date:   2016-12-20T07:54:57Z

Migrated the beam-sdks-java-extensions-sorter module to TestPipeline as a 
JUnit rule.

commit 0bf119d677112d5ed7f15623f86e5478ce949b13
Author: Stas Levin 
Date:   2016-12-20T11:26:07Z

Migrated the beam-sdks-java-io-google-cloud-platform module to TestPipeline 
as a JUnit rule.

commit d6207df93712fc53e3921f2da1ae42a86dbd9696
Author: Stas Levin 
Date:   2016-12-20T15:26:51Z

Migrated the beam-sdks-java-io-jdbc module to TestPipeline as a JUnit rule.

commit e3c5841a017fd71ac04c8550964753eb1a5fa802
Author: Stas Levin 
Date:   2016-12-20T15:31:23Z

Migrated the beam-sdks-java-io-jms module to TestPipeline as a JUnit rule.

commit 75020fbc235e4e1c57a4efd12c6a70ffcc763205
Author: Stas Levin 
Date:   2016-12-20T15:38:38Z

Migrated the beam-sdks-java-io-kafka module to TestPipeline as a JUnit rule.

commit 316ddcac2e8eadd778a20426e7e3cc746adbc767
Author: Stas Levin 
Date:   2016-12-20T15:44:15Z

Migrated the beam-sdks-java-io-kinesis module to TestPipeline as a JUnit 
rule.

commit 1e6390c917bc0365c440d505dc87e1ea6b13fe32
Author: Stas Levin 
Date:   2016-12-20T16:09:30Z

Migrated the beam-sdks-java-io-mongodb module to TestPipeline as a JUnit 
rule.

commit 51d38973076d9ed03bba2c38fe2f70b0ce17f6d4
Author: Stas Levin 
Date:   2016-12-20T16:57:57Z

Migrated the beam-sdks-java-io-java8tests module to TestPipeline as a JUnit 
rule + fixed WithTimestampsJava8Test.withTimestampsLambdaShouldApplyTimestamps.

commit 07a46f1a998f49b275d9639c92a0461d68803b77
Author: Stas Levin 
Date:   2016-12-20T17:15:44Z

Fixed checkstyle issues.




> Make our test suites use @Rule TestPipeline
> ---
>
> Key: BEAM-1176
> URL: https://issues.apache.org/jira/browse/BEAM-1176
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>

[jira] [Commented] (BEAM-79) Gearpump runner

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764476#comment-15764476
 ] 

ASF GitHub Bot commented on BEAM-79:


Github user manuzhang closed the pull request at:

https://github.com/apache/incubator-beam/pull/1663


> Gearpump runner
> ---
>
> Key: BEAM-79
> URL: https://issues.apache.org/jira/browse/BEAM-79
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-gearpump
>Reporter: Tyler Akidau
>Assignee: Manu Zhang
>
> Intel is submitting Gearpump (http://www.gearpump.io) to ASF 
> (https://wiki.apache.org/incubator/GearpumpProposal). Appears to be a mix of 
> low-level primitives a la MillWheel, with some higher level primitives like 
> non-merging windowing mixed in. Seems like it would make a nice Beam runner.





[jira] [Commented] (BEAM-79) Gearpump runner

2016-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764357#comment-15764357
 ] 

ASF GitHub Bot commented on BEAM-79:


GitHub user manuzhang opened a pull request:

https://github.com/apache/incubator-beam/pull/1663

[BEAM-79] merge master into gearpump-runner branch

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam gearpump-runner-sync

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1663.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1663


commit 0af145bba721f7cdf3fbcd15330df433d5a8431d
Author: Davor Bonaci 
Date:   2016-12-07T01:00:31Z

This closes #1504

commit e9f254ef2769a082c7fbb500c1c28c6224ac5a7f
Author: Jakob Homan 
Date:   2016-12-07T00:59:50Z

[BEAM-1099] Minor typos in KafkaIO

commit afedd68e806830549724dfc0f2565d756cc6b46d
Author: Davor Bonaci 
Date:   2016-12-07T01:03:54Z

This closes #1524

commit e8c9686a2e898d38afd692328eb171c542084747
Author: Pei He 
Date:   2016-11-23T23:59:56Z

[BEAM-1047] Add DataflowClient wrapper on top of JSON library.

commit ded58832ceaef487f4590d9396f09744288c955d
Author: Pei He 
Date:   2016-11-24T00:14:27Z

[Code Health] Remove redundant projectId from DataflowPipelineJob.

commit ce03f30c1ee0b84ad2e7f10a6272ffb25548244a
Author: Pei He 
Date:   2016-11-28T19:47:42Z

[BEAM-1047] Update dataflow runner code to use DataflowClient wrapper.

commit b2b570f27808b1671bf6cd0fc60f874da671d4ca
Author: bchambers 
Date:   2016-12-07T01:08:13Z

Closes #1434

commit 0a2ed832ce5af7556db605e99b985ed4ffc1b152
Author: Sam McVeety 
Date:   2016-10-30T18:58:44Z

BigQueryIO.Read: support runtime options

commit 869b2710efdb90bc3ce5b6e9d4f3b49a3a804a63
Author: Aljoscha Krettek 
Date:   2016-12-07T05:28:13Z

[FLINK-1102] Fix Aggregator Registration in Flink Batch Runner

commit b41a46e86fd38c4a887f31bdf6cb75969f4750d3
Author: Aljoscha Krettek 
Date:   2016-12-07T07:26:02Z

This closes #1530

commit baf5e6bd9b1011f4c5c3974aa46393471b340c15
Author: Jean-Baptiste Onofré 
Date:   2016-12-07T07:37:33Z

[BEAM-1094] Set test scope for Kafka IO and junit

commit 9ccf6dbea0d3807fef6a7c0432906fffd2b8ec3f
Author: Sela 
Date:   2016-12-07T08:31:38Z

This closes #1531

commit dce3a196a3a26fdd42225520faf3d9084ee48123
Author: Sela 
Date:   2016-12-07T09:20:07Z

[BEAM-329] Update Spark runner README.

commit 02bb8c375c48847b1686d70184fb194500a62e8c
Author: Jean-Baptiste Onofré 
Date:   2016-12-07T11:51:09Z

[BEAM-329] This closes #1532

commit b2d72237b592e1dcb5cca30f5cbc9a11d2890c0f
Author: Kenneth Knowles 
Date:   2016-12-06T23:20:28Z

Port most of DoFnRunner Javadoc to new DoFn

commit 1526184ae8be1f8ae6863f830653204157a584cd
Author: Thomas Groh 
Date:   2016-12-07T16:51:02Z

This closes #1527

commit 8e1e46e73edf9cce376ed7bd194db00edc3e60b4
Author: Kenneth Knowles 
Date:   2016-12-07T05:01:37Z

Port ParDoTest from OldDoFn to new DoFn

commit ae52ec1bc3f3120e9f8e150e50c18564055a467c
Author: Kenneth Knowles 
Date:   2016-12-07T17:00:18Z

This closes #1529

commit 55d333bff68809ff1a9154491ace02d2d16e3b85
Author: Thomas Groh 
Date:   2016-12-05T22:29:05Z

Only provide expanded Inputs and Outputs

This removes PInput and POutput from the immediate API Surface of
TransformHierarchy.Node, and forces Pipeline Visitors to access only
the expanded version of the output.

This is part of the move towards the runner-agnostic representation of a
graph.

commit 5b31a369962907e257de8019fbf6cde4c615b1c0
Author: Thomas Groh 
Date:   2016-12-07T17:14:38Z

This closes #1511

commit 43fef2775145f67def3ab8a246ecca192a7d650b
Author: 

[jira] [Commented] (BEAM-1101) Remove inconsistencies in Python PipelineOptions

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763421#comment-15763421
 ] 

ASF GitHub Bot commented on BEAM-1101:
--

Github user pabloem closed the pull request at:

https://github.com/apache/incubator-beam/pull/1526


> Remove inconsistencies in Python PipelineOptions
> 
>
> Key: BEAM-1101
> URL: https://issues.apache.org/jira/browse/BEAM-1101
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Frances Perry
>
> Some options have been removed from Java, and some have different default 
> behavior in Java. Gotta make this consistent.





[jira] [Commented] (BEAM-1101) Remove inconsistencies in Python PipelineOptions

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763422#comment-15763422
 ] 

ASF GitHub Bot commented on BEAM-1101:
--

GitHub user pabloem reopened a pull request:

https://github.com/apache/incubator-beam/pull/1526

[BEAM-1101] Remove inconsistencies in Python PipelineOptions

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam 
poptions-inconsistencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1526


commit 6f091898a84154d63974f4fa0717406f472d99a3
Author: Pablo 
Date:   2016-12-07T02:01:54Z

Fixing inconsistencies in PipelineOptions




> Remove inconsistencies in Python PipelineOptions
> 
>
> Key: BEAM-1101
> URL: https://issues.apache.org/jira/browse/BEAM-1101
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Pablo Estrada
>Assignee: Frances Perry
>
> Some options have been removed from Java, and some have different default 
> behavior in Java. Gotta make this consistent.





[jira] [Commented] (BEAM-1180) Implement GearpumpPipelineResult

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763017#comment-15763017
 ] 

ASF GitHub Bot commented on BEAM-1180:
--

GitHub user manuzhang opened a pull request:

https://github.com/apache/incubator-beam/pull/1661

[BEAM-1180] Implement GearpumpPipelineResult

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam 
gearpump-runner-result

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1661


commit efd1021e9803418c54bba5383c87abafed611d56
Author: manuzhang 
Date:   2016-12-20T02:39:56Z

[BEAM-1180] Implement GearpumpPipelineResult




> Implement GearpumpPipelineResult
> 
>
> Key: BEAM-1180
> URL: https://issues.apache.org/jira/browse/BEAM-1180
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-gearpump
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>






[jira] [Commented] (BEAM-59) IOChannelFactory rethinking/redesign

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762641#comment-15762641
 ] 

ASF GitHub Bot commented on BEAM-59:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1558


> IOChannelFactory rethinking/redesign
> 
>
> Key: BEAM-59
> URL: https://issues.apache.org/jira/browse/BEAM-59
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core, sdk-java-gcp
>Reporter: Daniel Halperin
>Assignee: Pei He
>
> Right now, FileBasedSource and FileBasedSink communication is mediated by 
> IOChannelFactory. There are a number of issues:
> * Global configuration -- e.g., all 'gs://' URIs use the same credentials. 
> This should be per-source/per-sink/etc.
> * Supported APIs -- currently IOChannelFactory is in the "non-public API" 
> util package and subject to change. We need users to be able to add new 
> backends ('s3://', 'hdfs://', etc.) directly, without fear that they will be 
> broken.
> * Per-backend features: e.g., creating buckets in GCS/s3, setting expiration 
> time, etc.
> Updates:
> Design docs posted on dev@ list:
> Part 1: IOChannelFactory Redesign: 
> https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
> Part 2: Configurable BeamFileSystem:
> https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs





[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762533#comment-15762533
 ] 

ASF GitHub Bot commented on BEAM-27:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1660

[BEAM-27] Support setting and deleting timers by ID in 
InMemoryTimerInternals

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

This is built on top of the move to runners-core in #1652. Only the second 
commit contains nontrivial changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
InMemoryTimerInternals-dedup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1660


commit 6effd6c93587d1c0a02ba9285b47d165ae5c517d
Author: Kenneth Knowles 
Date:   2016-12-16T04:45:56Z

Move InMemoryTimerInternals to runners-core

commit 557d2d724c53233ad9d34c9239ff5cf77b754d73
Author: Kenneth Knowles 
Date:   2016-12-17T04:22:59Z

Restore SDK's InMemoryTimerInternals, deprecated

commit 0ef0e3a3ecabafdd934f7d03f47f0a220187fb22
Author: Kenneth Knowles 
Date:   2016-12-19T22:01:36Z

Support set and delete of timer by ID in InMemoryTimerInternals
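
As a rough illustration of what setting and deleting timers by ID can mean, here is a toy in-memory store. It is not the actual InMemoryTimerInternals code: the class `TimerStore`, its method names, and the assumption that setting an existing ID replaces the earlier firing time are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory timer store; hypothetical, not Beam's InMemoryTimerInternals. */
final class TimerStore {
    // One entry per timer ID, so a given ID can never hold two pending timers.
    private final Map<String, Long> timersById = new HashMap<>();

    /** Sets a timer; setting an ID that already exists replaces its firing time. */
    void setTimer(String timerId, long fireAtMillis) {
        timersById.put(timerId, fireAtMillis);
    }

    /** Deletes the timer with this ID; does nothing if no such timer is set. */
    void deleteTimer(String timerId) {
        timersById.remove(timerId);
    }

    /** Returns the firing time for this ID, or null if no such timer is set. */
    Long fireTime(String timerId) {
        return timersById.get(timerId);
    }

    /** Returns the number of pending timers. */
    int size() {
        return timersById.size();
    }
}
```

Keying the map by ID is what makes both operations cheap: the caller never needs to remember the old firing time in order to replace or remove a timer.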




> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-1143) Timestamps on Jenkins log lines

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762129#comment-15762129
 ] 

ASF GitHub Bot commented on BEAM-1143:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1659


> Timestamps on Jenkins log lines
> ---
>
> Key: BEAM-1143
> URL: https://issues.apache.org/jira/browse/BEAM-1143
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kenneth Knowles
>Assignee: Jason Kuster
>Priority: Minor
>
> I suspect this might be doable more universally in the groovy DSL scripts, 
> but we would gain some value by a port of 
> https://github.com/apache/incubator-beam/commit/7f82a573d00a5a30331b7bbb8757e55f4a2d93ae
>  to the most appropriate analog for Jenkins. (in the worst case, just exactly 
> porting the env var)
> We are currently regularly bottlenecked on build duration/backlog and the 
> time seems to exist outside of the durations accounted for by Maven's usual 
> output.





[jira] [Commented] (BEAM-1143) Timestamps on Jenkins log lines

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762107#comment-15762107
 ] 

ASF GitHub Bot commented on BEAM-1143:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1659

[BEAM-1143] More escaping in Jenkins timestamp spec

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

The current timestamp spec is insufficiently escaped. The failure mode is 
that relative timestamps are shown instead of a readable standard datetime.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam jenkins-timestamps

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1659


commit 627ccb522d7d818aebe2256246e81af2d3062d11
Author: Kenneth Knowles 
Date:   2016-12-19T19:39:29Z

More escaping in Jenkins timestamp spec




> Timestamps on Jenkins log lines
> ---
>
> Key: BEAM-1143
> URL: https://issues.apache.org/jira/browse/BEAM-1143
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kenneth Knowles
>Assignee: Jason Kuster
>Priority: Minor
>
> I suspect this might be doable more universally in the groovy DSL scripts, 
> but we would gain some value by a port of 
> https://github.com/apache/incubator-beam/commit/7f82a573d00a5a30331b7bbb8757e55f4a2d93ae
>  to the most appropriate analog for Jenkins. (in the worst case, just exactly 
> porting the env var)
> We are currently regularly bottlenecked on build duration/backlog and the 
> time seems to exist outside of the durations accounted for by Maven's usual 
> output.





[jira] [Commented] (BEAM-1143) Timestamps on Jenkins log lines

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762042#comment-15762042
 ] 

ASF GitHub Bot commented on BEAM-1143:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1640


> Timestamps on Jenkins log lines
> ---
>
> Key: BEAM-1143
> URL: https://issues.apache.org/jira/browse/BEAM-1143
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kenneth Knowles
>Assignee: Jason Kuster
>Priority: Minor
>
> I suspect this might be doable more universally in the groovy DSL scripts, 
> but we would gain some value by a port of 
> https://github.com/apache/incubator-beam/commit/7f82a573d00a5a30331b7bbb8757e55f4a2d93ae
>  to the most appropriate analog for Jenkins. (in the worst case, just exactly 
> porting the env var)
> We are currently regularly bottlenecked on build duration/backlog and the 
> time seems to exist outside of the durations accounted for by Maven's usual 
> output.





[jira] [Commented] (BEAM-841) Releases should produce proper MD5/SHA1 checksums

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15761711#comment-15761711
 ] 

ASF GitHub Bot commented on BEAM-841:
-

Github user wikier closed the pull request at:

https://github.com/apache/incubator-beam/pull/1206


> Releases should produce proper MD5/SHA1 checksums
> -
>
> Key: BEAM-841
> URL: https://issues.apache.org/jira/browse/BEAM-841
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: 0.3.0-incubating
>Reporter: Sergio Fernández
>Priority: Trivial
>
> Currently the format {{09 7B 6A 0A C9 3E 71 C1  05 0C 71 65 E9 0C 4F AE}} is 
> used, while most tools use the simpler format 
> {{097b6a0ac93e71c1050c7165e90c4fae}}, which allows automatic checking.
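
A minimal sketch of the conversion between the two formats quoted above, assuming the spaced uppercase form is the only input variant. The class and method names (`ChecksumFormat`, `toCompactHex`) are hypothetical and not part of the Beam build.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Beam: normalizes a spaced hex digest. */
final class ChecksumFormat {
    private ChecksumFormat() {}

    /** Strips all whitespace and lower-cases the hex digits. */
    static String toCompactHex(String spacedDigest) {
        String compact = spacedDigest.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        // Reject anything that is not purely hexadecimal after normalization.
        if (!compact.matches("[0-9a-f]+")) {
            throw new IllegalArgumentException("Not a hex digest: " + spacedDigest);
        }
        return compact;
    }
}
```

Applied to the digest in the issue, `toCompactHex` yields the compact string that the issue says most tools expect.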





[jira] [Commented] (BEAM-1165) Unexpected file created when checking dependencies on clean repo

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760886#comment-15760886
 ] 

ASF GitHub Bot commented on BEAM-1165:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1632


> Unexpected file created when checking dependencies on clean repo
> 
>
> Key: BEAM-1165
> URL: https://issues.apache.org/jira/browse/BEAM-1165
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 0.5.0-incubating
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> I just found a weird behavior when I was checking for the latest release,
> nothing breaking, but when I start with a clean repo clone and I do:
> mvn dependency:tree
> It creates a new file runners/flink/examples/wordcounts.txt with the
> dependencies.
> This happens because maven-dependency-plugin treats the 'output' property,
> which the Flink tests use, as the export file for the command.
> Ref.
> https://maven.apache.org/plugins/maven-dependency-plugin/tree-mojo.html#output





[jira] [Commented] (BEAM-480) Use BigQueryServices abstraction in BigQueryIO

2016-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15760579#comment-15760579
 ] 

ASF GitHub Bot commented on BEAM-480:
-

Github user peihe closed the pull request at:

https://github.com/apache/incubator-beam/pull/729


> Use BigQueryServices abstraction in BigQueryIO
> --
>
> Key: BEAM-480
> URL: https://issues.apache.org/jira/browse/BEAM-480
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-gcp
>Reporter: Pei He
>Assignee: Pei He
>Priority: Minor
> Fix For: Not applicable
>
>
> There is legacy code that sends requests to BigQuery directly.
> It should be moved to use BigQueryServices.





[jira] [Commented] (BEAM-1178) Make naming of logger objects consistent

2016-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15759358#comment-15759358
 ] 

ASF GitHub Bot commented on BEAM-1178:
--

GitHub user iemejia opened a pull request:

https://github.com/apache/incubator-beam/pull/1655

[BEAM-1178] Make naming of logger objects consistent

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [X] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [X] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [X] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/incubator-beam BEAM-1178-logger

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1655


commit fbfea5953000bd77b07b6a1f1c7a192e24b88021
Author: Ismaël Mejía 
Date:   2016-12-18T15:02:41Z

Fix grammar error (repeated for)

commit 11ba4d3638da2859727206d8ea887298efcad34a
Author: Ismaël Mejía 
Date:   2016-12-18T20:01:13Z

[BEAM-1178] Make naming of logger objects consistent




> Make naming of logger objects consistent
> 
>
> Key: BEAM-1178
> URL: https://issues.apache.org/jira/browse/BEAM-1178
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-extensions
>Affects Versions: 0.5.0-incubating
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
>
> Logger objects are used throughout Beam. Around 90% of the classes that 
> currently use loggers follow the naming convention 'LOG'; however, some 
> use 'logger' and others use 'LOGGER'. This issue is to make the logger 
> naming consistent.
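
For reference, the 'LOG' convention looks roughly like the sketch below. The class `WordCounter` is a made-up example, and `java.util.logging` is used only to keep the snippet self-contained; the logging library Beam itself uses may differ.

```java
import java.util.logging.Logger;

/** Made-up example class showing the 'LOG' naming convention. */
final class WordCounter {
    // The constant is named LOG, rather than 'logger' or 'LOGGER'.
    private static final Logger LOG = Logger.getLogger(WordCounter.class.getName());

    private WordCounter() {}

    /** Counts whitespace-separated words and logs the result at FINE level. */
    static int countWords(String line) {
        String trimmed = line.trim();
        int count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        LOG.fine("Counted " + count + " words");
        return count;
    }
}
```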





[jira] [Commented] (BEAM-1074) Set default-partitioner in SourceRDD.Unbounded.

2016-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758766#comment-15758766
 ] 

ASF GitHub Bot commented on BEAM-1074:
--

Github user amitsela closed the pull request at:

https://github.com/apache/incubator-beam/pull/1500


> Set default-partitioner in SourceRDD.Unbounded.
> ---
>
> Key: BEAM-1074
> URL: https://issues.apache.org/jira/browse/BEAM-1074
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Amit Sela
>
> This will make sure the following stateful read within {{mapWithState}} won't 
> shuffle the read values as long as they are grouped in a {{List}}.





[jira] [Commented] (BEAM-1126) Expose UnboundedSource split backlog in number of events

2016-12-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758431#comment-15758431
 ] 

ASF GitHub Bot commented on BEAM-1126:
--

Github user aviemzur closed the pull request at:

https://github.com/apache/incubator-beam/pull/1574


> Expose UnboundedSource split backlog in number of events
> 
>
> Key: BEAM-1126
> URL: https://issues.apache.org/jira/browse/BEAM-1126
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Aviem Zur
>Assignee: Daniel Halperin
>Priority: Minor
>
> Today {{UnboundedSource}} exposes split backlog in bytes via 
> {{getSplitBacklogBytes()}}.
> There is value in exposing backlog in number of events as well, since this 
> number can be easier for humans to interpret than bytes. Something like 
> {{getSplitBacklogEvents()}} or {{getSplitBacklogCount()}}.
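
One possible shape for such an addition, sketched on a stand-alone toy interface rather than on Beam's real classes. The interface `SplitBacklog`, the `UNKNOWN` sentinel value, and the idea of defaulting the event count to "unknown" so that existing sources keep compiling are all assumptions made for illustration.

```java
/** Toy interface, not Beam's API: backlog reported in bytes and, optionally, events. */
interface SplitBacklog {
    /** Assumed sentinel meaning "this source cannot report the value". */
    long UNKNOWN = -1L;

    /** Backlog of this split in bytes, or UNKNOWN. */
    long getSplitBacklogBytes();

    /** Backlog of this split in events; defaults to UNKNOWN for existing sources. */
    default long getSplitBacklogEvents() {
        return UNKNOWN;
    }
}

/** Formats a backlog for display, preferring the event count when available. */
final class BacklogFormatter {
    private BacklogFormatter() {}

    static String describe(SplitBacklog backlog) {
        long events = backlog.getSplitBacklogEvents();
        if (events != SplitBacklog.UNKNOWN) {
            return events + " events";
        }
        long bytes = backlog.getSplitBacklogBytes();
        return bytes == SplitBacklog.UNKNOWN ? "unknown" : bytes + " bytes";
    }
}
```

A default method keeps the change additive: sources that only know their backlog in bytes need no modification.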





[jira] [Commented] (BEAM-85) PAssert needs sanity check that it's used correctly

2016-12-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757718#comment-15757718
 ] 

ASF GitHub Bot commented on BEAM-85:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1595


> PAssert needs sanity check that it's used correctly
> ---
>
> Key: BEAM-85
> URL: https://issues.apache.org/jira/browse/BEAM-85
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Daniel Halperin
>Assignee: Stas Levin
>
> We should validate two things:
> # DataflowAssert is not added to a pipeline that has already been run.
> # The pipeline is run after the DataflowAssert is added.
> If either of these is not validated, then it is possible that the test 
> doesn't actually verify anything.
> This code should throw an assertion error or fail in some other way.
> {code}
> Pipeline p = TestPipeline.create();
> PCollection<Boolean> value = p.apply(Create.of(Boolean.FALSE));
> p.run();
> // Added after run(), so the assertion is never evaluated.
> DataflowAssert.thatSingleton(value).isEqualTo(true);
> {code}
> but it would pass silently.
> Similarly, this code will pass silently:
> {code}
> Pipeline p = TestPipeline.create();
> PCollection<Boolean> value = p.apply(Create.of(Boolean.FALSE));
> // The pipeline is never run, so the assertion is never evaluated.
> DataflowAssert.thatSingleton(value).isEqualTo(true);
> {code}
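
The two checks requested above can be sketched with a toy pipeline class that tracks its own state. This is only an illustration of the idea and makes no claim about how the fix was implemented in Beam; `CheckedPipeline` and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Toy stand-in for a pipeline; hypothetical, not Beam's Pipeline or PAssert. */
final class CheckedPipeline {
    private final List<BooleanSupplier> assertions = new ArrayList<>();
    private boolean hasRun = false;

    /** Check 1: an assertion may not be added once the pipeline has run. */
    void addAssertion(BooleanSupplier check) {
        if (hasRun) {
            throw new IllegalStateException("Assertion added after the pipeline was run");
        }
        assertions.add(check);
    }

    /** Runs the pipeline and evaluates every registered assertion. */
    void run() {
        hasRun = true;
        for (BooleanSupplier check : assertions) {
            if (!check.getAsBoolean()) {
                throw new AssertionError("Pipeline assertion failed");
            }
        }
    }

    /** Check 2: call at the end of a test; fails if assertions were never run. */
    void verifyAssertionsWereRun() {
        if (!assertions.isEmpty() && !hasRun) {
            throw new IllegalStateException("Pipeline with assertions was never run");
        }
    }
}
```

With this structure, the first snippet in the issue fails at `addAssertion` and the second fails at `verifyAssertionsWereRun`, instead of both passing silently.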





[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757625#comment-15757625
 ] 

ASF GitHub Bot commented on BEAM-27:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1612


> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.





[jira] [Commented] (BEAM-545) Pipelines and their executions naming changes

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756581#comment-15756581
 ] 

ASF GitHub Bot commented on BEAM-545:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1653

[BEAM-545] PipelineOptions: fix parameter name

Seems like a cut and paste error. R: @peihe

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1653


commit 65afadb15cc320acc4e1562aec0de0c82fd102bd
Author: Daniel Halperin 
Date:   2016-12-17T07:47:56Z

[BEAM-545] PipelineOptions: fix parameter name

Seems like a cut and paste error. R: @peihe




> Pipelines and their executions naming changes
> -
>
> Key: BEAM-545
> URL: https://issues.apache.org/jira/browse/BEAM-545
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>Priority: Minor
> Fix For: 0.3.0-incubating
>
>
> The purpose of the changes is to clarify the differences between the two, have
> consensus between runners, and unify the implementation.
> Current states:
>  * PipelineOptions.appName defaults to mainClass name
>  * DataflowPipelineOptions.jobName defaults to appName+user+datetime
>  * FlinkPipelineOptions.jobName defaults to appName+user+datetime
> Proposal:
> 1. Replace PipelineOptions.appName with PipelineOptions.pipelineName.
> *  It is the user-visible name for a specific graph.
> *  Defaults to mainClass name.
> *  Use cases: Find all executions of a pipeline
> 2. Add jobName to top level PipelineOptions.
> *  It is the unique name for an execution
> *  defaults to pipelineName + user + datetime + random Integer
> *  Use cases:
> -- Finding all executions by USER_A between TIME_X and TIME_Y
> -- Naming resources created by the execution. for example:
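The jobName default proposed in item 2 (pipelineName + user + datetime + random Integer) could look roughly like the sketch below. This is only an illustration in Python, not the Java SDK implementation, and it does not reconstruct the truncated resource-naming example above. The function name `default_job_name`, the `-` separator, and the timestamp format are assumptions made for the sketch.

```python
import getpass
import random
from datetime import datetime, timezone


def default_job_name(pipeline_name, user=None, now=None, rng=None):
    """Build a per-execution name from pipeline name, user, time and a random int.

    The separator and the timestamp format are assumptions, not taken from
    the proposal.
    """
    user = user or getpass.getuser()
    now = now or datetime.now(timezone.utc)
    rng = rng or random.Random()
    stamp = now.strftime("%m%d%H%M%S")
    suffix = rng.randrange(0, 2 ** 31)  # random integer keeps executions unique
    return "{}-{}-{}-{}".format(pipeline_name, user, stamp, suffix)
```

Because the pipeline name is the prefix, all executions of one pipeline can be found with a prefix match, while the user and timestamp fields support the "executions by USER_A between TIME_X and TIME_Y" use case.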



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756230#comment-15756230
 ] 

ASF GitHub Bot commented on BEAM-498:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1648


> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>






[jira] [Commented] (BEAM-475) High-quality javadoc for Beam

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756032#comment-15756032
 ] 

ASF GitHub Bot commented on BEAM-475:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1650


> High-quality javadoc for Beam
> -
>
> Key: BEAM-475
> URL: https://issues.apache.org/jira/browse/BEAM-475
> Project: Beam
>  Issue Type: Improvement
>  Components: project-management
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> We should have good Javadoc for Beam!
> Current snapshot: http://beam.incubator.apache.org/javadoc/0.1.0-incubating/





[jira] [Commented] (BEAM-475) High-quality javadoc for Beam

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755901#comment-15755901
 ] 

ASF GitHub Bot commented on BEAM-475:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1650

[BEAM-475] View.asMap: minor javadoc fixes



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam javadoc-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1650


commit b0ceccb6659d60822aa9b8a84b93384c802bdefa
Author: Dan Halperin 
Date:   2016-12-17T00:26:27Z

View.asMap: minor javadoc fixes




> High-quality javadoc for Beam
> -
>
> Key: BEAM-475
> URL: https://issues.apache.org/jira/browse/BEAM-475
> Project: Beam
>  Issue Type: Improvement
>  Components: project-management
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
> Fix For: Not applicable
>
>
> We should have good Javadoc for Beam!
> Current snapshot: http://beam.incubator.apache.org/javadoc/0.1.0-incubating/





[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755809#comment-15755809
 ] 

ASF GitHub Bot commented on BEAM-498:
-

GitHub user jkff opened a pull request:

https://github.com/apache/incubator-beam/pull/1648

[BEAM-498] Undeletes DoFnInvokers.of(OldDoFn)

In #1565 I deleted some code that's actually necessary to support the 
Dataflow worker. Bad idea. We can't delete it until Dataflow worker stops using 
DoFnInvokers.of(OldDoFn).

R: @kennknowles 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam revert-some-old-dofn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1648.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1648


commit 22f23a70894df213a2bfcf389d3587995b9df9f7
Author: Eugene Kirpichov 
Date:   2016-12-16T23:26:28Z

Revert "Removes code for wrapping DoFn as an OldDoFn"

This reverts commit a22de15012c51e8b7e31143021f0a298e093bf51.

commit b8f91349f7c457b878b1d343ae1b20adae955baf
Author: Eugene Kirpichov 
Date:   2016-12-16T23:26:32Z

Revert "Removes ArgumentProvider.windowingInternals"

This reverts commit f3e8a0383bf9cb3f9452e0364f7deba113cadff9.

commit 485da3549a53407e8c2a5b6b5cf69740fee68a74
Author: Eugene Kirpichov 
Date:   2016-12-16T23:37:02Z

Revert "Moves DoFnAdapters to runners-core"

This reverts commit 33ed3238e2b3899cff061be3056c5cc29fc60a04.




> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>






[jira] [Commented] (BEAM-646) Get runners out of the apply()

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755666#comment-15755666
 ] 

ASF GitHub Bot commented on BEAM-646:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1547


> Get runners out of the apply()
> --
>
> Key: BEAM-646
> URL: https://issues.apache.org/jira/browse/BEAM-646
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Thomas Groh
>
> Right now, the runner intercepts calls to apply() and replaces transforms as 
> we go. This means that there is no "original" user graph. For portability and 
> misc architectural benefits, we would like to build the original graph first, 
> and have the runner override later.
> Some runners already work in this manner, but we could integrate it more 
> smoothly, with more validation, via some handy APIs on e.g. the Pipeline 
> object.
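The "build the original graph first, override later" idea described above can be shown with a toy model. This is a hedged sketch in Python, not Beam code: `Pipeline`, `Runner`, `plan`, and the transform names are hypothetical stand-ins chosen for illustration.

```python
class Pipeline(object):
    """Toy pipeline that only records the transforms the user applied."""

    def __init__(self):
        self.transforms = []

    def apply(self, transform):
        # No runner involvement here: the user graph is recorded as written.
        self.transforms.append(transform)
        return self


class Runner(object):
    """Toy runner that substitutes transforms after the graph is complete."""

    def __init__(self, overrides):
        self.overrides = dict(overrides)

    def plan(self, pipeline):
        # Build a new list so the original user graph stays untouched.
        return [self.overrides.get(t, t) for t in pipeline.transforms]
```

In this model the user graph survives unchanged after planning, which is the property the issue asks for; intercepting `apply()` would instead overwrite it as it is built.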





[jira] [Commented] (BEAM-1089) Jenkins comments on PRs are too many & too large

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755653#comment-15755653
 ] 

ASF GitHub Bot commented on BEAM-1089:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1647

[BEAM-1089] Replace "--none--" with a message in Jenkins comments

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @jasonkuster (I'll grab a committer later)

I believe there was some expectation that the magic value "--none--" would 
disable comments. Instead, though, Jenkins is writing that exact string to our 
pull requests. Until we figure out how to actually disable comments, we may as 
well leave informative messages.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam jenkins-messages

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1647.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1647


commit 814b029689606d9d37dd696e4599efa0646a204b
Author: Kenneth Knowles 
Date:   2016-12-16T21:59:31Z

Replace "--none--" with a message in Jenkins comments

I believe there was some expectation that the magic value "--none--"
would disable comments. Instead, though, Jenkins is writing that
exact string to our pull requests. Until we figure out how to actually
disable comments, we may as well leave informative messages.




> Jenkins comments on PRs are too many & too large
> 
>
> Key: BEAM-1089
> URL: https://issues.apache.org/jira/browse/BEAM-1089
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kenneth Knowles
>Assignee: Jason Kuster
>
> Lately, I've been finding review comments somewhat drowned out by asfbot 
> copying build results onto a PR. It also generates a lot of needless email. I 
> have not yet tried to devise just the right filter, hoping we can just return 
> to the normal practice of leaving just a commit status.





[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755649#comment-15755649
 ] 

ASF GitHub Bot commented on BEAM-362:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1592


> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755599#comment-15755599
 ] 

ASF GitHub Bot commented on BEAM-362:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1643


> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755477#comment-15755477
 ] 

ASF GitHub Bot commented on BEAM-1108:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1646


> Remove deprecated Dataflow Runner options and update documentation
> --
>
> Key: BEAM-1108
> URL: https://issues.apache.org/jira/browse/BEAM-1108
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: Not applicable
>
>
> Umbrella bug for removing deprecated {{DataflowPipelineXOptions}} 
> configurations, plus improving documentation. Will update bug description as 
> more tasks arise.
> 1. Remove the {{TEARDOWN_POLICY}} option.





[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755143#comment-15755143
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

GitHub user markflyhigh reopened a pull request:

https://github.com/apache/incubator-beam/pull/1639

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

 - E2E test framework that lets a TestRunner start and verify a pipeline job.
   - Add `TestOptions`, which defines `on_success_matcher`, used to verify the
     state/output of a pipeline job.
   - Validate `on_success_matcher` before pipeline execution to make sure it
     can be unpickled to a subclass of BaseMatcher.
   - Create a `TestDataflowRunner`, which provides the functionality of
     `DataflowRunner` plus result verification.
   - Provide a test verifier, `PipelineStateMatcher`, that can verify whether
     a pipeline job finished in the DONE state.
 - Add wordcount_it (it = integration test), which builds an e2e test on the
   existing wordcount pipeline.
   - Include wordcount_it in the nose collector, so that it can be collected
     and run by nose.
   - Skip ITs when running unit tests from tox in precommit and postcommit.
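The matcher validation and state check described in the list above might be sketched as follows. This is a minimal, standard-library-only illustration and not the code in the pull request: `BaseMatcher` here is a local stand-in for the hamcrest-style base class, and `FakeResult` and `load_matcher` are hypothetical names introduced for the example.

```python
import pickle


class BaseMatcher(object):
    """Local stand-in for a hamcrest-style matcher base class (assumption)."""

    def matches(self, item):
        raise NotImplementedError


class PipelineStateMatcher(BaseMatcher):
    """Matches a pipeline result whose state equals the expected state."""

    def __init__(self, expected_state="DONE"):
        self.expected_state = expected_state

    def matches(self, pipeline_result):
        return getattr(pipeline_result, "state", None) == self.expected_state


class FakeResult(object):
    """Hypothetical pipeline result carrying only a final state."""

    def __init__(self, state):
        self.state = state


def load_matcher(pickled_matcher):
    """Unpickle a matcher and reject anything that is not a BaseMatcher."""
    matcher = pickle.loads(pickled_matcher)
    if not isinstance(matcher, BaseMatcher):
        raise ValueError("on_success_matcher must unpickle to a BaseMatcher")
    return matcher


if __name__ == "__main__":
    # Round-trip the matcher the way an option value would travel as a string.
    matcher = load_matcher(pickle.dumps(PipelineStateMatcher()))
    print(matcher.matches(FakeResult("DONE")))
```

Validating before the pipeline runs means a badly configured matcher fails fast instead of after a long remote job has already finished.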

The current changes will not change the behavior of the existing pre/postcommit.
Testing was done by running `tox -e py27 -c sdks/python/tox.ini` for unit tests 
and running wordcount_it with `TestDataflowRunner` on the service 
([link](https://pantheon.corp.google.com/dataflow/job/2016-12-15_17_36_16-3857167705491723621?project=google.com:clouddfe)).

TODO:
 - Output data verifier that verifies pipeline output stored in the 
filesystem.
 - Add wordcount_it to precommit and replace the existing wordcount execution 
command in postcommit with a better-structured nose command.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markflyhigh/incubator-beam e2e-testrunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1639


commit e1e1fa3a60e1fe234829432d144d6689e240b6f0
Author: Mark Liu 
Date:   2016-12-16T01:41:20Z

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

commit 0e7007879ee082e3afe5db36107f51c03274f3f5
Author: Mark Liu 
Date:   2016-12-16T02:55:53Z

fixup! Fix Code Style




> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build e2e integration test framework that can configure and run batch 
> pipeline with specified test runner, wait for pipeline execution and verify 
> results with given verifiers in the end.





[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755142#comment-15755142
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

Github user markflyhigh closed the pull request at:

https://github.com/apache/incubator-beam/pull/1639


> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build e2e integration test framework that can configure and run batch 
> pipeline with specified test runner, wait for pipeline execution and verify 
> results with given verifiers in the end.





[jira] [Commented] (BEAM-450) Modules are shaded to the same path

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754870#comment-15754870
 ] 

ASF GitHub Bot commented on BEAM-450:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1633


> Modules are shaded to the same path
> ---
>
> Key: BEAM-450
> URL: https://issues.apache.org/jira/browse/BEAM-450
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>  Labels: newbie, starter
>
> Right now multiple modules are using the same repackaged path. We should be 
> using per-artifact paths so that they don't conflict.
> One proposal was simply to adopt 
> {{${project.groupId}.${project.artifactId}.repackaged}} as the shading 
> location. If it works.
> This is a good starter issue.





[jira] [Commented] (BEAM-1108) Remove deprecated Dataflow Runner options and update documentation

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754857#comment-15754857
 ] 

ASF GitHub Bot commented on BEAM-1108:
--

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1646

[BEAM-1108] Remove outdated language about experimental autoscaling

R: @lukecwik or @kennknowles or @tgroh 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam autoscaling-language

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1646


commit 5d33aa79663a3f30dbd11ae9e8733181edde1a2c
Author: Dan Halperin 
Date:   2016-12-16T16:23:22Z

[BEAM-1108] Remove outdated language about experimental autoscaling




> Remove deprecated Dataflow Runner options and update documentation
> --
>
> Key: BEAM-1108
> URL: https://issues.apache.org/jira/browse/BEAM-1108
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Affects Versions: Not applicable
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>Priority: Minor
> Fix For: Not applicable
>
>
> Umbrella bug for removing deprecated {{DataflowPipelineXOptions}} 
> configurations, plus improving documentation. Will update bug description as 
> more tasks arise.
> 1. Remove the {{TEARDOWN_POLICY}} option.





[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753990#comment-15753990
 ] 

ASF GitHub Bot commented on BEAM-362:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1644


> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753479#comment-15753479
 ] 

ASF GitHub Bot commented on BEAM-362:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1644

[BEAM-362] Port runners to runners-core AggregatorFactory

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @amitsela 

For context: `runners.core.AggregatorFactory` is a copy of 
`sdk.transforms.Aggregator.AggregatorFactory`. So I am just porting everything 
to the non-deprecated bit so I can delete it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam runners-core-AggregatorFactory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1644


commit 63ac16669e7884dd42d431b9948ed675f7af3f03
Author: Kenneth Knowles 
Date:   2016-12-16T05:06:14Z

Port runners to runners-core AggregatorFactory




> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-362) Move shared runner functionality out of SDK and into runners/core-java

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753444#comment-15753444
 ] 

ASF GitHub Bot commented on BEAM-362:
-

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1643

[BEAM-362] Move InMemoryTimerInternals to runners-core

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @jkff (I'll grab a committer later, but your contributions are the thing 
affected)

The only substantive aspect of this change is the move of the timer 
internals directly into `ProcessFn`. I think this is right or at least the 
right compromise for many reasons.

 - `TimerInternals` is really runner-facing; we don't want that interface 
in the SDK.
 - Changes to `TimerInternals` incur worker compatibility concerns, so 
getting it into runners-core is a win. (I have to change it soon, so I am 
trying to make my life easier)
 - `DoFnTester` doesn't actually support timers at all, so it didn't make 
sense for them to be in there.
 - When `DoFnTester` does support timers, it is trivial for it to make its own 
priority queues, and it will also want to offer greater insight via a richer 
`TestingTimerInternals` sort of implementation, which need not implement the 
same interface, etc.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam InMemoryTimerInternals

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1643


commit 5d0bf9895654c12e50410a98347eb9177de00b1d
Author: Kenneth Knowles 
Date:   2016-12-16T04:45:56Z

Move InMemoryTimerInternals to runners-core




> Move shared runner functionality out of SDK and into runners/core-java
> --
>
> Key: BEAM-362
> URL: https://issues.apache.org/jira/browse/BEAM-362
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>






[jira] [Commented] (BEAM-1112) Python E2E Integration Test Framework - Batch Only

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753301#comment-15753301
 ] 

ASF GitHub Bot commented on BEAM-1112:
--

GitHub user markflyhigh opened a pull request:

https://github.com/apache/incubator-beam/pull/1639

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

 - E2E test framework that lets a TestRunner start and verify a pipeline job.
   - Add `TestOptions`, which defines `on_success_matcher`, used to verify the
     state/output of a pipeline job.
   - Validate `on_success_matcher` before pipeline execution to make sure it
     can be unpickled to a subclass of BaseMatcher.
   - Create a `TestDataflowRunner`, which provides the functionality of
     `DataflowRunner` plus result verification.
   - Provide a test verifier, `PipelineStateMatcher`, that can verify whether
     a pipeline job finished in the DONE state.
 - Add wordcount_it (it = integration test), which builds an e2e test on the
   existing wordcount pipeline.
   - Include wordcount_it in the nose collector, so that it can be collected
     and run by nose.
   - Skip ITs when running unit tests from tox in precommit and postcommit.

The current changes will not change the behavior of the existing pre/postcommit.
Testing was done by running `tox -e py27 -c sdks/python/tox.ini` for unit tests 
and running wordcount_it with `TestDataflowRunner` on the service 
([link](https://pantheon.corp.google.com/dataflow/job/2016-12-15_17_36_16-3857167705491723621?project=google.com:clouddfe)).

TODO:
 - Output data verifier that verifies pipeline output stored in the 
filesystem.
 - Add wordcount_it to precommit and replace the existing wordcount execution 
command in postcommit with a better-structured nose command.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markflyhigh/incubator-beam e2e-testrunner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1639


commit e1e1fa3a60e1fe234829432d144d6689e240b6f0
Author: Mark Liu 
Date:   2016-12-16T01:41:20Z

[BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

commit 0e7007879ee082e3afe5db36107f51c03274f3f5
Author: Mark Liu 
Date:   2016-12-16T02:55:53Z

fixup! Fix Code Style




> Python E2E Integration Test Framework - Batch Only
> --
>
> Key: BEAM-1112
> URL: https://issues.apache.org/jira/browse/BEAM-1112
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py, testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> Parity with Java. 
> Build e2e integration test framework that can configure and run batch 
> pipeline with specified test runner, wait for pipeline execution and verify 
> results with given verifiers in the end.





[jira] [Commented] (BEAM-1084) Update apitools to version 0.5.6

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753081#comment-15753081
 ] 

ASF GitHub Bot commented on BEAM-1084:
--

Github user sb2nov closed the pull request at:

https://github.com/apache/incubator-beam/pull/1501


> Update apitools to version 0.5.6
> 
>
> Key: BEAM-1084
> URL: https://issues.apache.org/jira/browse/BEAM-1084
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>
> There are some fixes to JsonValue that should be included in beam 
> (https://github.com/google/apitools/pull/136) 





[jira] [Commented] (BEAM-1104) WordCount: Metrics error in the DirectRunner

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753030#comment-15753030
 ] 

ASF GitHub Bot commented on BEAM-1104:
--

Github user bjchambers closed the pull request at:

https://github.com/apache/incubator-beam/pull/1615


> WordCount: Metrics error in the DirectRunner
> 
>
> Key: BEAM-1104
> URL: https://issues.apache.org/jira/browse/BEAM-1104
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Daniel Halperin
>Assignee: Ben Chambers
>
> I'm following the Beam quickstart to analyze the pom.xml for the examples 
> archetype in the DirectRunner:
> Generate the project:
> {code}
> mvn archetype:generate \
>   -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \
>   -DarchetypeGroupId=org.apache.beam \
>   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>   -DarchetypeVersion=LATEST \
>   -DgroupId=org.example \
>   -DartifactId=word-count-beam \
>   -Dversion="0.1" \
>   -Dpackage=org.apache.beam.examples \
>   -DinteractiveMode=false
> {code}
> Count words in the pom.xml:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
> {code}
> The logs:
> {code}
> [INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
> INFO: Matched 1 files for pattern pom.xml
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment getCurrentContainer
> SEVERE: Unable to update metrics on the current thread. Most likely caused by using metrics outside the managed work-execution thread.
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
> INFO: Initializing write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$2 processElement
> INFO: Finalizing write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@3701012a.
> {code}
> Presumably, this {{SEVERE}} warning is indicative of a bug (or should be 
> masked).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1153) GcsUtil needs to set timeout and retry explicitly in BatchRequest.

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753013#comment-15753013
 ] 

ASF GitHub Bot commented on BEAM-1153:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1637


> GcsUtil needs to set timeout and retry explicitly in BatchRequest.
> --
>
> Key: BEAM-1153
> URL: https://issues.apache.org/jira/browse/BEAM-1153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>Priority: Blocker
>
> Non-batch requests use RetryHttpRequestInitializer, which sets the read 
> timeout to 80 seconds and performs more retries.
> The Google Cloud auto-generated JSON library doesn't set an 
> HttpRequestInitializer for batch requests.
> GcsUtil uses storageClient.batch(), which is defined here:
> https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256
> Without the HttpRequestInitializer, the default read timeout is 20 seconds.
> A possible fix is: https://github.com/apache/incubator-beam/pull/1608
> In addition, we can partially roll back 
> https://github.com/apache/incubator-beam/pull/1359 to keep using the 
> non-batch API for fileSize() on single files. This ensures that existing 
> code keeps working the same way.
> PR: https://github.com/apache/incubator-beam/pull/1611
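
The gap described above can be modeled in a few lines. This is only a
sketch under stated assumptions: Request, RequestInitializer and Client are
hypothetical stand-ins written for illustration, not the
google-api-java-client or GcsUtil classes. The 20 second and 80 second
values are the ones quoted in the issue. Retries are left out of the model,
but would be configured by the same initializer.

```java
// Hypothetical stand-in types for illustration only; these are NOT the
// google-api-java-client classes referenced in the issue.
final class BatchTimeoutSketch {
  // Read timeouts quoted in the issue description.
  static final int DEFAULT_READ_TIMEOUT_MS = 20_000;
  static final int RETRY_READ_TIMEOUT_MS = 80_000;

  /** A request that starts out with the library default timeout. */
  static final class Request {
    int readTimeoutMs = DEFAULT_READ_TIMEOUT_MS;
  }

  /** Hook that configures a request before it is sent. */
  interface RequestInitializer {
    void initialize(Request request);
  }

  /** Plays the role of RetryHttpRequestInitializer in this model. */
  static final RequestInitializer RETRYING =
      request -> request.readTimeoutMs = RETRY_READ_TIMEOUT_MS;

  static final class Client {
    private final RequestInitializer initializer;

    Client(RequestInitializer initializer) {
      this.initializer = initializer;
    }

    /** Non-batch path: the client's initializer is applied. */
    Request newRequest() {
      Request request = new Request();
      initializer.initialize(request);
      return request;
    }

    /** Batch path as described in the issue: no initializer is applied. */
    Request newBatch() {
      return new Request();
    }

    /** Batch path with the initializer passed explicitly. */
    Request newBatch(RequestInitializer explicitInitializer) {
      Request request = new Request();
      explicitInitializer.initialize(request);
      return request;
    }
  }
}
```

In this model, newBatch() keeps the 20 second default while
newBatch(RETRYING) gets the same 80 second timeout as a non-batch request,
which is the behavior the issue asks for.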



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-964) Investing exporting BQ as Avro instead of Json for dataflow runner

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752986#comment-15752986
 ] 

ASF GitHub Bot commented on BEAM-964:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1617


> Investing exporting BQ as Avro instead of Json for dataflow runner
> --
>
> Key: BEAM-964
> URL: https://issues.apache.org/jira/browse/BEAM-964
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>Priority: Minor
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1153) GcsUtil needs to set timeout and retry explicitly in BatchRequest.

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752972#comment-15752972
 ] 

ASF GitHub Bot commented on BEAM-1153:
--

GitHub user peihe opened a pull request:

https://github.com/apache/incubator-beam/pull/1637

[BEAM-1153] GcsUtil: use non-batch API for single file size requests.

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).

---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peihe/incubator-beam release-0.4.0-incubating

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1637


commit 58601f8c69b97dbdd9087b27c356c50bca7a1c8b
Author: Pei He 
Date:   2016-12-14T02:29:17Z

[BEAM-1153] GcsUtil: use non-batch API for single file size requests.




> GcsUtil needs to set timeout and retry explicitly in BatchRequest.
> --
>
> Key: BEAM-1153
> URL: https://issues.apache.org/jira/browse/BEAM-1153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>Priority: Blocker
>
> Non-batch requests use RetryHttpRequestInitializer, which sets the read 
> timeout to 80 seconds and performs more retries.
> The Google Cloud auto-generated JSON library doesn't set an 
> HttpRequestInitializer for batch requests.
> GcsUtil uses storageClient.batch(), which is defined here:
> https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256
> Without the HttpRequestInitializer, the default read timeout is 20 seconds.
> A possible fix is: https://github.com/apache/incubator-beam/pull/1608
> In addition, we can partially roll back 
> https://github.com/apache/incubator-beam/pull/1359 to keep using the 
> non-batch API for fileSize() on single files. This ensures that existing 
> code keeps working the same way.
> PR: https://github.com/apache/incubator-beam/pull/1611



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752949#comment-15752949
 ] 

ASF GitHub Bot commented on BEAM-498:
-

GitHub user jkff opened a pull request:

https://github.com/apache/incubator-beam/pull/1636

[BEAM-498] Moves OldDoFn to runners-core

This will of course need the usual Dataflow worker surgery.

R: @kennknowles 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam move-old-do-fn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1636


commit 374a450a66baf43786e2a22afb8e8832d3146441
Author: Eugene Kirpichov 
Date:   2016-12-16T00:16:46Z

Moves OldDoFn to runners-core




> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1125) Rename PTransform.apply to PTransform.expand

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752796#comment-15752796
 ] 

ASF GitHub Bot commented on BEAM-1125:
--

GitHub user aaltay opened a pull request:

https://github.com/apache/incubator-beam/pull/1634

[BEAM-1125] Rename PTransform.apply() to PTransform.expand()

Rename apply function to expand to match the recent change in the Java SDK.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/incubator-beam expand

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1634


commit b6a0974375e5c7414a52cd3ffd2a9e9fe8d1889f
Author: Ahmet Altay 
Date:   2016-12-15T22:27:08Z

Rename PTransform.apply() to PTransform.expand()




> Rename PTransform.apply to PTransform.expand
> 
>
> Key: BEAM-1125
> URL: https://issues.apache.org/jira/browse/BEAM-1125
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py
>Reporter: Ahmet Altay
>Assignee: Ahmet Altay
>  Labels: sdk-consistency
>
> For context see:
> [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> https://lists.apache.org/thread.html/b4d9bcfbfeaa5dbcd5b68fd2344cdffe45587ff88cb714638504e759@%3Cdev.beam.apache.org%3E
> This requires renaming the apply method, updating all custom PTransforms, and 
> runners where transform.apply is called. (Based on the Java PR, this could be 
> easily done with a refactoring tool.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-498) Make DoFnWithContext the new DoFn

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752696#comment-15752696
 ] 

ASF GitHub Bot commented on BEAM-498:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1565


> Make DoFnWithContext the new DoFn
> -
>
> Key: BEAM-498
> URL: https://issues.apache.org/jira/browse/BEAM-498
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>  Labels: backward-incompatible
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-450) Modules are shaded to the same path

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752612#comment-15752612
 ] 

ASF GitHub Bot commented on BEAM-450:
-

GitHub user dhalperi opened a pull request:

https://github.com/apache/incubator-beam/pull/1633

[BEAM-450] Shade modules to separate paths

R: @lukecwik 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhalperi/incubator-beam shading-package-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1633


commit 0bed1b3c494ec23f3ca18389d7b3bce15e0bd363
Author: Dan Halperin 
Date:   2016-12-15T21:50:39Z

[BEAM-450] Shade modules to separate paths




> Modules are shaded to the same path
> ---
>
> Key: BEAM-450
> URL: https://issues.apache.org/jira/browse/BEAM-450
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 0.1.0-incubating, 0.2.0-incubating
>Reporter: Daniel Halperin
>Assignee: Daniel Halperin
>  Labels: newbie, starter
>
> Right now multiple modules are using the same repackaged path. We should be 
> using per-artifact paths so that they don't conflict.
> One proposal was simply to adopt 
> {{${project.groupId}.${project.artifactId}.repackaged}} as the shading 
> location. If it works.
> This is a good starter issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1163) Add signature keys to the release guide vote template

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752444#comment-15752444
 ] 

ASF GitHub Bot commented on BEAM-1163:
--

GitHub user iemejia opened a pull request:

https://github.com/apache/incubator-beam-site/pull/111

[BEAM-1163] Add signature keys to the vote template (release guide)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/incubator-beam-site BEAM-1163

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam-site/pull/111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #111


commit 4461b271569ae900372a7b137b66b64480968684
Author: Ismaël Mejía 
Date:   2016-12-15T20:32:09Z

[BEAM-1163] Add signature keys to the vote template (release guide)




> Add signature keys to the release guide vote template
> -
>
> Key: BEAM-1163
> URL: https://issues.apache.org/jira/browse/BEAM-1163
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
>
> A small improvement: the idea is to add just the fingerprint of the person 
> who signed the release to the template (for validation purposes):
> The release artifacts are signed with the key with fingerprint XXX
> https://dist.apache.org/repos/dist/release/incubator/beam/KEYS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1022) TableNamespace should not use Java object equality when comparing windows

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752318#comment-15752318
 ] 

ASF GitHub Bot commented on BEAM-1022:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1400


> TableNamespace should not use Java object equality when comparing windows
> -
>
> Key: BEAM-1022
> URL: https://issues.apache.org/jira/browse/BEAM-1022
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Affects Versions: 0.3.0-incubating
>Reporter: Reuven Lax
>Assignee: Thomas Groh
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1154) ReduceFnRunner fetches side input from the wrong window

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752305#comment-15752305
 ] 

ASF GitHub Bot commented on BEAM-1154:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1620


> ReduceFnRunner fetches side input from the wrong window
> ---
>
> Key: BEAM-1154
> URL: https://issues.apache.org/jira/browse/BEAM-1154
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> This 
> https://github.com/apache/incubator-beam/blame/master/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnContextFactory.java#L529
>  is incorrect - I broke it in 
> https://github.com/apache/incubator-beam/commit/90a0d0e13fa0332df805b79b1dc64860d9590217#diff-16edced77586e39a5f31907f4ced51b5R530
> It uses the windowing strategy of the main input to call 
> .getSideInputWindow() instead of the windowing strategy of the side input.
> The fix is very simple; trying to come up with a test...
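
To illustrate why the choice of windowing strategy matters here, a minimal
sketch with fixed windows follows. FixedWindows and sideInputWindowStart are
hypothetical names made up for this example, not Beam classes, and windows
are reduced to their start timestamps for brevity.

```java
// Hypothetical model for illustration; these are not Beam's windowing classes.
final class SideInputWindowSketch {
  /** Fixed, non-overlapping windows of a given size, aligned to time zero. */
  static final class FixedWindows {
    final long sizeMillis;

    FixedWindows(long sizeMillis) {
      this.sizeMillis = sizeMillis;
    }

    /** Returns the start of the window that contains the timestamp. */
    long windowStart(long timestampMillis) {
      return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }
  }

  /**
   * Maps a main-input window (identified by its start) to the start of the
   * side-input window, using the windowing of the side input itself.
   */
  static long sideInputWindowStart(long mainWindowStart, FixedWindows sideInputWindowing) {
    return sideInputWindowing.windowStart(mainWindowStart);
  }
}
```

With 10 minute windows on the main input and 60 minute windows on the side
input, an element at minute 95 sits in the main window starting at minute
90. Looking that window up with the side input's windowing gives the window
starting at minute 60. Using the main input's windowing would give minute
90 instead, a window the side input never produced.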



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1160) Disabling Read transform validation cause empty file patterns to unexpected succeed

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752113#comment-15752113
 ] 

ASF GitHub Bot commented on BEAM-1160:
--

GitHub user swegner opened a pull request:

https://github.com/apache/incubator-beam/pull/1627

[BEAM-1160] Add option to disable failures if filePattern resolves to empty

Most PTransforms which take a filePattern have construction-time
validation which checks-- among other things-- that the specified
filePattern matches at least one file. This is particularly useful
for catching typos when specifying input files.

Most PTransforms also have an option to disable their
construction-time validation.  This is generally used when
validation cannot be performed at construction time: for example
because the proper credentials aren't available or the input
specification is late-bound in a template. To allow for these
scenarios and still guard against typos, FileBasedSource also
validates that the filePattern matches at least one file at
runtime.

This change adds the ability for FileBasedSource to disable this runtime
validation, for use cases where empty filePatterns should be
allowed. FileBasedSource gains a new constructor parameter, and
PTransforms which use FileBasedSource have the option exposed in their
respective builder APIs.
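
The runtime check and its opt-out could look roughly like the sketch below.
It is an assumption-laden illustration, not the code in this pull request:
FilePatternExpander, expand and allowEmptyMatch are hypothetical names, and
matching is done against a local directory with java.nio rather than
through Beam's file handling.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not Beam's FileBasedSource.
final class FilePatternExpander {
  private FilePatternExpander() {}

  /**
   * Lists the entries of dir whose names match the glob. An empty result is
   * treated as an error unless allowEmptyMatch is true.
   */
  static List<Path> expand(Path dir, String glob, boolean allowEmptyMatch)
      throws IOException {
    List<Path> matches = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path path : stream) {
        matches.add(path);
      }
    }
    if (matches.isEmpty() && !allowEmptyMatch) {
      // Guards against typos in the pattern even when construction-time
      // validation was disabled.
      throw new IOException("No files matched pattern " + glob + " in " + dir);
    }
    return matches;
  }
}
```

Keeping the failure as the default and making the empty match opt-in
preserves the typo protection described above for callers that do not ask
for the new behavior.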

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swegner/incubator-beam read-allow-empty-glob

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1627


commit 43195ba27b4303c852e81fa1de493ec4fec641bb
Author: Scott Wegner 
Date:   2016-12-15T18:12:25Z

Add option to disable failures if filePattern resolves to empty

Most PTransforms which take a filePattern have construction-time
validation which checks-- among other things-- that the specified
filePattern matches at least one file. This is particularly useful
for catching typos when specifying input files.

Most PTransforms also have an option to disable their
construction-time validation.  This is generally used when
validation cannot be performed at construction time: for example
because the proper credentials aren't available or the input
specification is late-bound in a template. To allow for these
scenarios and still guard against typos, FileBasedSource also
validates that the filePattern matches at least one file at
runtime.

This change adds the ability for FileBasedSource to disable this runtime
validation, for use cases where empty filePatterns should be
allowed. FileBasedSource gains a new constructor parameter, and
PTransforms which use FileBasedSource have the option exposed in their
respective builder APIs.




> Disabling Read transform validation cause empty file patterns to unexpected 
> succeed
> ---
>
> Key: BEAM-1160
> URL: https://issues.apache.org/jira/browse/BEAM-1160
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>
> Typically, input file patterns are validated during Pipeline construction, 
> but standard Read transforms include an option to disable validation. This is 
> generally useful but can lead to cases where a Pipeline executes successfully 
> with empty inputs.
> We should fail execution on empty file-based inputs even when validation is 
> disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-647) Fault-tolerant sideInputs via Broadcast variables

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752015#comment-15752015
 ] 

ASF GitHub Bot commented on BEAM-647:
-

GitHub user kobisalant opened a pull request:

https://github.com/apache/incubator-beam/pull/1624

[BEAM-647] Fault-tolerant sideInputs via Broadcast variables

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kobisalant/incubator-beam 
BEAM-647-Fault-tolerant-sideInputs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1624.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1624


commit 228c613c70c4edf1b01294a487d49f0a5492136e
Author: ksalant 
Date:   2016-12-15T17:42:47Z

[BEAM-647] Fault-tolerant sideInputs via Broadcast variables




> Fault-tolerant sideInputs via Broadcast variables
> -
>
> Key: BEAM-647
> URL: https://issues.apache.org/jira/browse/BEAM-647
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Kobi Salant
>
> Following https://github.com/apache/incubator-beam/pull/909 which enables 
> checkpointing to recover from failures, sideInputs (being implemented by 
> broadcast variables) should be handled in a specific manner as described 
> here: 
> http://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#accumulators-and-broadcast-variables.
> This is a bit more complicated than Aggregators (via Accumulators) as they 
> are implemented using a single "aggregating"  Accumulator, while a pipeline 
> may contain multiple sideInputs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-932) Findbugs doesn't pass in Spark runner

2016-12-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751544#comment-15751544
 ] 

ASF GitHub Bot commented on BEAM-932:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1463


> Findbugs doesn't pass in Spark runner
> -
>
> Key: BEAM-932
> URL: https://issues.apache.org/jira/browse/BEAM-932
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Daniel Halperin
>Assignee: Ismaël Mejía
>
> {code}
> [INFO] --- findbugs-maven-plugin:3.0.1:check (default) @ beam-runners-spark 
> ---
> [INFO] BugInstance size is 19
> [INFO] Error size is 0
> [INFO] Total bugs: 19
> [INFO] instanceof will always return false in 
> org.apache.beam.runners.spark.SparkRunner.run(Pipeline), since a 
> RuntimeException can't be a org.apache.spark.SparkException 
> [org.apache.beam.runners.spark.SparkRunner] At SparkRunner.java:[line 161]
> [INFO] 
> org.apache.beam.runners.spark.aggregators.metrics.WithNamedAggregatorsSupport$1.apply(Map$Entry)
>  makes inefficient use of keySet iterator instead of entrySet iterator 
> [org.apache.beam.runners.spark.aggregators.metrics.WithNamedAggregatorsSupport$1]
>  At WithNamedAggregatorsSupport.java:[line 125]
> [INFO] The class name 
> org.apache.beam.runners.spark.aggregators.metrics.sink.CsvSink shadows the 
> simple name of the superclass org.apache.spark.metrics.sink.CsvSink 
> [org.apache.beam.runners.spark.aggregators.metrics.sink.CsvSink] At 
> CsvSink.java:[lines 37-38]
> [INFO] The class name 
> org.apache.beam.runners.spark.aggregators.metrics.sink.GraphiteSink shadows 
> the simple name of the superclass org.apache.spark.metrics.sink.GraphiteSink 
> [org.apache.beam.runners.spark.aggregators.metrics.sink.GraphiteSink] At 
> GraphiteSink.java:[lines 37-38]
> [INFO] t must be non-null but is marked as nullable 
> [org.apache.beam.runners.spark.translation.WindowingHelpers$4] At 
> WindowingHelpers.java:[line 88]
> [INFO] Unchecked/unconfirmed cast from 
> org.apache.beam.runners.spark.translation.EvaluationContext to 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext
>  in 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$1.evaluate(ConsoleIO$Write$Unbound,
>  EvaluationContext) 
> [org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$1]
>  At StreamingTransformTranslator.java:[line 92]
> [INFO] Unchecked/unconfirmed cast from 
> org.apache.beam.runners.spark.translation.EvaluationContext to 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext
>  in 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$10.evaluate(ParDo$Bound,
>  EvaluationContext) 
> [org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$10]
>  At StreamingTransformTranslator.java:[line 360]
> [INFO] Unchecked/unconfirmed cast from 
> org.apache.beam.runners.spark.translation.EvaluationContext to 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext
>  in 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$11.evaluate(ParDo$BoundMulti,
>  EvaluationContext) 
> [org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$11]
>  At StreamingTransformTranslator.java:[line 395]
> [INFO] Unchecked/unconfirmed cast from 
> org.apache.beam.runners.spark.translation.EvaluationContext to 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext
>  in 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$2.evaluate(Read$Unbounded,
>  EvaluationContext) 
> [org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$2]
>  At StreamingTransformTranslator.java:[line 104]
> [INFO] Unchecked/unconfirmed cast from 
> org.apache.beam.runners.spark.translation.EvaluationContext to 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext
>  in 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$3.evaluate(CreateStream$QueuedValues,
>  EvaluationContext) 
> [org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$3]
>  At StreamingTransformTranslator.java:[line 115]
> [INFO] Unchecked/unconfirmed cast from 
> org.apache.beam.runners.spark.translation.EvaluationContext to 
> org.apache.beam.runners.spark.translation.streaming.StreamingEvaluationContext
>  in 
> org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$4.evaluate(Flatten$FlattenPCollectionList,
>  EvaluationContext) 
> [org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$4]
>  At 

[jira] [Commented] (BEAM-1086) Upgrade to latest Gearpump snapshot

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750218#comment-15750218
 ] 

ASF GitHub Bot commented on BEAM-1086:
--

GitHub user manuzhang opened a pull request:

https://github.com/apache/incubator-beam/pull/1623

[BEAM-1086] Upgrade to latest Gearpump snapshot

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manuzhang/incubator-beam 
gearpump-runner-upgrade

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1623


commit 7fdae1f9e94e1ba4dfb8b0f129e37510e5e6db7c
Author: manuzhang 
Date:   2016-12-09T01:20:50Z

[BEAM-1086] Upgrade to latest Gearpump snapshot




> Upgrade to latest Gearpump snapshot
> ---
>
> Key: BEAM-1086
> URL: https://issues.apache.org/jira/browse/BEAM-1086
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-gearpump
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>
> The latest Gearpump snapshot version is available under [apache repo | 
> https://repository.apache.org/content/repositories/snapshots/org/apache/gearpump/gearpump-core_2.11/0.8.3-SNAPSHOT/].
>  To support the Gearpump runner, we need to continuously evolve Gearpump and 
> bring in Beam capabilities. Depending on a snapshot version will greatly 
> accelerate the integration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1160) Disabling Read transform validation cause empty file patterns to unexpected succeed

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749956#comment-15749956
 ] 

ASF GitHub Bot commented on BEAM-1160:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1621


> Disabling Read transform validation cause empty file patterns to unexpected 
> succeed
> ---
>
> Key: BEAM-1160
> URL: https://issues.apache.org/jira/browse/BEAM-1160
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Scott Wegner
>
> Typically, input file patterns are validated during Pipeline construction, 
> but standard Read transforms include an option to disable validation. This is 
> generally useful but can lead to cases where a Pipeline executes successfully 
> with empty inputs.
> We should fail execution on empty file-based inputs even when validation is 
> disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1153) GcsUtil needs to set timeout and retry explicitly in BatchRequest.

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749772#comment-15749772
 ] 

ASF GitHub Bot commented on BEAM-1153:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1611


> GcsUtil needs to set timeout and retry explicitly in BatchRequest.
> --
>
> Key: BEAM-1153
> URL: https://issues.apache.org/jira/browse/BEAM-1153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Pei He
>Assignee: Pei He
>Priority: Blocker
>
> Non-batch requests use RetryHttpRequestInitializer, which sets the read 
> timeout to 80 seconds and performs more retries.
> The Google Cloud auto-generated JSON library doesn't set an 
> HttpRequestInitializer for batch requests.
> GcsUtil uses storageClient.batch(), which is defined here:
> https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256
> Without the HttpRequestInitializer, the default read timeout is 20 seconds.
> A possible fix is: https://github.com/apache/incubator-beam/pull/1608
> In addition, we can partially roll back 
> https://github.com/apache/incubator-beam/pull/1359 to keep using the 
> non-batch API for fileSize() on single files. This ensures that existing 
> code keeps working the same way.
> PR: https://github.com/apache/incubator-beam/pull/1611
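
The difference described above can be modeled with a small self-contained sketch. All types here (`Request`, `Initializer`, `build`) are hypothetical stand-ins, not the Google API client classes, and the retry count is an assumed value; only the 20-second default and the 80-second timeout come from the issue text.

```java
// Hypothetical model of "request built with vs. without an initializer".
final class BatchTimeoutSketch {
  static final int DEFAULT_READ_TIMEOUT_MS = 20_000; // default cited in the issue

  static final class Request {
    int readTimeoutMs = DEFAULT_READ_TIMEOUT_MS;
    int maxRetries = 0; // assumed default, for illustration only
  }

  interface Initializer {
    void initialize(Request request);
  }

  // Stand-in for the retrying initializer: longer read timeout, more retries.
  static final Initializer RETRYING = request -> {
    request.readTimeoutMs = 80_000; // value cited in the issue
    request.maxRetries = 10;        // assumed value, not from the issue
  };

  // A request only picks up the settings if the initializer is passed explicitly.
  static Request build(Initializer initializer) {
    Request request = new Request();
    if (initializer != null) {
      initializer.initialize(request);
    }
    return request;
  }

  public static void main(String[] args) {
    System.out.println("batch without initializer: " + build(null).readTimeoutMs + " ms");
    System.out.println("batch with initializer: " + build(RETRYING).readTimeoutMs + " ms");
  }
}
```

In this model the batch path corresponds to `build(null)`, so the fix amounts to passing the same initializer on both paths.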





[jira] [Commented] (BEAM-1154) ReduceFnRunner fetches side input from the wrong window

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749748#comment-15749748
 ] 

ASF GitHub Bot commented on BEAM-1154:
--

GitHub user jkff opened a pull request:

https://github.com/apache/incubator-beam/pull/1620

[BEAM-1154] Get side input from proper window in ReduceFn

R: @kennknowles 
CC: @peihe 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkff/incubator-beam combine-side-input

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1620


commit 38bca062b99900f44321035d660fa9b63f5179b4
Author: Eugene Kirpichov 
Date:   2016-12-14T22:29:30Z

[BEAM-1154] Get side input from proper window in ReduceFn




> ReduceFnRunner fetches side input from the wrong window
> ---
>
> Key: BEAM-1154
> URL: https://issues.apache.org/jira/browse/BEAM-1154
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Eugene Kirpichov
>
> This 
> https://github.com/apache/incubator-beam/blame/master/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnContextFactory.java#L529
>  is incorrect - I broke it in 
> https://github.com/apache/incubator-beam/commit/90a0d0e13fa0332df805b79b1dc64860d9590217#diff-16edced77586e39a5f31907f4ced51b5R530
> It uses the windowing strategy of the main input to do .getSideInputWindow() 
> instead of the windowing strategy of the side input.
> The fix is very simple; trying to come up with a test...
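
To make the bug concrete, here is a sketch under stated assumptions: both inputs use fixed, epoch-aligned windows, the main input has 10-second windows, the side input has 60-second windows, and a main window is mapped to a side-input window by assigning the main window's maximum timestamp. The names and sizes are illustrative and are not taken from ReduceFnContextFactory.

```java
// Hypothetical illustration of choosing the wrong windowing strategy for the lookup.
final class SideInputWindowSketch {
  // Start of the fixed window of the given size that contains the timestamp.
  static long windowStart(long timestampMs, long sizeMs) {
    return timestampMs - Math.floorMod(timestampMs, sizeMs);
  }

  // Maps a main-input window to a window of size lookupSizeMs via the main window's
  // maximum timestamp. Correct only when lookupSizeMs is the side input's window size.
  static long sideInputWindowStart(long mainWindowStartMs, long mainSizeMs, long lookupSizeMs) {
    long maxTimestampMs = mainWindowStartMs + mainSizeMs - 1;
    return windowStart(maxTimestampMs, lookupSizeMs);
  }

  public static void main(String[] args) {
    long mainSize = 10_000;
    long sideSize = 60_000;
    long mainWindow = 130_000;
    // Correct: use the side input's window size.
    System.out.println("correct: " + sideInputWindowStart(mainWindow, mainSize, sideSize));
    // Buggy: use the main input's window size, which names a window the side input never has.
    System.out.println("buggy:   " + sideInputWindowStart(mainWindow, mainSize, mainSize));
  }
}
```

With the main input's size the lookup returns the main window itself, which in general is not a window of the side input at all.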





[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749724#comment-15749724
 ] 

ASF GitHub Bot commented on BEAM-1149:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1619


> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>
> {code:java}
>   private static class FnWithSideInputs extends DoFn<String, String> {
>     private final PCollectionView<Integer> view;
>
>     private FnWithSideInputs(PCollectionView<Integer> view) {
>       this.view = view;
>     }
>
>     @ProcessElement
>     public void processElement(ProcessContext c) {
>       // Emit the main-input element joined with the singleton side-input value.
>       c.output(c.element() + ":" + c.sideInput(view));
>     }
>   }
>
>   @Test
>   public void testSideInputsWithMultipleWindows() {
>     Pipeline p = TestPipeline.create();
>     MutableDateTime mutableNow = Instant.now().toMutableDateTime();
>     mutableNow.setMillisOfSecond(0);
>     Instant now = mutableNow.toInstant();
>     // 5-second windows starting every second: each element lands in five windows.
>     SlidingWindows windowFn =
>         SlidingWindows.of(Duration.standardSeconds(5)).every(Duration.standardSeconds(1));
>     PCollectionView<Integer> view =
>         p.apply(Create.of(1)).apply(View.<Integer>asSingleton());
>     PCollection<String> res =
>         p.apply(Create.timestamped(TimestampedValue.of("a", now)))
>             .apply(Window.<String>into(windowFn))
>             .apply(ParDo.of(new FnWithSideInputs(view)).withSideInputs(view));
>     PAssert.that(res).containsInAnyOrder("a:1");
>     p.run();
>   }
> {code}
> This fails with the following exception:
> {code}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.IllegalStateException: sideInput called when main input element is 
> in multiple windows
>   at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:343)
>   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:1)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:176)
>   at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:112)
>   at 
> Caused by: java.lang.IllegalStateException: sideInput called when main input 
> element is in multiple windows
>   at 
> org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.sideInput(SimpleDoFnRunner.java:514)
>   at 
> org.apache.beam.sdk.transforms.ParDoTest$FnWithSideInputs.processElement(ParDoTest.java:738)
> {code}
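
The failure above arises because a single element sits in several sliding windows while a side-input lookup needs exactly one window. A minimal sketch of the general idea, assuming the runner processes the element once per window ("exploding" it), follows; the class and method names are hypothetical and this is not the SimpleDoFnRunner code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of per-window processing for an element in multiple windows.
final class ExplodeWindowsSketch {
  // Start times of all sliding windows [start, start + size) that contain the timestamp.
  static List<Long> slidingWindowStarts(long timestampMs, long sizeMs, long periodMs) {
    List<Long> starts = new ArrayList<>();
    long lastStart = timestampMs - Math.floorMod(timestampMs, periodMs);
    for (long start = lastStart; start > timestampMs - sizeMs; start -= periodMs) {
      starts.add(start);
    }
    return starts;
  }

  // Calls the user logic once per window, so each call sees exactly one window
  // and the side-input lookup for that call is unambiguous.
  static List<String> processPerWindow(
      String element, long timestampMs, long sizeMs, long periodMs, int sideInputValue) {
    List<String> outputs = new ArrayList<>();
    for (long ignoredWindowStart : slidingWindowStarts(timestampMs, sizeMs, periodMs)) {
      outputs.add(element + ":" + sideInputValue);
    }
    return outputs;
  }

  public static void main(String[] args) {
    // 5-second windows every 1 second, as in the test above.
    System.out.println(slidingWindowStarts(10_000, 5_000, 1_000));
    System.out.println(processPerWindow("a", 10_000, 5_000, 1_000, 1));
  }
}
```

With a 5-second size and a 1-second period, every timestamp falls into five windows, which is why the test in the issue always hits the multiple-window case.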





[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749699#comment-15749699
 ] 

ASF GitHub Bot commented on BEAM-1149:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1619

[BEAM-1149] Cherry pick some fixes to release-0.4.0

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

This fixes the remaining failures in [BEAM-1149]. I have confirmed that `mvn 
clean verify` and the Spark runner's local runnable-on-service tests pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam cherry-picks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1619


commit 122f05131c803ecaa082dfbc9ad6876b0ca467b8
Author: Kenneth Knowles 
Date:   2016-12-14T19:26:27Z

SimpleDoFnRunner observes window if SideInputReader is nonempty

commit d9f24b86c644ea85fd197eaab4c2d16b20a70d5f
Author: Kenneth Knowles 
Date:   2016-12-14T21:12:43Z

Fix NPE in StatefulParDoEvaluatorFactoryTest mocking




> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>


[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749514#comment-15749514
 ] 

ASF GitHub Bot commented on BEAM-1149:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1618


> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>


[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749500#comment-15749500
 ] 

ASF GitHub Bot commented on BEAM-1149:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1618

[BEAM-1149] Fix NPE in StatefulParDoEvaluatorFactoryTest mocking

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @tgroh OR @amitsela 

Overlooked in hasty merge of DoFnRunner fix.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam 
StatefulParDoEvaluatorFactory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1618.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1618


commit 00b961df2e6086ea9f5cb8b9b8fb747739d33670
Author: Kenneth Knowles 
Date:   2016-12-14T21:12:43Z

Fix NPE in StatefulParDoEvaluatorFactoryTest mocking




> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>


[jira] [Commented] (BEAM-1033) BigQueryMatcher is flaky

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749430#comment-15749430
 ] 

ASF GitHub Bot commented on BEAM-1033:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1479


> BigQueryMatcher is flaky
> 
>
> Key: BEAM-1033
> URL: https://issues.apache.org/jira/browse/BEAM-1033
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Pei He
>Assignee: Mark Liu
>
> Jenkins link:
> https://builds.apache.org/job/beam_PreCommit_MavenVerify/5145/console
> Running org.apache.beam.examples.WindowedWordCountIT
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 304.282 sec 
> <<< FAILURE! - in org.apache.beam.examples.WindowedWordCountIT
> testWindowedWordCountInBatch(org.apache.beam.examples.WindowedWordCountIT)  
> Time elapsed: 304.282 sec  <<< FAILURE!
> java.lang.AssertionError: 
> Expected: Expected checksum is (cd5b52939257e12428a9fa085c32a84dd209b180)
>  but: Invalid BigQuery response: 
> {"jobComplete":false,"jobReference":{"jobId":"job_0STNX_OD83tQOzo6MvmqXCrk61U","projectId":"apache-beam-testing"},"kind":"bigquery#queryResponse"}
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
>   at 
> org.apache.beam.runners.dataflow.testing.TestDataflowRunner.run(TestDataflowRunner.java:164)
>   at 
> org.apache.beam.runners.dataflow.testing.TestDataflowRunner.run(TestDataflowRunner.java:93)
>   at 
> org.apache.beam.runners.dataflow.testing.TestDataflowRunner.run(TestDataflowRunner.java:61)
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:179)
>   at 
> org.apache.beam.examples.WindowedWordCount.main(WindowedWordCount.java:224)
>   at 
> org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountPipeline(WindowedWordCountIT.java:88)
>   at 
> org.apache.beam.examples.WindowedWordCountIT.testWindowedWordCountInBatch(WindowedWordCountIT.java:59)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at 
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Results :
> Failed tests: 
>   
> WindowedWordCountIT.testWindowedWordCountInBatch:59->testWindowedWordCountPipeline:88
>  
> Expected: Expected checksum is (cd5b52939257e12428a9fa085c32a84dd209b180)
>  but: Invalid BigQuery response: 
> {"jobComplete":false,"jobReference":{"jobId":"job_0STNX_OD83tQOzo6MvmqXCrk61U","projectId":"apache-beam-testing"},"kind":"bigquery#queryResponse"}
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 0
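
The response in the failure above has `"jobComplete":false`, which suggests the matcher read the result before the query job finished. One way to tolerate that, sketched here under assumptions (the `QueryResponse` type, the supplier, and the attempt and sleep parameters are all hypothetical, not the BigQueryMatcher implementation), is to poll until the job reports completion.

```java
import java.util.function.Supplier;

// Hypothetical illustration of polling a query until it reports completion.
final class PollUntilCompleteSketch {
  static final class QueryResponse {
    final boolean jobComplete;
    final String checksum;

    QueryResponse(boolean jobComplete, String checksum) {
      this.jobComplete = jobComplete;
      this.checksum = checksum;
    }
  }

  // Re-fetches the response until jobComplete is true or attempts run out.
  static QueryResponse pollUntilComplete(
      Supplier<QueryResponse> fetch, int maxAttempts, long sleepMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      QueryResponse response = fetch.get();
      if (response.jobComplete) {
        return response;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for query", e);
        }
      }
    }
    throw new IllegalStateException("Query did not complete after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated service: the job completes on the third fetch.
    QueryResponse response =
        pollUntilComplete(() -> new QueryResponse(++calls[0] >= 3, "abc"), 5, 10);
    System.out.println("complete after " + calls[0] + " fetches, checksum " + response.checksum);
  }
}
```

Only a completed response would then be handed to the checksum comparison, so an unfinished job shows up as a timeout rather than as a mismatch.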





[jira] [Commented] (BEAM-964) Investing exporting BQ as Avro instead of Json for dataflow runner

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749284#comment-15749284
 ] 

ASF GitHub Bot commented on BEAM-964:
-

GitHub user sb2nov opened a pull request:

https://github.com/apache/incubator-beam/pull/1617

[BEAM-964] json avro flag migration

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @chamikaramj PTAL

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sb2nov/incubator-beam 
BEAM-964-json-avro-flag-migration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1617


commit 0a558c7171d6e4452d88ecffd16a024a19cbfc42
Author: Sourabh Bajaj 
Date:   2016-12-14T19:44:46Z

Update the BQ export flag from Json to Avro




> Investing exporting BQ as Avro instead of Json for dataflow runner
> --
>
> Key: BEAM-964
> URL: https://issues.apache.org/jira/browse/BEAM-964
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Sourabh Bajaj
>Assignee: Sourabh Bajaj
>Priority: Minor
> Fix For: Not applicable
>
>






[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749281#comment-15749281
 ] 

ASF GitHub Bot commented on BEAM-1149:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1616


> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>


[jira] [Commented] (BEAM-1149) Side input access fails in direct runner (possibly others too) when input element in multiple windows

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749260#comment-15749260
 ] 

ASF GitHub Bot commented on BEAM-1149:
--

GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1616

[BEAM-1149] SimpleDoFnRunner observes window if SideInputReader is nonempty

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

R: @amitsela 

This is a quick fix. There is some redundancy with 
`PushbackSideInputDoFnRunner`.

Confirmed that the following run - which is only run on postcommit - fails 
on `master` and succeeds with this PR:

```
mvn --batch-mode --errors verify -pl runners/spark \
-Prunnable-on-service-tests -Plocal-runnable-on-service-tests \
-D test=ParDoTest \
-D mdep.analyze.skip=true \
-D failIfNoTests=false \
-D forkCount=0 \
-D runnableOnServicePipelineOptions='[
  "--runner=TestSparkRunner",
  "--streaming=false",
  "--enableSparkMetricSinks=false"
]'
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam observesWindow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1616


commit 9fac4ac1abed954136bb4ed5b6e9c1471c2d3c3c
Author: Kenneth Knowles 
Date:   2016-12-14T19:26:27Z

SimpleDoFnRunner observes window if SideInputReader is nonempty




> Side input access fails in direct runner (possibly others too) when input 
> element in multiple windows
> -
>
> Key: BEAM-1149
> URL: https://issues.apache.org/jira/browse/BEAM-1149
> Project: Beam
>  Issue Type: Bug
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Blocker
> Fix For: 0.4.0-incubating
>
>


[jira] [Commented] (BEAM-1104) WordCount: Metrics error in the DirectRunner

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749228#comment-15749228
 ] 

ASF GitHub Bot commented on BEAM-1104:
--

GitHub user bjchambers opened a pull request:

https://github.com/apache/incubator-beam/pull/1615

[BEAM-1104] Don't incorrectly log error in MetricsEnvironment

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [*] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [*] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [*] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [*] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

Using getCurrentContainer() logs an error if metrics are not supported.
This is because it acts as the common point of access for user code that
reports metrics.

It should not be used within setCurrentContainer(), because the first
container being set will have a null previous-current-container, which
will cause the error to be incorrectly logged.
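
The distinction described above can be sketched as follows. This is a simplified model with hypothetical names and a `String` standing in for the container; it is not the MetricsEnvironment source, and the error counter only stands in for the logged message.

```java
// Hypothetical model of "the setter must not go through the logging getter".
final class MetricsEnvironmentSketch {
  private static final ThreadLocal<String> CONTAINER = new ThreadLocal<>();
  static int errorsLogged = 0; // stand-in for the SEVERE log line

  // User-facing accessor: a missing container here is worth reporting.
  static String getCurrentContainer() {
    String container = CONTAINER.get();
    if (container == null) {
      errorsLogged++;
    }
    return container;
  }

  // Runner-facing setter: reads the previous value directly, so installing the
  // first container (when the previous value is legitimately null) logs nothing.
  static String setCurrentContainer(String container) {
    String previous = CONTAINER.get();
    if (container == null) {
      CONTAINER.remove();
    } else {
      CONTAINER.set(container);
    }
    return previous;
  }

  public static void main(String[] args) {
    String previous = setCurrentContainer("step-1");
    System.out.println("previous=" + previous + ", errorsLogged=" + errorsLogged);
  }
}
```

Had `setCurrentContainer` called `getCurrentContainer()` to obtain the previous value, the first call on each thread would have bumped the error count even though nothing was wrong.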

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bjchambers/incubator-beam fix-metrics-warning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1615


commit 125223f8feb3576d3ff5ccdffa58a5e80808286c
Author: bchambers 
Date:   2016-12-14T19:23:39Z

Don't incorrectly log error in MetricsEnvironment

Using getCurrentContainer() logs an error if metrics are not supported.
This is because it acts as the common point of access for user code that
reports metrics.

It should not be used within setCurrentContainer(), because the first
container being set will have a null previous-current-container, which
will cause the error to be incorrectly logged.




> WordCount: Metrics error in the DirectRunner
> 
>
> Key: BEAM-1104
> URL: https://issues.apache.org/jira/browse/BEAM-1104
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Daniel Halperin
>Assignee: Ben Chambers
>
> I'm following the Beam quickstart to analyze the pom.xml for the examples 
> archetype in the DirectRunner:
> Generate the project:
> {code}
> mvn archetype:generate \
>   -DarchetypeRepository=https://repository.apache.org/content/groups/snapshots \
>   -DarchetypeGroupId=org.apache.beam \
>   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>   -DarchetypeVersion=LATEST \
>   -DgroupId=org.example \
>   -DartifactId=word-count-beam \
>   -Dversion="0.1" \
>   -Dpackage=org.apache.beam.examples \
>   -DinteractiveMode=false
> {code}
> Count words in the pom.xml:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=direct/counts" -Pdirect-runner
> {code}
> The logs:
> {code}
> [INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ word-count-beam ---
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.FileBasedSource expandFilePattern
> INFO: Matched 1 files for pattern pom.xml
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.metrics.MetricsEnvironment getCurrentContainer
> SEVERE: Unable to update metrics on the current thread. Most likely caused by using metrics outside the managed work-execution thread.
> Dec 07, 2016 9:42:03 PM org.apache.beam.sdk.io.Write$Bound$1 processElement
> INFO: Initializing write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@26bbd1cf
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening writer for write operation org.apache.beam.sdk.io.TextIO$TextSink$TextWriteOperation@19371061
> Dec 07, 2016 9:42:04 PM org.apache.beam.sdk.io.Write$Bound$WriteBundles processElement
> INFO: Opening 

[jira] [Commented] (BEAM-1144) Spark runner fails to deserialize MicrobatchSource in cluster mode

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748328#comment-15748328
 ] 

ASF GitHub Bot commented on BEAM-1144:
--

GitHub user aviemzur opened a pull request:

https://github.com/apache/incubator-beam/pull/1613

[BEAM-1144] Spark runner fails to deserialize MicrobatchSource in cluster mode

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviemzur/incubator-beam cnf-deserialize-issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1613


commit 774d3b62a741ae892041a7ff40d7f78b3f6b2f3f
Author: Aviem Zur 
Date:   2016-12-14T13:19:39Z

[BEAM-1144] Spark runner fails to deserialize MicrobatchSource in cluster mode




> Spark runner fails to deserialize MicrobatchSource in cluster mode
> --
>
> Key: BEAM-1144
> URL: https://issues.apache.org/jira/browse/BEAM-1144
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
> Reporter: Aviem Zur
> Assignee: Amit Sela
>
> When running in cluster mode (YARN), the Spark runner fails to deserialize 
> {{MicrobatchSource}}.
> After the changes made in BEAM-921, the Spark runner fails in cluster mode 
> with the following:
> {code}
> 16/12/12 04:27:01 ERROR ApplicationMaster: User class threw exception: org.apache.beam.sdk.Pipeline$PipelineExecutionException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>   at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:72)
>   at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:115)
>   at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:101)
>   at com.paypal.risk.platform.aleph.example.MapOnlyExample.main(MapOnlyExample.java:38)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:559)
> Caused by: com.esotericsoftware.kryo.KryoException: Error during Java deserialization.
>   at com.esotericsoftware.kryo.serializers.JavaSerializer.read(JavaSerializer.java:42)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>   at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
>   at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:169)
>   at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:201)
>   at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:198)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at org.apache.spark.streaming.rdd.MapWithStateRDDRecord$.updateRecordWithData(MapWithStateRDD.scala:55)
>   at org.apache.spark.streaming.rdd.MapWithStateRDD.compute(MapWithStateRDD.scala:155)
>   at 

[jira] [Commented] (BEAM-1136) Empty string value should be allowed for ValueProvider

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747477#comment-15747477
 ] 

ASF GitHub Bot commented on BEAM-1136:
--

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-beam/pull/1580


> Empty string value should be allowed for ValueProvider
> --
>
> Key: BEAM-1136
> URL: https://issues.apache.org/jira/browse/BEAM-1136
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
> Reporter: Vikas Kedigehalli
> Assignee: Vikas Kedigehalli
> Priority: Minor
>






[jira] [Commented] (BEAM-27) Add user-ready API for interacting with timers

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15747289#comment-15747289
 ] 

ASF GitHub Bot commented on BEAM-27:


GitHub user kennknowles opened a pull request:

https://github.com/apache/incubator-beam/pull/1612

[BEAM-27] Support timer setting and receiving in SimpleDoFnRunner

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[BEAM-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).

---

These are a couple of commits peeled off that should enable each runner to 
begin fulfilling the timer API.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kennknowles/incubator-beam SimpleDoFnRunner-timers

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1612.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1612


commit 95f954de00d20b27e127e41a0118a305e2cd8a94
Author: Kenneth Knowles 
Date:   2016-12-08T04:09:06Z

Make TimerSpec and StateSpec fields accessible

commit 19ccebe6170b24af0067c7a04483d8e9156a0ef5
Author: Kenneth Knowles 
Date:   2016-11-23T22:21:40Z

Add timer support to DoFnRunner(s)




> Add user-ready API for interacting with timers
> --
>
> Key: BEAM-27
> URL: https://issues.apache.org/jira/browse/BEAM-27
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
>
> Pipeline authors will benefit from a different factorization of interaction 
> with underlying timers. The current APIs are targeted at runner implementers.




