[jira] [Commented] (BEAM-3018) Remove duplicated methods in StructuredCoder

2017-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192448#comment-16192448
 ] 

ASF GitHub Bot commented on BEAM-3018:
--

GitHub user CaoManhDat opened a pull request:

https://github.com/apache/beam/pull/3947

[BEAM-3018] Remove duplicated methods in StructuredCoder

Removed `consistentWithEquals()`, `structuralValue()`, and 
`getEncodedTypeDescriptor()` from `StructuredCoder`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CaoManhDat/beam master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3947


commit 8d8a213f0ba2f1cceec8e53bba01c8d2a55c8863
Author: Cao Manh Dat 
Date:   2017-10-05T04:28:35Z

[BEAM-3018] Remove duplicated methods in StructuredCoder




> Remove duplicated methods in StructuredCoder
> 
>
> Key: BEAM-3018
> URL: https://issues.apache.org/jira/browse/BEAM-3018
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>
> StructuredCoder has several methods that are identical to the ones it 
> inherits from its parent class. We should remove these.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (BEAM-3018) Remove duplicated methods in StructuredCoder

2017-10-04 Thread Cao Manh Dat (JIRA)
Cao Manh Dat created BEAM-3018:
--

 Summary: Remove duplicated methods in StructuredCoder
 Key: BEAM-3018
 URL: https://issues.apache.org/jira/browse/BEAM-3018
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Cao Manh Dat
Assignee: Cao Manh Dat


StructuredCoder has several methods that are identical to the ones it 
inherits from its parent class. We should remove these.
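
As a rough illustration of why such overrides are redundant, here is a minimal Java sketch. The classes `BaseCoder`, `RedundantCoder`, and `CleanCoder` are hypothetical stand-ins, not Beam's actual `Coder` and `StructuredCoder`, and the method body is an assumption chosen only to show that an override identical to the inherited method changes nothing for callers.

```java
// Hypothetical stand-ins; not Beam's real Coder/StructuredCoder classes.
class BaseCoder {
  // Assumed default for this sketch: no promise about equals().
  boolean consistentWithEquals() {
    return false;
  }
}

// Before the cleanup: the override repeats the parent's body verbatim.
class RedundantCoder extends BaseCoder {
  @Override
  boolean consistentWithEquals() {
    return false;
  }
}

// After the cleanup: the method is simply inherited.
class CleanCoder extends BaseCoder {}

final class DuplicateMethodDemo {
  // True when both variants behave the same for callers.
  static boolean sameBehavior() {
    return new RedundantCoder().consistentWithEquals()
        == new CleanCoder().consistentWithEquals();
  }
}
```

Because dispatch falls through to the parent when no override exists, deleting the duplicate leaves observable behavior unchanged.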





[GitHub] beam pull request #3946: Enable discovery of log_handler tests.

2017-10-04 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3946

Enable discovery of log_handler tests.

Also fix the one remaining bug.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam yet-more-grpc-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3946


commit 566d6e19c3a6eea1ac0b0903902cb7f34ff9b82c
Author: Robert Bradshaw 
Date:   2017-10-05T00:38:43Z

Enable discovery of log_handler tests.

Also fix the one remaining bug.




---


[jira] [Updated] (BEAM-3017) DropInputs on BigQueryIO.writeTableRows() sink

2017-10-04 Thread Mingwei Samuel (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingwei Samuel updated BEAM-3017:
-
Description: 
After upgrading to version 2.1.0, "DropInput##" transforms appear on every 
output of my pipeline, outside of their corresponding transforms. They are 
being added to my {{BigQueryIO.writeTableRows()}} sinks, even though the source 
code suggests that these "DropInput" transforms are supposed to pull from reads 
that do not have any outputs, so I'm not sure why they are being added to sinks.

These transforms are flooding my graph and filling it with junk, for example: 
!https://i.imgur.com/jXtn9Gr.png|thumbnail!

  was:
After upgrading to version 2.1.0, "DropInput##" transforms appear on every 
output of my pipeline, outside of their corresponding transforms. They are 
being added to my {{BigQueryIO.writeTableRows()}} sinks, even though the source 
code suggests that these "DropInput" transforms are supposed to pull from reads 
that do not have any outputs, so I'm not sure why they are being added to sinks.

These transforms are flooding my graph and filling it with junk, for example: 
!https://i.imgur.com/jXtn9Gr.png!

Summary: DropInputs on BigQueryIO.writeTableRows() sink  (was: 
DropInputs on every Sink, Breaking Transform Flow)

> DropInputs on BigQueryIO.writeTableRows() sink
> --
>
> Key: BEAM-3017
> URL: https://issues.apache.org/jira/browse/BEAM-3017
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
>Reporter: Mingwei Samuel
>Assignee: Thomas Groh
>Priority: Minor
>
> After upgrading to version 2.1.0, "DropInput##" transforms appear on every 
> output of my pipeline, outside of their corresponding transforms. They are 
> being added to my {{BigQueryIO.writeTableRows()}} sinks, even though the 
> source code suggests that these "DropInput" transforms are supposed to pull 
> from reads that do not have any outputs, so I'm not sure why they are being 
> added to sinks.
> These transforms are flooding my graph and filling it with junk, for example: 
> !https://i.imgur.com/jXtn9Gr.png|thumbnail!





[jira] [Created] (BEAM-3017) DropInputs on every Sink, Breaking Transform Flow

2017-10-04 Thread Mingwei Samuel (JIRA)
Mingwei Samuel created BEAM-3017:


 Summary: DropInputs on every Sink, Breaking Transform Flow
 Key: BEAM-3017
 URL: https://issues.apache.org/jira/browse/BEAM-3017
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: 2.1.0
Reporter: Mingwei Samuel
Assignee: Thomas Groh
Priority: Minor


After upgrading to version 2.1.0, "DropInput##" transforms appear on every 
output of my pipeline, outside of their corresponding transforms. They are 
being added to my {{BigQueryIO.writeTableRows()}} sinks, even though the source 
code suggests that these "DropInput" transforms are supposed to pull from reads 
that do not have any outputs, so I'm not sure why they are being added to sinks.

These transforms are flooding my graph and filling it with junk, for example: 
!https://i.imgur.com/jXtn9Gr.png!





[GitHub] beam pull request #3945: whitelist time command in tox to prevent warning

2017-10-04 Thread aaltay
GitHub user aaltay opened a pull request:

https://github.com/apache/beam/pull/3945

whitelist time command in tox to prevent warning

R: @robertwb 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aaltay/beam time

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3945.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3945


commit 69f639ca22662014b353654bb03bfa51e313f622
Author: Ahmet Altay 
Date:   2017-10-04T23:40:18Z

whitelist time command in tox to prevent warning




---


[jira] [Commented] (BEAM-3016) BeamFnLoggingClient shutdown stability

2017-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192204#comment-16192204
 ] 

ASF GitHub Bot commented on BEAM-3016:
--

GitHub user lukecwik opened a pull request:

https://github.com/apache/beam/pull/3944

[BEAM-3016] Fix blocking issue within run() when channel terminates while 
blocking within DirectStreamObserver.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lukecwik/incubator-beam fn_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3944.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3944


commit d4e6c4d8d055e395d28e9d9c9ad8740c82698082
Author: Luke Cwik 
Date:   2017-10-04T23:34:54Z

[BEAM-3016] Fix blocking issue within run() when channel terminates while 
blocking within DirectStreamObserver.




> BeamFnLoggingClient shutdown stability
> --
>
> Key: BEAM-3016
> URL: https://issues.apache.org/jira/browse/BEAM-3016
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Minor
>
> The BeamFnLoggingClient has a race condition where it may not shut down 
> appropriately if the channel has an error.
> This occurs because the stream observer created by the StreamObserverFactory 
> may block indefinitely if the DirectStreamObserver is used.





[jira] [Commented] (BEAM-2999) Split validatesrunner tests from Python postcommit

2017-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192199#comment-16192199
 ] 

ASF GitHub Bot commented on BEAM-2999:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3927


> Split validatesrunner tests from Python postcommit
> --
>
> Key: BEAM-2999
> URL: https://issues.apache.org/jira/browse/BEAM-2999
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Mark Liu
>Assignee: Mark Liu
>
> The single Python Postcommit Jenkins build includes so many tests that the 
> build (and test) time exceeds 1 hour. It has also become hard to find errors 
> in the long console logs when a build fails.
> We can move the validatesrunner tests, which currently take ~20 minutes, out 
> of the Postcommit build into a separate Jenkins job. This will shorten the 
> total build time of Postcommit.





[GitHub] beam pull request #3927: [BEAM-2999] Move ValidatesRunner test out of Python...

2017-10-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3927


---


[2/2] beam git commit: This closes #3927

2017-10-04 Thread altay
This closes #3927


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/afe8da8b
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/afe8da8b
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/afe8da8b

Branch: refs/heads/master
Commit: afe8da8bc25bb261944c838337ea5aeebd52d9cd
Parents: dee8e93 4f00899
Author: Ahmet Altay 
Authored: Wed Oct 4 16:26:00 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Oct 4 16:26:00 2017 -0700

--
 ...ommit_Python_ValidatesRunner_Dataflow.groovy | 54 +++
 sdks/python/run_postcommit.sh   | 20 --
 sdks/python/run_validatesrunner.sh  | 71 
 3 files changed, 125 insertions(+), 20 deletions(-)
--




[1/2] beam git commit: [BEAM-2999] Split ValidatesRunner test from Python Postcommit

2017-10-04 Thread altay
Repository: beam
Updated Branches:
  refs/heads/master dee8e93e7 -> afe8da8bc


[BEAM-2999] Split ValidatesRunner test from Python Postcommit


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/4f00899e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/4f00899e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/4f00899e

Branch: refs/heads/master
Commit: 4f00899e595f542344a196c9a3a472b1cf00d9b3
Parents: dee8e93
Author: Mark Liu 
Authored: Wed Sep 27 18:18:02 2017 -0700
Committer: Ahmet Altay 
Committed: Wed Oct 4 16:25:56 2017 -0700

--
 ...ommit_Python_ValidatesRunner_Dataflow.groovy | 54 +++
 sdks/python/run_postcommit.sh   | 20 --
 sdks/python/run_validatesrunner.sh  | 71 
 3 files changed, 125 insertions(+), 20 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/4f00899e/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy
--
diff --git 
a/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy
 
b/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy
new file mode 100644
index 000..06bbfb7
--- /dev/null
+++ 
b/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+// This job runs the suite of Python ValidatesRunner tests against the
+// Dataflow runner.
+job('beam_PostCommit_Python_ValidatesRunner_Dataflow') {
+  description('Runs Python ValidatesRunner suite on the Dataflow runner.')
+
+  // Set common parameters.
+  common_job_properties.setTopLevelMainJobProperties(delegate)
+
+  // Sets that this is a PostCommit job.
+  common_job_properties.setPostCommit(delegate, '0 3-22/6 * * *')
+
+  // Allows triggering this build against pull requests.
+  common_job_properties.enablePhraseTriggeringFromPullRequest(
+  delegate,
+  'Google Cloud Dataflow Runner Python ValidatesRunner Tests',
+  'Run Python Dataflow ValidatesRunner')
+
+  // Allow the test to only run on particular nodes
+  // TODO(BEAM-1817): Remove once the tests can run on all nodes
+  parameters {
+nodeParam('TEST_HOST') {
+  description('select test host as either beam1, 2 or 3')
+  defaultNodes(['beam3'])
+  allowedNodes(['beam1', 'beam2', 'beam3'])
+  trigger('multiSelectionDisallowed')
+  eligibility('IgnoreOfflineNodeEligibility')
+}
+  }
+
+  // Execute shell command to test Python SDK.
+  steps {
+shell('bash sdks/python/run_validatesrunner.sh')
+  }
+}
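
The schedule string `'0 3-22/6 * * *'` passed to `setPostCommit` above uses a stepped range in its hour field. As a small worked example (the class `CronHourField` is hypothetical and only mimics how standard cron expands `start-end/step`), the sketch below lists the hours at which the job would fire at minute 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only mirrors standard cron range/step expansion.
final class CronHourField {
  // Expands "start-end/step": begin at start, add step while <= end.
  static List<Integer> expand(int start, int end, int step) {
    List<Integer> hours = new ArrayList<>();
    for (int hour = start; hour <= end; hour += step) {
      hours.add(hour);
    }
    return hours;
  }
}
```

Calling `CronHourField.expand(3, 22, 6)` yields hours 3, 9, 15, and 21, so the job is scheduled four times a day.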

http://git-wip-us.apache.org/repos/asf/beam/blob/4f00899e/sdks/python/run_postcommit.sh
--
diff --git a/sdks/python/run_postcommit.sh b/sdks/python/run_postcommit.sh
index ddc3dc7..5e1c6b2 100755
--- a/sdks/python/run_postcommit.sh
+++ b/sdks/python/run_postcommit.sh
@@ -66,26 +66,6 @@ python setup.py sdist
 
 SDK_LOCATION=$(find dist/apache-beam-*.tar.gz)
 
-# Install test dependencies for ValidatesRunner tests.
-echo "pyhamcrest" > postcommit_requirements.txt
-echo "mock" >> postcommit_requirements.txt
-
-# Run ValidatesRunner tests on Google Cloud Dataflow service
-echo ">>> RUNNING DATAFLOW RUNNER VALIDATESRUNNER TESTS"
-python setup.py nosetests \
-  --attr ValidatesRunner \
-  --nocapture \
-  --processes=4 \
-  --process-timeout=900 \
-  --test-pipeline-options=" \
---runner=TestDataflowRunner \
---project=$PROJECT \
---staging_location=$GCS_LOCATION/staging-validatesrunner-test \
---temp_location=$GCS_LOCATION/temp-validatesrunner-test \
---sdk_location=$SDK_LOCATION \
---requirements_file=postcommit_requirements.txt \
---num_workers=1"
-
 # Run integration tests on the Google Cloud Dataflow service
 # and validate that jobs finish successfully.
 echo ">>> RUNNING TEST DATAFLOW RUNNER it 

[jira] [Created] (BEAM-3016) BeamFnLoggingClient shutdown stability

2017-10-04 Thread Luke Cwik (JIRA)
Luke Cwik created BEAM-3016:
---

 Summary: BeamFnLoggingClient shutdown stability
 Key: BEAM-3016
 URL: https://issues.apache.org/jira/browse/BEAM-3016
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-harness
Reporter: Luke Cwik
Assignee: Luke Cwik
Priority: Minor


The BeamFnLoggingClient has a race condition where it may not shut down 
appropriately if the channel has an error.

This occurs because the stream observer created by the StreamObserverFactory 
may block indefinitely if the DirectStreamObserver is used.
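
One common way to avoid this kind of indefinite block is to wait with a timeout and recheck a termination flag on every wakeup. The Java sketch below shows that general pattern under stated assumptions: `BoundedWaitGate` and its methods are hypothetical names, and this is not the actual `DirectStreamObserver` code or necessarily the fix taken in the pull request.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not the actual DirectStreamObserver implementation.
final class BoundedWaitGate {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private final AtomicBoolean terminated = new AtomicBoolean(false);
  private boolean ready = false;

  // Called when the outbound channel can accept another message.
  void markReady() {
    lock.lock();
    try {
      ready = true;
      changed.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Called when the channel fails or shuts down; wakes any waiter.
  void terminate() {
    terminated.set(true);
    lock.lock();
    try {
      changed.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Returns true once ready, false if terminated or interrupted first.
  boolean awaitReady(long pollMillis) {
    lock.lock();
    try {
      while (!ready) {
        if (terminated.get()) {
          return false;
        }
        // Timed wait: termination is rechecked even if a signal is missed.
        changed.await(pollMillis, TimeUnit.MILLISECONDS);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }
}
```

The key design choice is that the waiter never depends on a single wakeup: a bounded wait plus a flag check gives it a guaranteed exit once the channel is gone.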





[GitHub] beam pull request #3943: Adding tracking for bytes and msecs spent while rea...

2017-10-04 Thread pabloem
GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/3943

Adding tracking for bytes and msecs spent while reading from side inputs

Testing changes to track msecs and bytes spent while reading from side 
inputs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam sicounters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3943.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3943


commit d2ca52a83815c31b329d1e1409344f81482d45c5
Author: Pablo 
Date:   2017-10-04T22:55:28Z

Adding tracking for bytes and msecs spent while reading from side inputs




---


[jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema

2017-10-04 Thread Eugene Kirpichov (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192129#comment-16192129
 ] 

Eugene Kirpichov commented on BEAM-2993:


OK, thanks for the explanations. A couple more questions:

- Does AvroIO.write().to(DynamicDestinations) work for you? It seems like what 
you have is a very specialized use case (I've never seen nor imagined anything 
like it), so if an existing solution does the job, then it might be best to 
just use that rather than develop a new feature guided only by a single very 
exotic use case.
- Suppose a schemaless AvroIO.write() was implemented, and suppose you give it 
a PCollection<GenericRecord> that happens to contain records with many 
different schemas. What should it do? Should it group them by schema? Should it 
simply fail? Should it use the schema of a (non-deterministically chosen) 
"first" record in each generated file and hope that other records have the same 
schema?
- Would it make things easier if, instead of PCollection<GenericRecord>, you 
operated in terms of PCollection<SchemaRefAndRecord>, where SchemaRefAndRecord 
is your custom type { String schemaURI; GenericRecord record; }, with a custom 
coder for it that fetches the schema over the network from a schema registry by 
URI or something? And then when writing to AvroIO, you'd go down the path of 
DynamicDestinations and group by schemaURI before writing (i.e. use it as a 
destination type); and it would be up to your code to ensure that the schema 
URIs are unique.
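
The grouping idea in the last question can be sketched in plain Java, outside of any Beam pipeline. Everything here is an assumption for illustration: `SchemaRefAndRecord` is modeled with a `String` payload instead of an Avro `GenericRecord` so the example has no dependencies, and `GroupBySchema.group` is a hypothetical helper, not part of `AvroIO` or `DynamicDestinations`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the custom type described above; the record
// is a String here rather than an Avro GenericRecord.
final class SchemaRefAndRecord {
  final String schemaURI;
  final String record;

  SchemaRefAndRecord(String schemaURI, String record) {
    this.schemaURI = schemaURI;
    this.record = record;
  }
}

final class GroupBySchema {
  // Buckets records by schema URI, i.e. uses the URI as the destination key.
  static Map<String, List<String>> group(List<SchemaRefAndRecord> input) {
    Map<String, List<String>> bySchema = new LinkedHashMap<>();
    for (SchemaRefAndRecord element : input) {
      bySchema
          .computeIfAbsent(element.schemaURI, uri -> new ArrayList<>())
          .add(element.record);
    }
    return bySchema;
  }
}
```

Each resulting bucket holds records that share one schema, so every output file could be written with a single, known schema, provided the URIs really are unique per schema.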

> AvroIO.write without specifying a schema
> 
>
> Key: BEAM-2993
> URL: https://issues.apache.org/jira/browse/BEAM-2993
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be 
> able to write to avro files using {{AvroIO}} without specifying a schema at 
> build time. Consider the following use case: a user has a 
> {{PCollection<GenericRecord>}} but the schema is only known while running 
> the pipeline.  {{AvroIO.writeGenericRecords}} needs the schema, but the 
> schema is already available in {{GenericRecord}}. We should be able to call 
> {{AvroIO.writeGenericRecords()}} with no schema.





[jira] [Created] (BEAM-3015) Need logging from C threads in Cython

2017-10-04 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-3015:
---

 Summary: Need logging from C threads in Cython
 Key: BEAM-3015
 URL: https://issues.apache.org/jira/browse/BEAM-3015
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Commented] (BEAM-3013) The Python worker should report lulls

2017-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192097#comment-16192097
 ] 

ASF GitHub Bot commented on BEAM-3013:
--

Github user pabloem closed the pull request at:

https://github.com/apache/beam/pull/3936


> The Python worker should report lulls
> -
>
> Key: BEAM-3013
> URL: https://issues.apache.org/jira/browse/BEAM-3013
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>
> Whenever too much time has been spent on the same state (e.g. > 5 minutes), 
> the worker should report it.
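
The rule described in this issue can be sketched as a small state timer. This is a minimal illustration only: it is written in Java to match the other examples here even though the issue concerns the Python worker, and `LullTracker`, its methods, and the use of caller-supplied millisecond timestamps are all assumptions rather than the worker's real design.

```java
// Hypothetical sketch; not the Python worker's actual lull tracking.
final class LullTracker {
  private final long thresholdMillis;
  private String state;
  private long enteredAtMillis;

  LullTracker(long thresholdMillis, String initialState, long nowMillis) {
    this.thresholdMillis = thresholdMillis;
    this.state = initialState;
    this.enteredAtMillis = nowMillis;
  }

  // Records a transition; re-entering the same state keeps the old clock.
  void enter(String newState, long nowMillis) {
    if (!newState.equals(state)) {
      state = newState;
      enteredAtMillis = nowMillis;
    }
  }

  // True when the current state has lasted longer than the threshold.
  boolean isLull(long nowMillis) {
    return nowMillis - enteredAtMillis > thresholdMillis;
  }
}
```

With a threshold of 5 minutes, a periodic check of `isLull` would start returning true once the same state has been held for more than 300,000 ms, at which point the worker could log a report.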





[GitHub] beam pull request #3936: [BEAM-3013] Prototyping lull-tracking for Python

2017-10-04 Thread pabloem
Github user pabloem closed the pull request at:

https://github.com/apache/beam/pull/3936


---


[GitHub] beam pull request #3942: Enabling phrase triggering of Python PreCommit

2017-10-04 Thread pabloem
GitHub user pabloem opened a pull request:

https://github.com/apache/beam/pull/3942

Enabling phrase triggering of Python PreCommit

r: @jasonkuster 
cc: @kennknowles 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pabloem/incubator-beam patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3942


commit a4cd96cf4c8663fa6eb8778e5079dbd8c2a05151
Author: Pablo 
Date:   2017-10-04T21:44:58Z

Enabling phrase triggering of Python PreCommit




---


[jira] [Commented] (BEAM-2596) Break up Jenkins PreCommit into individual steps.

2017-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16192053#comment-16192053
 ] 

ASF GitHub Bot commented on BEAM-2596:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3935


> Break up Jenkins PreCommit into individual steps.
> -
>
> Key: BEAM-2596
> URL: https://issues.apache.org/jira/browse/BEAM-2596
> Project: Beam
>  Issue Type: New Feature
>  Components: build-system, testing
>Reporter: Jason Kuster
>Assignee: Jason Kuster
>






[GitHub] beam pull request #3935: [BEAM-2596] Split Java and Python precommit jobs

2017-10-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3935


---


[1/2] beam git commit: Split Java and Python precommit jobs

2017-10-04 Thread kenn
Repository: beam
Updated Branches:
  refs/heads/master 929a23e29 -> dee8e93e7


Split Java and Python precommit jobs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/25790a4f
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/25790a4f
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/25790a4f

Branch: refs/heads/master
Commit: 25790a4fdbe59ac5b1d5e19cce8ec3620be08183
Parents: 9379ca2
Author: Kenneth Knowles 
Authored: Tue Oct 3 10:06:27 2017 -0700
Committer: Kenneth Knowles 
Committed: Wed Oct 4 14:08:07 2017 -0700

--
 .../job_beam_PreCommit_Java_MavenInstall.groovy | 21 ++--
 ...ob_beam_PreCommit_Python_MavenInstall.groovy | 56 
 2 files changed, 73 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/25790a4f/.test-infra/jenkins/job_beam_PreCommit_Java_MavenInstall.groovy
--
diff --git a/.test-infra/jenkins/job_beam_PreCommit_Java_MavenInstall.groovy 
b/.test-infra/jenkins/job_beam_PreCommit_Java_MavenInstall.groovy
index f4ebcaf..dad3726 100644
--- a/.test-infra/jenkins/job_beam_PreCommit_Java_MavenInstall.groovy
+++ b/.test-infra/jenkins/job_beam_PreCommit_Java_MavenInstall.groovy
@@ -38,8 +38,21 @@ mavenJob('beam_PreCommit_Java_MavenInstall') {
   common_job_properties.setMavenConfig(delegate)
 
   // Sets that this is a PreCommit job.
-  common_job_properties.setPreCommit(delegate, 'Maven clean install')
-
-  // Maven goals for this job.
-  goals('-B -e 
-Prelease,include-runners,jenkins-precommit,direct-runner,dataflow-runner,spark-runner,flink-runner,apex-runner
 -DrepoToken=$COVERALLS_REPO_TOKEN -DpullRequest=$ghprbPullId 
help:effective-settings clean install coveralls:report')
+  common_job_properties.setPreCommit(delegate, 'mvn clean install -pl 
sdks/java/core -am -amd')
+
+  // Maven goals for this job: The Java SDK, its dependencies, and things that 
depend on it.
+  goals('''\
+--batch-mode \
+--errors \
+--activate-profiles 
release,jenkins-precommit,direct-runner,dataflow-runner,spark-runner,flink-runner,apex-runner
 \
+--projects sdks/java/core \
+--also-make \
+--also-make-dependents \
+-D repoToken=$COVERALLS_REPO_TOKEN \
+-D pullRequest=$ghprbPullId \
+help:effective-settings \
+clean \
+install \
+coveralls:report \
+  ''')
 }

http://git-wip-us.apache.org/repos/asf/beam/blob/25790a4f/.test-infra/jenkins/job_beam_PreCommit_Python_MavenInstall.groovy
--
diff --git a/.test-infra/jenkins/job_beam_PreCommit_Python_MavenInstall.groovy 
b/.test-infra/jenkins/job_beam_PreCommit_Python_MavenInstall.groovy
new file mode 100644
index 000..19a4b21
--- /dev/null
+++ b/.test-infra/jenkins/job_beam_PreCommit_Python_MavenInstall.groovy
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import common_job_properties
+
+// This is the Java precommit which runs a maven install, and the current set
+// of precommit tests.
+mavenJob('beam_PreCommit_Python_MavenInstall') {
+  description('Runs an install of the current GitHub Pull Request.')
+
+  previousNames('beam_PreCommit_MavenVerify')
+
+  // Execute concurrent builds if necessary.
+  concurrentBuild()
+
+  // Set common parameters.
+  common_job_properties.setTopLevelMainJobProperties(
+delegate,
+'master',
+150)
+
+  // Set Maven parameters.
+  common_job_properties.setMavenConfig(delegate)
+
+  // Sets that this is a PreCommit job.
+  common_job_properties.setPreCommit(delegate, 'mvn clean install -pl sdks/python -am -amd')
+
+  // Maven goals for this job: The Python SDK, its dependencies, and things that depend on it.
+  goals('''\
+--batch-mode \
+--errors \
+--activate-profiles release,jenkins-precommit,direct-runner,dataflow-runner,spark-runner,flink-runner,apex-runner \
+--projects sdks/python \
+--also-make \
+--also-make-dependents \
+

[2/2] beam git commit: This closes #3935: [BEAM-2596] Split Java and Python precommit jobs

2017-10-04 Thread kenn
This closes #3935: [BEAM-2596] Split Java and Python precommit jobs


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/dee8e93e
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/dee8e93e
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/dee8e93e

Branch: refs/heads/master
Commit: dee8e93e771c90fe6d9207fe65b01db8c7ea41a9
Parents: 929a23e 25790a4
Author: Kenneth Knowles 
Authored: Wed Oct 4 14:16:45 2017 -0700
Committer: Kenneth Knowles 
Committed: Wed Oct 4 14:16:45 2017 -0700

--
 .../job_beam_PreCommit_Java_MavenInstall.groovy | 21 ++--
 ...ob_beam_PreCommit_Python_MavenInstall.groovy | 56 
 2 files changed, 73 insertions(+), 4 deletions(-)
--




[GitHub] beam pull request #3940: Use imports from grpc.

2017-10-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/3940


---


[1/2] beam git commit: Use imports from grpc.

2017-10-04 Thread robertwb
Repository: beam
Updated Branches:
  refs/heads/master 9379ca289 -> 929a23e29


Use imports from grpc.


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/05765385
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/05765385
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/05765385

Branch: refs/heads/master
Commit: 0576538561d2b13a920accaf90efc4ce2f2f50f0
Parents: 9379ca2
Author: Robert Bradshaw 
Authored: Wed Oct 4 13:39:00 2017 -0700
Committer: Robert Bradshaw 
Committed: Wed Oct 4 13:39:00 2017 -0700

--
 sdks/python/apache_beam/runners/worker/log_handler.py  | 3 ++-
 sdks/python/apache_beam/runners/worker/log_handler_test.py | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/beam/blob/05765385/sdks/python/apache_beam/runners/worker/log_handler.py
--
diff --git a/sdks/python/apache_beam/runners/worker/log_handler.py b/sdks/python/apache_beam/runners/worker/log_handler.py
index f878943..8691184 100644
--- a/sdks/python/apache_beam/runners/worker/log_handler.py
+++ b/sdks/python/apache_beam/runners/worker/log_handler.py
@@ -48,7 +48,8 @@ class FnApiLogRecordHandler(logging.Handler):
   def __init__(self, log_service_descriptor):
 super(FnApiLogRecordHandler, self).__init__()
 self._log_channel = grpc.insecure_channel(log_service_descriptor.url)
-self._logging_stub = beam_fn_api_pb2.BeamFnLoggingStub(self._log_channel)
+self._logging_stub = beam_fn_api_pb2_grpc.BeamFnLoggingStub(
+self._log_channel)
 self._log_entry_queue = queue.Queue()
 
 log_control_messages = self._logging_stub.Logging(self._write_log_entries())

http://git-wip-us.apache.org/repos/asf/beam/blob/05765385/sdks/python/apache_beam/runners/worker/log_handler_test.py
--
diff --git a/sdks/python/apache_beam/runners/worker/log_handler_test.py b/sdks/python/apache_beam/runners/worker/log_handler_test.py
index 4903877..9814324 100644
--- a/sdks/python/apache_beam/runners/worker/log_handler_test.py
+++ b/sdks/python/apache_beam/runners/worker/log_handler_test.py
@@ -45,7 +45,7 @@ class FnApiLogRecordHandlerTest(unittest.TestCase):
   def setUp(self):
 self.test_logging_service = BeamFnLoggingServicer()
 self.server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
-beam_fn_api_pb2.add_BeamFnLoggingServicer_to_server(
+beam_fn_api_pb2_grpc.add_BeamFnLoggingServicer_to_server(
 self.test_logging_service, self.server)
 self.test_port = self.server.add_insecure_port('[::]:0')
 self.server.start()



[2/2] beam git commit: Closes #3940

2017-10-04 Thread robertwb
Closes #3940


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/929a23e2
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/929a23e2
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/929a23e2

Branch: refs/heads/master
Commit: 929a23e29772a5ad3a3fe2232333b482ba7d21a0
Parents: 9379ca2 0576538
Author: Robert Bradshaw 
Authored: Wed Oct 4 14:13:10 2017 -0700
Committer: Robert Bradshaw 
Committed: Wed Oct 4 14:13:10 2017 -0700

--
 sdks/python/apache_beam/runners/worker/log_handler.py  | 3 ++-
 sdks/python/apache_beam/runners/worker/log_handler_test.py | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
--




[GitHub] beam pull request #3941: Align names with those produced by the runner harne...

2017-10-04 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3941

Align names with those produced by the runner harness.

These will be unused once the runner harness produces the correct
transform payloads.

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam runner-harness-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3941


commit 29fea883637d809ec75d4dc65f5faeed8eb04210
Author: Robert Bradshaw 
Date:   2017-10-04T20:57:01Z

Align names with those produced by the runner harness.

These will be unused once the runner harness produces the correct
transform payloads.




---


[GitHub] beam pull request #3940: Use imports from grpc.

2017-10-04 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3940

Use imports from grpc.

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam more-grpc-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3940


commit 0576538561d2b13a920accaf90efc4ce2f2f50f0
Author: Robert Bradshaw 
Date:   2017-10-04T20:39:00Z

Use imports from grpc.




---


[jira] [Assigned] (BEAM-2852) Add support for Kafka as source/sink on Nexmark

2017-10-04 Thread Kai Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Jiang reassigned BEAM-2852:
---

Assignee: Kai Jiang

> Add support for Kafka as source/sink on Nexmark
> ---
>
> Key: BEAM-2852
> URL: https://issues.apache.org/jira/browse/BEAM-2852
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Ismaël Mejía
>Assignee: Kai Jiang
>Priority: Minor
>  Labels: newbie, starter
>






Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4942

2017-10-04 Thread Apache Jenkins Server
See 




[jira] [Updated] (BEAM-1872) implement Reshuffle transform

2017-10-04 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-1872:
--
Component/s: sdk-java-core

> implement Reshuffle transform
> -
>
> Key: BEAM-1872
> URL: https://issues.apache.org/jira/browse/BEAM-1872
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Ahmet Altay
>  Labels: sdk-consistency
>






[jira] [Updated] (BEAM-1872) implement Reshuffle transform in python, make it experimental in Java

2017-10-04 Thread Ahmet Altay (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-1872:
--
Summary: implement Reshuffle transform in python, make it experimental in 
Java  (was: implement Reshuffle transform)

> implement Reshuffle transform in python, make it experimental in Java
> -
>
> Key: BEAM-1872
> URL: https://issues.apache.org/jira/browse/BEAM-1872
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Ahmet Altay
>  Labels: sdk-consistency
>






[jira] [Commented] (BEAM-1872) implement Reshuffle transform

2017-10-04 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191854#comment-16191854
 ] 

Ahmet Altay commented on BEAM-1872:
---

cc: [~kirpichov]

Reshuffle in Java is still being maintained and getting new features (e.g. 
{{ViaRandomKey}}) 
(https://github.com/apache/beam/blob/9379ca289a00cae169075728e6230a1d4a317659/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java)

It makes sense to make {{Reshuffle}} experimental in both SDKs and implement it 
for python as well.

> implement Reshuffle transform
> -
>
> Key: BEAM-1872
> URL: https://issues.apache.org/jira/browse/BEAM-1872
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>  Labels: sdk-consistency
>






[jira] [Commented] (BEAM-2822) Add support for progress reporting in fn API

2017-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191575#comment-16191575
 ] 

ASF GitHub Bot commented on BEAM-2822:
--

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3939

[BEAM-2822] Add progress metric reporting to Python SDK.

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam progress

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3939


commit 2759c4198375393f7db436e8303e87c228795c05
Author: Robert Bradshaw 
Date:   2017-10-03T00:20:38Z

Add progress metrics to Python SDK.

commit fe1d3c9b9b4d981594f526d632592cbcb11c42f5
Author: Robert Bradshaw 
Date:   2017-10-04T06:38:02Z

lint




> Add support for progress reporting in fn API
> 
>
> Key: BEAM-2822
> URL: https://issues.apache.org/jira/browse/BEAM-2822
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Vikas Kedigehalli
>Assignee: Robert Bradshaw
>Priority: Minor
>  Labels: portability
>
> https://s.apache.org/beam-fn-api-progress-reporting
> Note that the ULR reference implementation, when ready, should be useful for 
> every runner.





[GitHub] beam pull request #3939: [BEAM-2822] Add progress metric reporting to Python...

2017-10-04 Thread robertwb
GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/3939

[BEAM-2822] Add progress metric reporting to Python SDK.

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam progress

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/3939.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3939


commit 2759c4198375393f7db436e8303e87c228795c05
Author: Robert Bradshaw 
Date:   2017-10-03T00:20:38Z

Add progress metrics to Python SDK.

commit fe1d3c9b9b4d981594f526d632592cbcb11c42f5
Author: Robert Bradshaw 
Date:   2017-10-04T06:38:02Z

lint




---


[jira] [Created] (BEAM-3014) DoFnRunner _reraise_augmented fails with non-ascii exception messages

2017-10-04 Thread Ahmet Altay (JIRA)
Ahmet Altay created BEAM-3014:
-

 Summary: DoFnRunner _reraise_augmented fails with non-ascii 
exception messages
 Key: BEAM-3014
 URL: https://issues.apache.org/jira/browse/BEAM-3014
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ahmet Altay
Priority: Minor


{{_reraise_augmented}} fails to decode non-ascii bytes when re-raising 
exceptions and hides the original exception.

Full stack trace:
(6d7ae3386f271a8f): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 166, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start (dataflow_worker/native_operations.c:3175)
    def start(self):
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start (dataflow_worker/native_operations.c:3079)
    with self.scoped_start_state:
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start (dataflow_worker/native_operations.c:2994)
    with self.spec.source.reader() as reader:
  File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start (dataflow_worker/native_operations.c:2938)
    self.output(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 154, in apache_beam.runners.worker.operations.Operation.output (apache_beam/runners/worker/operations.c:5783)
    cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive (apache_beam/runners/worker/operations.c:3622)
    cython.cast(Operation, consumer).process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 339, in apache_beam.runners.worker.operations.DoOperation.process (apache_beam/runners/worker/operations.c:11089)
    with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 340, in apache_beam.runners.worker.operations.DoOperation.process (apache_beam/runners/worker/operations.c:11043)
    self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 382, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:10156)
    self.process(windowed_value)
  File "apache_beam/runners/common.py", line 390, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:10458)
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 429, in apache_beam.runners.common.DoFnRunner._reraise_augmented (apache_beam/runners/common.c:11606)
    + step_annotation)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1521: ordinal not in range(128)
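
The trace above comes from Python 2.7, where concatenating a byte string with a unicode string triggers an implicit ASCII decode. The following is only a minimal sketch in Python 3 that raises the same error class explicitly and shows one tolerant alternative. {{augment_message}} is a hypothetical helper, not the Beam implementation, and the assumption that the original message is a UTF-8 byte string is mine, not something the report confirms.

```python
def augment_message(original, step_annotation):
    """Append step context to an exception message (hypothetical helper).

    Byte strings are decoded with errors='replace', so undecodable bytes
    become U+FFFD instead of raising and hiding the original exception.
    """
    if isinstance(original, bytes):
        original = original.decode('utf-8', errors='replace')
    return original + step_annotation


# A message containing a right single quotation mark; its UTF-8 encoding
# starts with byte 0xe2, the byte named in the error above.
raw = 'bad value \u2019x\u2019'.encode('utf-8')

# Decoding strictly as ASCII fails in the same way as the reported trace.
try:
    raw.decode('ascii')
    failure = None
except UnicodeDecodeError as exn:
    failure = str(exn)

# The tolerant path keeps the original text and appends the annotation.
message = augment_message(raw, " [while running 'step']")
```

With this approach the original exception text survives even when it is not pure ASCII, at the cost of replacement characters for bytes that are not valid UTF-8.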





Jenkins build is back to stable : beam_PostCommit_Java_ValidatesRunner_Dataflow #4098

2017-10-04 Thread Apache Jenkins Server
See 




[jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema

2017-10-04 Thread Ryan Skraba (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191085#comment-16191085
 ] 

Ryan Skraba commented on BEAM-2993:
---

Hello -- I'm chiming in to help clarify our use case, which is a bit 
specialized.  However, if it's useful for us, it's potentially useful to others!

As part of our work using Beam, we help users assemble pipelines to run using 
configured "components".  These are eventually translated to PTransforms, of 
course, acting on PCollections -- nothing surprising!   We've picked Avro 
IndexedRecords (not GenericRecords, but that's a detail for the moment) as the 
common currency between the PTransforms.  This works well, especially if you 
know every schema on every collection at design-time, when you're building your 
pipeline.   

[~jkff] is correct that *if* the schema is known  *and* you are using 
{{AvroCoder}}, you already have the schema in your hands when you build the 
{{AvroIO.write}} and all is well.

We have some advanced functionality, however, where we deduce the schema at 
runtime -- either at the start of the pipeline (such as reading from a JDBC 
table and converting the row + table metadata into a consistent IndexedRecord) 
but also in the middle of a pipeline (we can infer a schema after some 
user-defined processing).  In brief, we can't directly use AvroCoder in this 
case, but we can write our own {{Coder}} that takes care of 
sharing the schema between interested nodes when necessary (not with every 
record).

In this case, we've managed to create a {{PCollection}} while 
designing the Pipeline, using our Coder that doesn't require the schema, but we 
still can't attach it to the {{AvroIO.write}}...

Note that "sharing the schema between interested nodes" in our custom coder 
introduces distributed state between nodes, which is not ideal for 
parallelization -- in this case, we've measured the cost to be acceptable since 
it only occurs the first time a node tries to write or read an avro-encoded 
record from the collection.

That's a long detour to explain why {{AvroIO.write}} without schema would be 
interesting to us, but I hope you find it useful.   Our technique for sharing 
the schema as distributed state in the Coder is a much larger topic, but we 
would certainly be interested in contributing it!
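
To make the idea of sharing the schema once per node (rather than with every record) concrete, here is a rough sketch in plain Python. It is not our actual coder and it does not use Avro or Beam: {{SchemaRegistry}}, {{LazySchemaCoder}} and {{fingerprint}} are hypothetical names, the registry is an in-memory stand-in for a shared service, and JSON stands in for the Avro binary encoding.

```python
import hashlib
import json


def fingerprint(schema):
    """Stable identifier for a schema, computed from its canonical JSON form."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha1(canonical.encode('utf-8')).hexdigest()


class SchemaRegistry:
    """In-memory stand-in for a schema service shared between nodes."""

    def __init__(self):
        self._by_id = {}
        self.writes = 0
        self.reads = 0

    def register(self, schema_id, schema):
        self.writes += 1
        self._by_id[schema_id] = schema

    def fetch(self, schema_id):
        self.reads += 1
        return self._by_id[schema_id]


class LazySchemaCoder:
    """Encodes records with a schema id; touches the registry once per schema."""

    def __init__(self, registry):
        self._registry = registry
        self._cache = {}  # schema id -> schema, local to this node

    def encode(self, schema, values):
        schema_id = fingerprint(schema)
        if schema_id not in self._cache:
            # Only the first record with this schema pays the registry cost.
            self._registry.register(schema_id, schema)
            self._cache[schema_id] = schema
        envelope = {'schema_id': schema_id, 'values': values}
        return json.dumps(envelope).encode('utf-8')

    def decode(self, payload):
        envelope = json.loads(payload.decode('utf-8'))
        schema_id = envelope['schema_id']
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.fetch(schema_id)
        return self._cache[schema_id], envelope['values']


registry = SchemaRegistry()
writer_node = LazySchemaCoder(registry)
reader_node = LazySchemaCoder(registry)
schema = {'type': 'record', 'name': 'Row',
          'fields': [{'name': 'id', 'type': 'int'}]}

payloads = [writer_node.encode(schema, {'id': i}) for i in range(3)]
decoded = [reader_node.decode(p) for p in payloads]
```

In this sketch the registry is written once and read once for three records, which mirrors the cost profile described above: the shared state is only touched the first time a node sees a schema.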

> AvroIO.write without specifying a schema
> 
>
> Key: BEAM-2993
> URL: https://issues.apache.org/jira/browse/BEAM-2993
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be 
> able to write to avro files using {{AvroIO}} without specifying a schema at 
> build time. Consider the following use case: a user has a 
> {{PCollection}}  but the schema is only known while running 
> the pipeline.  {{AvroIO.writeGenericRecords}} needs the schema, but the 
> schema is already available in {{GenericRecord}}. We should be able to call 
> {{AvroIO.writeGenericRecords()}} with no schema.





[jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema

2017-10-04 Thread Etienne Chauchot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190947#comment-16190947
 ] 

Etienne Chauchot commented on BEAM-2993:


Your questions are legitimate. In more detail, we use the lazy avro coder I was 
talking about. It is responsible for determining the schema at runtime and 
delegating to the AvroCoder. It also stores the obtained schema in a network 
registry service. We think it is a bad idea to call the network registry before 
writing just to get the schema back, when we could instead avoid passing it to 
the write transform at all. I know this entails calling 
{{GenericRecord.getSchema()}} once more (in addition to the call in the lazy 
avro coder). 
[~ryanskraba] feel free to comment if you have anything to add.
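
As an illustration of the trade-off, a writer can take the schema from the first record it sees instead of being handed one at construction time. The sketch below is plain Python, not {{AvroIO}}: {{Record}} is a hypothetical stand-in for a {{GenericRecord}} that carries its own schema, and the line-oriented JSON output only mimics the way an Avro container file stores the schema in its header.

```python
import io
import json


class Record:
    """Hypothetical stand-in for a GenericRecord: values plus their schema."""

    def __init__(self, schema, values):
        self.schema = schema
        self.values = values


class SchemaFromFirstRecordWriter:
    """Writes a header with the schema taken from the first record."""

    def __init__(self, sink):
        self._sink = sink
        self._schema = None  # unknown until the first record arrives

    def write(self, record):
        if self._schema is None:
            # The extra schema lookup happens here, once per file.
            self._schema = record.schema
            self._sink.write(json.dumps({'schema': self._schema}) + '\n')
        elif record.schema != self._schema:
            raise ValueError('record schema differs from the file header')
        self._sink.write(json.dumps(record.values) + '\n')


schema = {'type': 'record', 'name': 'Row',
          'fields': [{'name': 'id', 'type': 'int'}]}
sink = io.StringIO()
writer = SchemaFromFirstRecordWriter(sink)
writer.write(Record(schema, {'id': 1}))
writer.write(Record(schema, {'id': 2}))
lines = sink.getvalue().splitlines()
```

The writer never contacts a registry; the only additional work is reading the schema off the first record and checking that later records agree with it.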

> AvroIO.write without specifying a schema
> 
>
> Key: BEAM-2993
> URL: https://issues.apache.org/jira/browse/BEAM-2993
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-extensions
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be 
> able to write to avro files using {{AvroIO}} without specifying a schema at 
> build time. Consider the following use case: a user has a 
> {{PCollection}}  but the schema is only known while running 
> the pipeline.  {{AvroIO.writeGenericRecords}} needs the schema, but the 
> schema is already available in {{GenericRecord}}. We should be able to call 
> {{AvroIO.writeGenericRecords()}} with no schema.





Jenkins build became unstable: beam_PostCommit_Java_ValidatesRunner_Dataflow #4097

2017-10-04 Thread Apache Jenkins Server
See 




Jenkins build is unstable: beam_PostCommit_Java_MavenInstall #4940

2017-10-04 Thread Apache Jenkins Server
See