[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=340891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340891
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 09/Nov/19 04:38
Start Date: 09/Nov/19 04:38
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10051: 
[WIP/BEAM-7961] Add tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#discussion_r344428749
 
 

 ##
 File path: sdks/python/test-suites/portable/py2/build.gradle
 ##
 @@ -87,7 +87,7 @@ task crossLanguagePythonJavaDirect {
 ]
 exec {
   executable 'sh'
-  args '-c', ". ${envdir}/bin/activate && cd ${pythonRootDir} && pip install -e .[test] && python -m apache_beam.transforms.external_test ${options.join(' ')}"
+  args '-c', ". ${envdir}/bin/activate && cd ${pythonRootDir} && pip install -e .[test] && python -m apache_beam.transforms.external_java ${options.join(' ')}"
 
 Review comment:
  Please also test this in at least one Python 3 suite. If we have to pick one, 
pick the lowest version (3.5).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340891)
Time Spent: 20m  (was: 10m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=340877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340877
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 09/Nov/19 03:10
Start Date: 09/Nov/19 03:10
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10004: [BEAM-8442] 
Unify bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/10004#issuecomment-552060264
 
 
   Thanks for your review, @ibzib!
 



Issue Time Tracking
---

Worklog Id: (was: 340877)
Time Spent: 3h 20m  (was: 3h 10m)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There are two methods for bundle register in Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.
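The kind of unification proposed here can be sketched generically (hypothetical class and method names, not the actual SDK harness code): keep one canonical register implementation and have the other entry point delegate to it, so the two paths cannot drift apart.

```python
class Worker:
    """Owns the canonical bundle-register logic (illustrative sketch only)."""

    def __init__(self):
        self.registered = {}

    def register(self, bundle_id, descriptor):
        # Single source of truth for registration.
        self.registered[bundle_id] = descriptor
        return bundle_id


class Harness:
    """Previously had its own duplicate register path; now delegates to Worker."""

    def __init__(self, worker):
        self.worker = worker

    def _request_register(self, bundle_id, descriptor):
        # Delegation instead of a second, diverging code path.
        return self.worker.register(bundle_id, descriptor)
```

With this shape, a fix or behavior change in `Worker.register` automatically applies to both entry points.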



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8553) Add a more detailed cross-language transforms roadmap

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8553?focusedWorklogId=340868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340868
 ]

ASF GitHub Bot logged work on BEAM-8553:


Author: ASF GitHub Bot
Created on: 09/Nov/19 02:19
Start Date: 09/Nov/19 02:19
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10054: [BEAM-8553] 
Updates cross-language transforms roadmap
URL: https://github.com/apache/beam/pull/10054#issuecomment-552056290
 
 
   R: @tweise 
   
   CC: @chadrik @robertwb @mxm @ihji 
 



Issue Time Tracking
---

Worklog Id: (was: 340868)
Time Spent: 20m  (was: 10m)

> Add a more detailed cross-language transforms roadmap
> -
>
> Key: BEAM-8553
> URL: https://issues.apache.org/jira/browse/BEAM-8553
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have a basic description at the following location, but it has to be expanded 
> to detail completed tasks as well as ongoing and future efforts related to 
> Beam and its runners (primarily Flink and Dataflow at the moment).
> [https://beam.apache.org/roadmap/connectors-multi-sdk/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8553) Add a more detailed cross-language transforms roadmap

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8553?focusedWorklogId=340864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340864
 ]

ASF GitHub Bot logged work on BEAM-8553:


Author: ASF GitHub Bot
Created on: 09/Nov/19 02:18
Start Date: 09/Nov/19 02:18
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10054: 
[BEAM-8553] Updates cross-language transforms roadmap
URL: https://github.com/apache/beam/pull/10054
 
 
   Updates the roadmap to reflect the current status.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)

   [Build-status badge table for the Go, Java, and Python SDKs across the Apex,
Dataflow, Flink, Gearpump, Samza, and Spark runners omitted.]
 

[jira] [Work logged] (BEAM-8603) Add Python SqlTransform example script

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=340866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340866
 ]

ASF GitHub Bot logged work on BEAM-8603:


Author: ASF GitHub Bot
Created on: 09/Nov/19 02:18
Start Date: 09/Nov/19 02:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10055: [BEAM-8603] Add 
Python SqlTransform example script
URL: https://github.com/apache/beam/pull/10055#issuecomment-552056273
 
 
   CC: @robinyqiu 
 



Issue Time Tracking
---

Worklog Id: (was: 340866)
Time Spent: 20m  (was: 10m)

> Add Python SqlTransform example script
> --
>
> Key: BEAM-8603
> URL: https://issues.apache.org/jira/browse/BEAM-8603
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8603) Add Python SqlTransform example script

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=340865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340865
 ]

ASF GitHub Bot logged work on BEAM-8603:


Author: ASF GitHub Bot
Created on: 09/Nov/19 02:18
Start Date: 09/Nov/19 02:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10055: 
[BEAM-8603] Add Python SqlTransform example script
URL: https://github.com/apache/beam/pull/10055
 
 
   To run:
   
   1. Start a Flink cluster with master url `localhost:8081` (e.g. by 
downloading a [flink 
release](https://flink.apache.org/downloads.html#apache-flink-182) and running 
`./bin/start-cluster.sh`)
   2. Start a job server pointing to this cluster: 
   ```
   ./gradlew :runners:flink:1.8:job-server:runShadow -PflinkMasterUrl=localhost:8081
   ```
   3. Make sure the expansion service jar and the Java and Python containers have 
been built:
   ```
   ./gradlew :sdks:java:testing:expansion-service:buildTestExpansionServiceJar 
:sdks:java:container:docker :sdks:python:container:py2:docker
   ```
   4. Run the example script:
   ```
   python -m apache_beam.examples.wordcount_xlang_sql \
   --expansion_service_jar 
sdks/java/testing/expansion-service/build/libs/beam-sdks-java-testing-expansion-service-testExpansionService-2.18.0-SNAPSHOT.jar
 \
   
--experiment='jar_packages=./sdks/java/extensions/sql/build/libs/beam-sdks-java-extensions-sql-2.18.0-SNAPSHOT.jar'
 \
   --runner PortableRunner \
   --job_endpoint localhost:8099 \
   --input 'gs://dataflow-samples/shakespeare/kinglear.txt' \
   --output gs://your/bucket/output
   ```
   5. Inspect the output: `gsutil cat 'gs://your/bucket/output/*'`
   

[jira] [Created] (BEAM-8603) Add Python SqlTransform example script

2019-11-08 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8603:
---

 Summary: Add Python SqlTransform example script
 Key: BEAM-8603
 URL: https://issues.apache.org/jira/browse/BEAM-8603
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette
Assignee: Brian Hulette






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8602?focusedWorklogId=340863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340863
 ]

ASF GitHub Bot logged work on BEAM-8602:


Author: ASF GitHub Bot
Created on: 09/Nov/19 02:07
Start Date: 09/Nov/19 02:07
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10053: [BEAM-8602] 
Always use shadow configuration for direct runner dependencies
URL: https://github.com/apache/beam/pull/10053#issuecomment-552055277
 
 
   R: @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 340863)
Time Spent: 20m  (was: 10m)

> Always use shadow configuration for direct runner dependencies
> --
>
> Key: BEAM-8602
> URL: https://issues.apache.org/jira/browse/BEAM-8602
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8602?focusedWorklogId=340862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340862
 ]

ASF GitHub Bot logged work on BEAM-8602:


Author: ASF GitHub Bot
Created on: 09/Nov/19 02:07
Start Date: 09/Nov/19 02:07
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10053: 
[BEAM-8602] Always use shadow configuration for direct runner dependencies
URL: https://github.com/apache/beam/pull/10053
 
 
 

[jira] [Commented] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2019-11-08 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16970674#comment-16970674
 ] 

Brian Hulette commented on BEAM-8602:
-

cc: [~lcwik]

> Always use shadow configuration for direct runner dependencies
> --
>
> Key: BEAM-8602
> URL: https://issues.apache.org/jira/browse/BEAM-8602
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2019-11-08 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8602:

Component/s: dsl-sql

> Always use shadow configuration for direct runner dependencies
> --
>
> Key: BEAM-8602
> URL: https://issues.apache.org/jira/browse/BEAM-8602
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql, sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8602) Always use shadow configuration for direct runner dependencies

2019-11-08 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8602:
---

 Summary: Always use shadow configuration for direct runner 
dependencies
 Key: BEAM-8602
 URL: https://issues.apache.org/jira/browse/BEAM-8602
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Brian Hulette
Assignee: Brian Hulette






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=340844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340844
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 09/Nov/19 01:08
Start Date: 09/Nov/19 01:08
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10051: [BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051
 
 
   WIP. Do not review. Do not merge.
   
   
   
   

[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=340843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340843
 ]

ASF GitHub Bot logged work on BEAM-8592:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:56
Start Date: 09/Nov/19 00:56
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10021: [BEAM-8592] 
Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-552047295
 
 
   Let me ask in another way: do we need unit tests for table resolution? If 
so, can you add some? 
 



Issue Time Tracking
---

Worklog Id: (was: 340843)
Time Spent: 40m  (was: 0.5h)

> DataCatalogTableProvider should not squash table components together into a 
> string
> --
>
> Key: BEAM-8592
> URL: https://issues.apache.org/jira/browse/BEAM-8592
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, 
> representing the components {{"foo", "baz.bar", "bizzle"}}, the 
> DataCatalogTableProvider will concatenate the components into a string and 
> resolve the identifier as if it represented {{"foo", "baz", "bar", 
> "bizzle"}}.
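The failure mode described above can be illustrated with a small, self-contained sketch (hypothetical helper names, not the actual DataCatalogTableProvider code): joining the components with "." and re-splitting loses the boundary of any component that itself contains a dot.

```python
def squash_and_resplit(components):
    """Buggy approach: concatenate components with '.' and split again.

    A component that itself contains a dot (e.g. "baz.bar") is torn apart.
    """
    return ".".join(components).split(".")


def resolve(components):
    """Correct approach: carry the component list through unchanged."""
    return list(components)


parts = ["foo", "baz.bar", "bizzle"]
print(squash_and_resplit(parts))  # ['foo', 'baz', 'bar', 'bizzle'] -- wrong
print(resolve(parts))             # ['foo', 'baz.bar', 'bizzle']    -- intended
```

This is why table resolution should operate on the structured component list rather than on a re-parsed string.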



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340842
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:51
Start Date: 09/Nov/19 00:51
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10050: [BEAM-8575] Add 
streaming test case for multi-triggered GBK as side input
URL: https://github.com/apache/beam/pull/10050#issuecomment-552046479
 
 
   R: @robertwb 
   
   Please take a look. Everything makes sense except that in the second firing 
of the side input there is an extra empty list (line #369). I am not sure 
whether that is expected behavior for the streaming case. 
   
   Thank you! 
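
One plausible source of an extra empty list is a trigger firing while no newly buffered elements are present. A toy model of discarding-mode panes (an illustration only, not Beam's trigger implementation) shows the shape:

```python
# Toy model of discarding-mode trigger panes: each "FIRE" emits whatever is
# buffered and clears the buffer, so a firing with nothing buffered emits [].
def panes(events):
    buffer, out = [], []
    for e in events:
        if e == "FIRE":
            out.append(list(buffer))  # emit the current pane's contents
            buffer.clear()            # discarding accumulation mode
        else:
            buffer.append(e)
    return out

# A second firing before new elements arrive produces an empty pane.
assert panes([1, 2, "FIRE", "FIRE", 3, "FIRE"]) == [[1, 2], [], [3]]
```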
 



Issue Time Tracking
---

Worklog Id: (was: 340842)
Time Spent: 2h  (was: 1h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340839
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:38
Start Date: 09/Nov/19 00:38
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10050: [BEAM-8575] 
Add streaming test case for multi-triggered GBK as side input
URL: https://github.com/apache/beam/pull/10050
 
 
   Adds a Python streaming test that does the following:
   1) uses a GBK result as a side input;
   2) triggers the side input multiple times.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [X] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   [Go, Java, and Python per-runner build status badges (links to builds.apache.org Jenkins jobs); badge markup truncated in the archive]
 

[jira] [Work logged] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7765?focusedWorklogId=340837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340837
 ]

ASF GitHub Bot logged work on BEAM-7765:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:33
Start Date: 09/Nov/19 00:33
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9685: [BEAM-7765] - Add test 
for snippet accessing_valueprovider_info_after_run
URL: https://github.com/apache/beam/pull/9685#issuecomment-552043480
 
 
   @angulartist are you still working on this PR?
 



Issue Time Tracking
---

Worklog Id: (was: 340837)
Time Spent: 2h 20m  (was: 2h 10m)

> Add test for snippet accessing_valueprovider_info_after_run
> ---
>
> Key: BEAM-7765
> URL: https://issues.apache.org/jira/browse/BEAM-7765
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: John Patoch
>Priority: Major
>  Labels: easy
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This snippet needs a unit test.
> It has bugs. For example:
> - apache_beam.utils.value_provider doesn't exist
> - beam.combiners.Sum doesn't exist
> - unused import of: WriteToText
> cc: [~pabloem]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8003) Remove all mentions of PKB on Confluence / website docs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8003?focusedWorklogId=340836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340836
 ]

ASF GitHub Bot logged work on BEAM-8003:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:32
Start Date: 09/Nov/19 00:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9626: [BEAM-8003] 
pyjobs init commit
URL: https://github.com/apache/beam/pull/9626
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 340836)
Time Spent: 4h 40m  (was: 4.5h)

> Remove all mentions of PKB on Confluence / website docs
> ---
>
> Key: BEAM-8003
> URL: https://issues.apache.org/jira/browse/BEAM-8003
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8003) Remove all mentions of PKB on Confluence / website docs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8003?focusedWorklogId=340835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340835
 ]

ASF GitHub Bot logged work on BEAM-8003:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:32
Start Date: 09/Nov/19 00:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9626: [BEAM-8003] pyjobs 
init commit
URL: https://github.com/apache/beam/pull/9626#issuecomment-552043391
 
 
   Looks unrelated to the Beam repo as pointed out earlier. Closing.
 



Issue Time Tracking
---

Worklog Id: (was: 340835)
Time Spent: 4.5h  (was: 4h 20m)

> Remove all mentions of PKB on Confluence / website docs
> ---
>
> Key: BEAM-8003
> URL: https://issues.apache.org/jira/browse/BEAM-8003
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8598) TestStream broken across multiple stages in Flink

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8598?focusedWorklogId=340833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340833
 ]

ASF GitHub Bot logged work on BEAM-8598:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:31
Start Date: 09/Nov/19 00:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10049: [BEAM-8598] 
Test triggering BEAM-8598 on FlinkRunner.
URL: https://github.com/apache/beam/pull/10049
 
 
   R: @mxm 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   [Go, Java, and Python per-runner build status badges (links to builds.apache.org Jenkins jobs); badge markup truncated in the archive]
 

[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=340830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340830
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:12
Start Date: 09/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10004: [BEAM-8442] Unify 
bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/10004#issuecomment-552039221
 
 
   Run XVR_Flink PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 340830)
Time Spent: 3h 10m  (was: 3h)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> There are two methods for bundle registration in the Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.
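
A hypothetical sketch of the unification (the class and method names mirror the issue, but this is not the actual SDK harness code): keep a single registration implementation and have the other entry point delegate to it.

```python
# Illustrative sketch, not apache_beam.runners.worker code: SdkWorker.register
# is the single source of truth, and SdkHarness._request_register delegates.
class SdkWorker:
    def __init__(self):
        self.process_bundle_descriptors = {}

    def register(self, request):
        # Store each bundle descriptor by id; the one registration code path.
        for descriptor in request["process_bundle_descriptor"]:
            self.process_bundle_descriptors[descriptor["id"]] = descriptor
        return {"registered": sorted(self.process_bundle_descriptors)}

class SdkHarness:
    def __init__(self):
        self.worker = SdkWorker()

    def _request_register(self, request):
        # Delegates instead of duplicating the registration logic.
        return self.worker.register(request)
```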



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340828
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:02
Start Date: 09/Nov/19 00:02
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9892: [BEAM-8427] [SQL] 
buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#issuecomment-552037374
 
 
   Waiting on #10031 to get merged before merging this PR.
 



Issue Time Tracking
---

Worklog Id: (was: 340828)
Time Spent: 6h 40m  (was: 6.5h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340827
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:01
Start Date: 09/Nov/19 00:01
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] 
[SQL] buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#discussion_r344410953
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
 ##
 @@ -351,6 +351,9 @@ private void writeValue(JsonGenerator gen, FieldType type, Object value) throws
       case ROW:
         writeRow((Row) value, type.getRowSchema(), gen);
         break;
+      case LOGICAL_TYPE:
+        writeValue(gen, type.getLogicalType().getBaseType(), value);
+        break;
 
 Review comment:
   Added `makeLogicalTypeTestCase`.
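
The LOGICAL_TYPE branch above serializes a logical type by recursing on its base type. A tiny sketch of that idea (illustrative Python, not Beam's Java RowJson):

```python
# Illustrative sketch: a LOGICAL_TYPE value is written by unwrapping to its
# base type and recursing, so the output matches writing the base type directly.
# The dict-based "field type" representation here is an assumption for the demo.
def write_value(value, field_type):
    if field_type["kind"] == "LOGICAL_TYPE":
        # Mirrors `writeValue(gen, type.getLogicalType().getBaseType(), value)`.
        return write_value(value, field_type["base"])
    # Primitive kinds pass through unchanged in this sketch.
    return value

logical_string = {"kind": "LOGICAL_TYPE", "base": {"kind": "STRING"}}
assert write_value("abc", logical_string) == write_value("abc", {"kind": "STRING"})
```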
 



Issue Time Tracking
---

Worklog Id: (was: 340827)
Time Spent: 6.5h  (was: 6h 20m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340826
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 09/Nov/19 00:01
Start Date: 09/Nov/19 00:01
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] 
[SQL] buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#discussion_r344410918
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableTest.java
 ##
 @@ -99,6 +100,35 @@ public void testDocumentToRowConverter() {
 pipeline.run().waitUntilFinish();
   }
 
+  @Test
+  public void testRowToDocumentConverter() {
+    PCollection<Document> output =
+        pipeline
+            .apply(
+                "Create a row",
+                Create.of(
+                    row(
+                        SCHEMA,
+                        9223372036854775807L,
+                        2147483647,
+                        (short) 32767,
+                        (byte) 127,
+                        true,
+                        1.0,
+                        (float) 1.0,
+                        "string",
+                        row(
+                            Schema.builder().addNullableField("int32", INT32).build(),
+                            2147483645),
 
 Review comment:
   Instead of adding a new field, I replaced an existing `string` field with a 
logical wrapper around the String type from CalciteUtils.
 



Issue Time Tracking
---

Worklog Id: (was: 340826)
Time Spent: 6h 20m  (was: 6h 10m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=340823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340823
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:43
Start Date: 08/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9974: [BEAM-8472] Get 
default GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974#discussion_r344407980
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java
 ##
 @@ -199,4 +199,10 @@ public void testDefaultStagingLocationUnset() {
     thrown.expectMessage("Error constructing default value for stagingLocation");
     options.getStagingLocation();
   }
+
+  @Test
+  public void testDefaultGcpRegion() {
+    DataflowPipelineOptions options =
+        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
+    assertEquals("us-central1", options.getRegion());
 
 Review comment:
   Thanks for catching this Luke. Filed #10048 to improve this.
 



Issue Time Tracking
---

Worklog Id: (was: 340823)
Time Spent: 3h 10m  (was: 3h)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if the --region flag is not set. For 
> convenience, the Google Cloud SDK generally tries to determine a default 
> value in this case, and we should follow suit. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
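
A hedged Python sketch of that precedence order (flag, then environment, then gcloud config, then the hard-coded fallback); the helper name and exact fallback behavior are assumptions for illustration, not Beam's verified implementation:

```python
# Illustrative precedence resolver, not Beam's DefaultGcpRegionFactory.
# CLOUDSDK_COMPUTE_REGION is gcloud's property-override environment variable.
import os
import subprocess

def default_gcp_region(flag_value=None):
    if flag_value:
        return flag_value  # explicit --region flag wins
    env = os.environ.get("CLOUDSDK_COMPUTE_REGION")
    if env:
        return env  # environment override comes next
    try:
        # Fall back to the locally configured gcloud default, if any.
        out = subprocess.run(
            ["gcloud", "config", "get-value", "compute/region"],
            capture_output=True, text=True, timeout=5, check=False,
        ).stdout.strip()
        if out and out != "(unset)":
            return out
    except (OSError, subprocess.SubprocessError):
        pass  # gcloud missing or unresponsive
    return "us-central1"  # current hard-coded fallback
```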



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=340821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340821
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:40
Start Date: 08/Nov/19 23:40
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10048: [BEAM-8472] 
test Java default GCP region
URL: https://github.com/apache/beam/pull/10048
 
 
   Refactor and add tests for `DefaultGcpRegionFactory`.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   [Go, Java, and Python per-runner build status badges (links to builds.apache.org Jenkins jobs); badge markup truncated in the archive]
 

[jira] [Work logged] (BEAM-8597) Allow TestStream trigger tests to run on other runners.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8597?focusedWorklogId=340816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340816
 ]

ASF GitHub Bot logged work on BEAM-8597:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:28
Start Date: 08/Nov/19 23:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10043: [BEAM-8597] Allow 
TestStream trigger tests to run on other runners.
URL: https://github.com/apache/beam/pull/10043#issuecomment-552030550
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 340816)
Time Spent: 0.5h  (was: 20m)

> Allow TestStream trigger tests to run on other runners.
> ---
>
> Key: BEAM-8597
> URL: https://issues.apache.org/jira/browse/BEAM-8597
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=340811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340811
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:15
Start Date: 08/Nov/19 23:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344402699
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
+  def run_TestStream(self, transform_node, options):
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.testing.test_stream import ElementEvent
+from apache_beam.testing.test_stream import ProcessingTimeEvent
+from apache_beam.testing.test_stream import WatermarkEvent
+standard_options = options.view_as(StandardOptions)
+if not standard_options.streaming:
+  raise ValueError('TestStream is currently available for use '
+   'only in streaming pipelines.')
+
+transform = transform_node.transform
+step = self._add_step(TransformNames.READ, transform_node.full_label,
+  transform_node)
+step.add_property(PropertyNames.FORMAT, 'test_stream')
+test_stream_payload = beam_runner_api_pb2.TestStreamPayload()
+# TestStream source doesn't do any decoding of elements,
+# so we won't set test_stream_payload.coder_id.
+output_coder = transform._infer_output_coder()  # pylint: disable=protected-access
+for event in transform.events:
+  new_event = test_stream_payload.events.add()
+  if isinstance(event, ElementEvent):
+for tv in event.timestamped_values:
+  element = new_event.element_event.elements.add()
+  element.encoded_element = output_coder.encode(tv.value)
+  element.timestamp = tv.timestamp.micros
 
 Review comment:
   You could make the entire proto change in one go within this repo since no 
tests that send this data outside of this repo exist.
   
   There is no worry about backwards compatibility currently.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340811)
Time Spent: 50m  (was: 40m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.
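The review thread above centers on how `run_TestStream` turns TestStream events into a payload of encoded, timestamped elements. Below is a minimal, Beam-free sketch of that conversion step; all class and function names here are illustrative stand-ins, not Beam's actual types:

```python
from dataclasses import dataclass
from typing import List

# Illustrative stand-ins for Beam's TestStream event and payload types.
@dataclass
class TimestampedValue:
    value: str
    micros: int  # event-time timestamp in microseconds

@dataclass
class ElementEvent:
    timestamped_values: List[TimestampedValue]

@dataclass
class EncodedElement:
    encoded_element: bytes
    timestamp: int

def encode_element_event(event, encode=lambda v: v.encode("utf-8")):
    """Mirror of the diff's loop: encode each value and copy its micros timestamp."""
    return [EncodedElement(encode(tv.value), tv.micros)
            for tv in event.timestamped_values]

elements = encode_element_event(
    ElementEvent([TimestampedValue("late", 1_000_000)]))
# elements[0].encoded_element == b"late", elements[0].timestamp == 1000000
```

The real `run_TestStream` uses the transform's inferred output coder in place of the `encode` lambda, which is why the payload's `coder_id` discussion matters in the thread.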



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8392) Update pyarrow version bounds

2019-11-08 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-8392:
---

Assignee: Ahmet Altay

> Update pyarrow version bounds
> -
>
> Key: BEAM-8392
> URL: https://issues.apache.org/jira/browse/BEAM-8392
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Ahmet Altay
>Priority: Major
>
> arrow is 
> [considering|https://lists.apache.org/thread.html/462464941d79e67bfef7d73fc900ea5fe0200e8a926253c6eb0285aa@%3Cdev.arrow.apache.org%3E]
>  a 0.15.1 release which is likely to solve the problem described here: 
> [https://issues.apache.org/jira/browse/BEAM-8368|https://github.com/apache/beam/pull/9768]
> The task is to bump up the lower bound for pyarrow in setup.py as soon as 
> 0.15.1 is released.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8392) Update pyarrow version bounds

2019-11-08 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-8392.
---
Fix Version/s: 2.17.0
   Resolution: Fixed

> Update pyarrow version bounds
> -
>
> Key: BEAM-8392
> URL: https://issues.apache.org/jira/browse/BEAM-8392
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Ahmet Altay
>Priority: Major
> Fix For: 2.17.0
>
>
> arrow is 
> [considering|https://lists.apache.org/thread.html/462464941d79e67bfef7d73fc900ea5fe0200e8a926253c6eb0285aa@%3Cdev.arrow.apache.org%3E]
>  a 0.15.1 release which is likely to solve the problem described here: 
> [https://issues.apache.org/jira/browse/BEAM-8368|https://github.com/apache/beam/pull/9768]
> The task is to bump up the lower bound for pyarrow in setup.py as soon as 
> 0.15.1 is released.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340807
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:10
Start Date: 08/Nov/19 23:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9965: [BEAM-8539] 
Make job state transitions in python-based runners consistent with java-based 
runners
URL: https://github.com/apache/beam/pull/9965#discussion_r344401281
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -213,16 +213,37 @@ message JobMessagesResponse {
 // without needing to pass through STARTING.
 message JobState {
   enum Enum {
+// The job state reported by a runner cannot be interpreted by the SDK.
 UNSPECIFIED = 0;
+
+// The job has been paused, or has not yet started.
 STOPPED = 1;
+
+// The job is currently running. (terminal)
 
 Review comment:
   not terminal?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340807)
Time Spent: 4h 40m  (was: 4.5h)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make Python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  
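The utility function proposed in the last paragraph of the description could look like the following sketch. The enum values mirror the `beam_job_api.proto` excerpt quoted in the review above; the choice of terminal states follows the Java `InMemoryJobService` behavior listed in the description, which is exactly the kind of decision the issue wants standardized rather than left to each runner:

```python
from enum import Enum

class JobState(Enum):
    """Values mirror JobState.Enum in beam_job_api.proto."""
    UNSPECIFIED = 0
    STOPPED = 1
    RUNNING = 2
    DONE = 3
    FAILED = 4
    CANCELLED = 5
    UPDATED = 6
    DRAINING = 7
    DRAINED = 8
    STARTING = 9
    CANCELLING = 10
    UPDATING = 11

# One shared definition of "terminal" (per the Java runner's behavior),
# instead of per-runner reimplementations of this logic.
TERMINAL_STATES = frozenset(
    {JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED})

def is_terminal(state: JobState) -> bool:
    return state in TERMINAL_STATES
```

With this in place, a `GetStateStream` consumer can stop iterating as soon as `is_terminal(state)` is true, regardless of which runner produced the stream.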



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340806
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:10
Start Date: 08/Nov/19 23:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9965: [BEAM-8539] 
Make job state transitions in python-based runners consistent with java-based 
runners
URL: https://github.com/apache/beam/pull/9965#discussion_r344401316
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -213,16 +213,37 @@ message JobMessagesResponse {
 // without needing to pass through STARTING.
 message JobState {
   enum Enum {
+// The job state reported by a runner cannot be interpreted by the SDK.
 UNSPECIFIED = 0;
+
+// The job has been paused, or has not yet started.
 STOPPED = 1;
+
+// The job is currently running. (terminal)
 RUNNING = 2;
+
+// The job has successfully completed. (terminal)
 DONE = 3;
+
+// The job has failed. (terminal)
 FAILED = 4;
+
+// The job has been explicitly cancelled. (terminal)
 CANCELLED = 5;
+
+// The job has been updated.
 
 Review comment:
   terminal?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340806)
Time Spent: 4.5h  (was: 4h 20m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make Python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340805
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:08
Start Date: 08/Nov/19 23:08
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9965: [BEAM-8539] 
Make job state transitions in python-based runners consistent with java-based 
runners
URL: https://github.com/apache/beam/pull/9965#discussion_r344401072
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -213,16 +213,37 @@ message JobMessagesResponse {
 // without needing to pass through STARTING.
 message JobState {
   enum Enum {
+// The job state reported by a runner cannot be interpreted by the SDK.
 UNSPECIFIED = 0;
+
+// The job has been paused, or has not yet started.
 STOPPED = 1;
+
+// The job is currently running. (terminal)
 RUNNING = 2;
+
+// The job has successfully completed. (terminal)
 DONE = 3;
+
+// The job has failed. (terminal)
 FAILED = 4;
+
+// The job has been explicitly cancelled. (terminal)
 CANCELLED = 5;
+
+// The job has been updated.
 UPDATED = 6;
+
+// The job is draining its data.
 DRAINING = 7;
+
+// The job has completed draining its data. (terminal)
 DRAINED = 8;
+
+// The job is starting up.
 STARTING = 9;
+
+// The job is cancelling.
 CANCELLING = 10;
 UPDATING = 11;
 
 Review comment:
   `// The job is in the progress of being updated.`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340805)
Time Spent: 4h 20m  (was: 4h 10m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make Python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340804
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:07
Start Date: 08/Nov/19 23:07
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9997: [BEAM-8157] Fix 
key encoding issues for state requests with unknown coders / Improve debugging 
and testing
URL: https://github.com/apache/beam/pull/9997#discussion_r344400891
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -98,6 +100,8 @@ PortablePipelineResult runPipelineWithTranslator(
 ? trimmedPipeline
 : GreedyPipelineFuser.fuse(trimmedPipeline).toPipeline();
 
+fusedPipeline = lengthPrefixKeyCoderOfStatefulStages(fusedPipeline);
 
 Review comment:
   And could potentially be pulled up into `JobInvocation`, which is already 
shared.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340804)
Time Spent: 9h 20m  (was: 9h 10m)

> Key encoding for state requests is not consistent across SDKs
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length 
> prefix (OUTER context). The user state request handler exposes a serialized 
> version of the key to the Runner. This key is encoded with the NESTED context 
> which may add a length prefix. We need to convert it to OUTER context to 
> match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, 
> with the upcoming support for Flink 1.9, the state backend will not accept 
> requests for keys not part of any key group/partition of the operator. This 
> is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses 
> OUTER encoding for the key in state requests.
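The NESTED/OUTER distinction described above boils down to whether a length prefix is written before the key bytes. Below is a self-contained sketch of that difference and of the conversion the fix performs; it models the varint length-prefix convention generically and is not Beam's actual coder API:

```python
def encode_outer(key: bytes) -> bytes:
    """OUTER context: the raw bytes, no framing (what Flink partitions on)."""
    return key

def varint(n: int) -> bytes:
    """Minimal unsigned varint encoding, as used by length-prefixing coders."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_nested(key: bytes) -> bytes:
    """NESTED context: a length prefix precedes the payload."""
    return varint(len(key)) + key

def nested_to_outer(encoded: bytes) -> bytes:
    """Strip the length prefix so the key matches Flink's OUTER encoding."""
    i = length = shift = 0
    while True:
        b = encoded[i]
        i += 1
        length |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return encoded[i:i + length]

key = b"user-42"
assert encode_nested(key) != encode_outer(key)     # binary representations differ
assert nested_to_outer(encode_nested(key)) == key  # conversion recovers the raw key
```

Because Flink assigns key groups by hashing the serialized key, the two representations of the same logical key would land in different key groups unless one canonical form is used throughout.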



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=340803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340803
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:06
Start Date: 08/Nov/19 23:06
Worklog Time Spent: 10m 
  Work Description: acrites commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344400618
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
+  def run_TestStream(self, transform_node, options):
+from apache_beam.portability.api import beam_runner_api_pb2
+from apache_beam.testing.test_stream import ElementEvent
+from apache_beam.testing.test_stream import ProcessingTimeEvent
+from apache_beam.testing.test_stream import WatermarkEvent
+standard_options = options.view_as(StandardOptions)
+if not standard_options.streaming:
+  raise ValueError('TestStream is currently available for use '
+   'only in streaming pipelines.')
+
+transform = transform_node.transform
+step = self._add_step(TransformNames.READ, transform_node.full_label,
+  transform_node)
+step.add_property(PropertyNames.FORMAT, 'test_stream')
+test_stream_payload = beam_runner_api_pb2.TestStreamPayload()
+# TestStream source doesn't do any decoding of elements,
+# so we won't set test_stream_payload.coder_id.
+output_coder = transform._infer_output_coder()  # pylint: disable=protected-access
+for event in transform.events:
+  new_event = test_stream_payload.events.add()
+  if isinstance(event, ElementEvent):
+for tv in event.timestamped_values:
+  element = new_event.element_event.elements.add()
+  element.encoded_element = output_coder.encode(tv.value)
+  element.timestamp = tv.timestamp.micros
 
 Review comment:
   I guess one path forward here would be to add new google.protobuf.Timestamp 
fields to TestStreamPayload making it something like:
   
   message TestStreamPayload {
 // (Required) the coder for elements in the TestStream events
 string coder_id = 1;
   
 repeated Event events = 2;
   
 message Event {
   oneof event {
 AdvanceWatermark watermark_event = 1;
 AdvanceProcessingTime processing_time_event = 2;
 AddElements element_event = 3;
   }
   
   message AdvanceWatermark {
 int64 new_watermark = 1;
 Timestamp new_watermark_timestamp = 2;
   }
   
   message AdvanceProcessingTime {
 int64 advance_duration = 1;
 Timestamp advance_duration_timestamp = 2;
   }
   
   message AddElements {
 repeated TimestampedElement elements = 1;
   }
 }
   
 message TimestampedElement {
   bytes encoded_element = 1;
   int64 timestamp = 2;
  Timestamp timestamp_timestamp = 3; // just kidding, not quite sure what to name this one
 }
   }
   
   Then I can add both the Timestamp and int64 version to the payload for now. 
Then separately I can change all the internal code that processes the 
TestStreamPayload to use the new fields. As a third step we could then 
deprecate the old fields.
   
   One thing to note is that Python Timestamp objects only have microsecond 
resolution, so I'm guessing there's still a lot of work to be done to 
transition to nanos.
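For reference, mapping an int64 microsecond field onto a `google.protobuf.Timestamp`-style `(seconds, nanos)` pair is mechanical. A small sketch in plain Python (not the actual proto class; the seconds/nanos split follows the well-known `Timestamp` convention):

```python
def micros_to_timestamp(micros: int):
    """Split a microsecond instant into protobuf Timestamp-style (seconds, nanos).

    google.protobuf.Timestamp stores int64 seconds plus non-negative int32
    nanos; since the input only has microsecond resolution, nanos is always
    a multiple of 1000.
    """
    seconds, remainder_micros = divmod(micros, 1_000_000)
    return seconds, remainder_micros * 1000
```

Python's floor-division `divmod` keeps the nanos component non-negative even for instants before the epoch, matching the proto convention.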
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340803)
Time Spent: 40m  (was: 0.5h)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340801
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:03
Start Date: 08/Nov/19 23:03
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9997: [BEAM-8157] Fix 
key encoding issues for state requests with unknown coders / Improve debugging 
and testing
URL: https://github.com/apache/beam/pull/9997#discussion_r344400034
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -98,6 +100,8 @@ PortablePipelineResult runPipelineWithTranslator(
 ? trimmedPipeline
 : GreedyPipelineFuser.fuse(trimmedPipeline).toPipeline();
 
+fusedPipeline = lengthPrefixKeyCoderOfStatefulStages(fusedPipeline);
 
 Review comment:
   The sequence of steps (trim, fuse, and now 
lengthPrefixKeyCoderOfStatefulStages) that repeats between Flink and Spark 
could be moved to a utility in `org.apache.beam.runners.core.construction.graph`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340801)
Time Spent: 9h 10m  (was: 9h)

> Key encoding for state requests is not consistent across SDKs
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length 
> prefix (OUTER context). The user state request handler exposes a serialized 
> version of the key to the Runner. This key is encoded with the NESTED context 
> which may add a length prefix. We need to convert it to OUTER context to 
> match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, 
> with the upcoming support for Flink 1.9, the state backend will not accept 
> requests for keys not part of any key group/partition of the operator. This 
> is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses 
> OUTER encoding for the key in state requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=340800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340800
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:02
Start Date: 08/Nov/19 23:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10046: [BEAM-8579] Strip 
UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046#issuecomment-552024365
 
 
   Overall the PR and tests look good; just the location of the tests should 
move.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340800)
Time Spent: 50m  (was: 40m)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when the file contains a byte order mark (BOM), it will preserve it 
> in the output. According to the Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]):
>  "Use of a BOM is neither required nor recommended for UTF-8". UTF-8 with a 
> BOM will also be a potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]).
>  As a general practice, it's suggested to use UTF-8 without a BOM.
> Proposal: remove BOM bytes in the output from TextSource.
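The proposed behavior is easy to state precisely: drop the three BOM bytes only when they appear at the very start of the stream, and leave every other byte alone. A sketch in Python (the actual fix lives in the Java `TextSource`):

```python
# The UTF-8 byte order mark is the three-byte sequence EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present; leave all other bytes untouched."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data

assert strip_utf8_bom(b"\xef\xbb\xbfhello") == b"hello"
assert strip_utf8_bom(b"hello") == b"hello"  # no BOM: unchanged
# A BOM-like sequence in the middle of the data is real content and is kept.
assert strip_utf8_bom(b"he\xef\xbb\xbfllo") == b"he\xef\xbb\xbfllo"
```

The mid-stream case matters for a splittable source like `TextSource`: only the reader positioned at offset zero of the file should ever strip the sequence.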



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=340799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340799
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 08/Nov/19 23:01
Start Date: 08/Nov/19 23:01
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10046: [BEAM-8579] 
Strip UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046#discussion_r344398516
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/io/TextSourceTest.java
 ##
 @@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.nio.charset.Charset;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO.ReadableFile;
+import org.apache.beam.sdk.io.fs.EmptyMatchTreatment;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.testing.NeedsRunner;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for TextSource class. */
+@RunWith(JUnit4.class)
+public class TextSourceTest {
 
 Review comment:
   Please add these tests to `TextIOReadTest.java` instead of creating a new 
file.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340799)
Time Spent: 40m  (was: 0.5h)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when the file contains a byte order mark (BOM), it will preserve it 
> in the output. According to the Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]):
>  "Use of a BOM is neither required nor recommended for UTF-8". UTF-8 with a 
> BOM will also be a potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]).
>  As a general practice, it's suggested to use UTF-8 without a BOM.
> Proposal: remove BOM bytes in the output from TextSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340797
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:58
Start Date: 08/Nov/19 22:58
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9997: [BEAM-8157] Fix 
key encoding issues for state requests with unknown coders / Improve debugging 
and testing
URL: https://github.com/apache/beam/pull/9997#discussion_r344398968
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -134,6 +138,69 @@ private PortablePipelineResult 
createPortablePipelineResult(
 }
   }
 
+  /**
+   * Patches all input coders of stateful ExecutableStage transforms. Stateful transforms always have
+   * a KvCoder as input. Flink partitions the data based on the serialized version of the key. This
+   * key must retain the same binary representation to be able to serve the state in state requests
+   * from the SDK Harness from the correct partition. If the binary representation does not match,
+   * this will result in inconsistent checkpoints because keys are associated with a Flink key group
+   * which does not belong to the key group range of the state-handling operator.
+   */
+  private static RunnerApi.Pipeline lengthPrefixKeyCoderOfStatefulStages(
 
 Review comment:
   `org.apache.beam.runners.core.construction.graph` ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340797)
Time Spent: 9h  (was: 8h 50m)

> Key encoding for state requests is not consistent across SDKs
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length 
> prefix (OUTER context). The user state request handler exposes a serialized 
> version of the key to the Runner. This key is encoded with the NESTED context 
> which may add a length prefix. We need to convert it to OUTER context to 
> match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, 
> with the upcoming support for Flink 1.9, the state backend will not accept 
> requests for keys not part of any key group/partition of the operator. This 
> is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses 
> OUTER encoding for the key in state requests.
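The mismatch described above is easy to model with a toy version of the two coder contexts (a Python sketch; the single-byte length prefix stands in for Beam's varint prefix, assuming keys shorter than 128 bytes, and `key_group_for` is a simplified stand-in for Flink's key-group assignment):

```python
import zlib

def encode_outer(key: bytes) -> bytes:
    # OUTER context: the value is the entire stream, so no framing is added.
    return key

def encode_nested(key: bytes) -> bytes:
    # NESTED context: a length prefix makes the value self-delimiting
    # inside a larger stream (one byte here, standing in for a varint).
    return bytes([len(key)]) + key

def key_group_for(key_bytes: bytes, max_parallelism: int = 128) -> int:
    # The group is a function of the *serialized* key, so two encodings of
    # the same logical key generally land in different key groups.
    return zlib.crc32(key_bytes) % max_parallelism
```

A state request carrying `encode_nested(key)` is routed against a different key group than the one the data partitioned by `encode_outer(key)` lives in, which is the inconsistency the fix addresses.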



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340796
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:56
Start Date: 08/Nov/19 22:56
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9997: [BEAM-8157] Fix 
key encoding issues for state requests with unknown coders / Improve debugging 
and testing
URL: https://github.com/apache/beam/pull/9997#discussion_r344398451
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -134,6 +138,69 @@ private PortablePipelineResult 
createPortablePipelineResult(
 }
   }
 
+  /**
+   * Patches all input coders of stateful ExecutableStage transforms. Stateful 
transforms always have
+   * a KvCoder as input. Flink partitions the data based on the serialized 
version of the key. This
+   * key must retain the same binary representation to be able to serve the 
state in state requests
+   * from the SDK Harness from the correct partition. If the binary 
representation does not match,
+   * this will result in inconsistent checkpoints because keys are associated 
to a Flink key group
+   * which does not belong to the key group range of the state handling 
operator.
+   */
+  private static RunnerApi.Pipeline lengthPrefixKeyCoderOfStatefulStages(
 
 Review comment:
   It appears that this should be the "default" behavior to produce a 
consistent representation of the pipeline for runner consumption. Since the 
code here isn't Flink specific, it should move to a common package.
 



Issue Time Tracking
---

Worklog Id: (was: 340796)
Time Spent: 8h 50m  (was: 8h 40m)

> Key encoding for state requests is not consistent across SDKs
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length 
> prefix (OUTER context). The user state request handler exposes a serialized 
> version of the key to the Runner. This key is encoded with the NESTED context 
> which may add a length prefix. We need to convert it to OUTER context to 
> match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, 
> with the upcoming support for Flink 1.9, the state backend will not accept 
> requests for keys not part of any key group/partition of the operator. This 
> is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses 
> OUTER encoding for the key in state requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340795
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:53
Start Date: 08/Nov/19 22:53
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9997: [BEAM-8157] Fix 
key encoding issues for state requests with unknown coders / Improve debugging 
and testing
URL: https://github.com/apache/beam/pull/9997#discussion_r344397717
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -98,6 +100,8 @@ PortablePipelineResult runPipelineWithTranslator(
 ? trimmedPipeline
 : GreedyPipelineFuser.fuse(trimmedPipeline).toPipeline();
 
+fusedPipeline = lengthPrefixKeyCoderOfStatefulStages(fusedPipeline);
 
 Review comment:
   The operations above are already repeated in `SparkPipelineRunner`
 



Issue Time Tracking
---

Worklog Id: (was: 340795)
Time Spent: 8h 40m  (was: 8.5h)

> Key encoding for state requests is not consistent across SDKs
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Critical
> Fix For: 2.17.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length 
> prefix (OUTER context). The user state request handler exposes a serialized 
> version of the key to the Runner. This key is encoded with the NESTED context 
> which may add a length prefix. We need to convert it to OUTER context to 
> match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, 
> with the upcoming support for Flink 1.9, the state backend will not accept 
> requests for keys not part of any key group/partition of the operator. This 
> is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses 
> OUTER encoding for the key in state requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=340793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340793
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:50
Start Date: 08/Nov/19 22:50
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9974: [BEAM-8472] 
Get default GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974#discussion_r344397166
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/test/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptionsTest.java
 ##
 @@ -199,4 +199,10 @@ public void testDefaultStagingLocationUnset() {
 thrown.expectMessage("Error constructing default value for 
stagingLocation");
 options.getStagingLocation();
   }
+
+  @Test
+  public void testDefaultGcpRegion() {
+DataflowPipelineOptions options = 
PipelineOptionsFactory.as(DataflowPipelineOptions.class);
+assertEquals("us-central1", options.getRegion());
 
 Review comment:
   This test will fail on any machine where the `CLOUDSDK_COMPUTE_REGION` is 
defined or `gcloud config compute/region` is set.
   
   You can test your code by instantiating the DefaultGcpRegionFactory directly 
and ensuring that it is factored in such a way where you can pass in the 
environment map directly to it and similarly for testing the process execution.
 



Issue Time Tracking
---

Worklog Id: (was: 340793)
Time Spent: 2h 50m  (was: 2h 40m)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
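The lookup order the issue points at can be sketched like this (a hypothetical helper, with the environment map and process runner injected so it stays unit-testable, along the lines of the review feedback above; `us-central1` is the historical fallback):

```python
import os
import subprocess

def default_gcp_region(environ=os.environ, run=subprocess.run):
    # 1. Explicit environment override recognized by the Cloud SDK.
    region = environ.get("CLOUDSDK_COMPUTE_REGION")
    if region:
        return region
    # 2. Whatever `gcloud config` has persisted, if gcloud is installed.
    try:
        result = run(
            ["gcloud", "config", "get-value", "compute/region"],
            capture_output=True, text=True, timeout=5)
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        pass
    # 3. Historical hard-coded fallback.
    return "us-central1"
```

Injecting `environ` and `run` avoids the flakiness the reviewer describes: tests never depend on the machine's real `CLOUDSDK_COMPUTE_REGION` or gcloud configuration.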



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=340792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340792
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:48
Start Date: 08/Nov/19 22:48
Worklog Time Spent: 10m 
  Work Description: cmm08 commented on issue #10046: [BEAM-8579] Strip 
UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046#issuecomment-552011322
 
 
   R: @lukecwik
   CC: @kennknowles
   
   Hi, Could you please review this PR? Thank you!
 



Issue Time Tracking
---

Worklog Id: (was: 340792)
Time Spent: 0.5h  (was: 20m)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when the file contains byte order mark (BOM), it will preserve it 
> in the output. According to Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]):
>  "Use of a BOM is neither required nor recommended for UTF-8". UTF-8 with a 
> BOM will also be a potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]).
>  As a general practice, it's suggested to use UTF-8 without BOM.
> Proposal: remove BOM bytes in the output from TextSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=340790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340790
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:44
Start Date: 08/Nov/19 22:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-552019300
 
 
   I think the complexity of guarding this with a flag will be much larger than 
rebasing when the library is updated once 
https://github.com/ftpsolutions/collapsing-thread-pool-executor/pull/4 is 
merged and released so I'm not seeing the value.
 



Issue Time Tracking
---

Worklog Id: (was: 340790)
Time Spent: 8h  (was: 7h 50m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.
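The kind of pool under discussion can be sketched minimally (illustrative Python only, not the `collapsing-thread-pool-executor` library referenced in the PR): workers are spawned on demand up to a cap and exit after sitting idle, so the live thread count tracks the offered load rather than the cap.

```python
import queue
import threading

class CollapsingPool:
    def __init__(self, max_workers=10_000, idle_timeout=1.0):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._workers = 0
        self._max_workers = max_workers
        self._idle_timeout = idle_timeout

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            if self._workers < self._max_workers:
                # Optimistically add a worker; surplus workers time out below.
                self._workers += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                fn, args = self._tasks.get(timeout=self._idle_timeout)
            except queue.Empty:
                # Idle long enough: shrink the pool by letting this thread die.
                with self._lock:
                    self._workers -= 1
                return
            fn(*args)
```

With a cap of 10k, a runner that schedules many concurrent work items cannot starve the control plane, yet an idle harness collapses back toward zero threads.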



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=340791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340791
 ]

ASF GitHub Bot logged work on BEAM-8452:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:44
Start Date: 08/Nov/19 22:44
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #1: BEAM-8452 
- TriggerLoadJobs.process in bigquery_file_loads schema is type str
URL: https://github.com/apache/beam/pull/1#discussion_r344395599
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -387,6 +387,14 @@ def process(self, element, load_job_name_prefix, 
*schema_side_inputs):
 else:
   schema = self.schema
 
+import unicode
+import json
+
+if isinstance(schema, (str, unicode)):
+  schema = bigquery_tools.parse_table_schema_from_json(schema)
+elif isinstance(schema, dict):
 
 Review comment:
   Thanks. This might help: 
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips
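   As an aside on the snippet under review: `unicode` is a Python 2 builtin, not an importable module, so `import unicode` raises `ImportError` on both major versions. A version-safe sketch of the intended normalization (with the JSON parser injected in place of `bigquery_tools.parse_table_schema_from_json`, purely so the helper is testable here without a Beam install):

```python
import json

try:
    _text_types = (str, unicode)  # Python 2: str and unicode both count as text
except NameError:
    _text_types = (str,)          # Python 3: str is the only text type

def normalize_schema(schema, parse_json):
    # Accept a ready-made TableSchema, a JSON string, or a dict, and hand
    # anything textual to the injected parser (parse_table_schema_from_json
    # in the real code).
    if isinstance(schema, _text_types):
        return parse_json(schema)
    if isinstance(schema, dict):
        return parse_json(json.dumps(schema))
    return schema
```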
 



Issue Time Tracking
---

Worklog Id: (was: 340791)
Time Spent: 1h 50m  (was: 1h 40m)

> TriggerLoadJobs.process in bigquery_file_loads schema is type str
> -
>
> Key: BEAM-8452
> URL: https://issues.apache.org/jira/browse/BEAM-8452
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Noah Goodrich
>Assignee: Noah Goodrich
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
>  I've found a first issue with the BigQueryFileLoads Transform and the type 
> of the schema parameter.
> {code:java}
> Triggering job 
> beam_load_2019_10_11_140829_19_157670e4d458f0ff578fbe971a91b30a_1570802915 to 
> load data to BigQuery table <TableReference datasetId: 'pyr_monat_dev'
>  projectId: 'icentris-ml-dev'
>  tableId: 'tree_user_types'>.Schema: {"fields": [{"name": "id", "type": 
> "INTEGER", "mode": "required"}, {"name": "description", "type": "STRING", 
> "mode": "nullable"}]}. Additional parameters: {}
> Retry with exponential backoff: waiting for 4.875033410381894 seconds before 
> retrying _insert_load_job because we caught exception: 
> apitools.base.protorpclite.messages.ValidationError: Expected type <class 
> 'apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema'>
>  for field schema, found {"fields": [{"name": "id", "type": "INTEGER", 
> "mode": "required"}, {"name": "description", "type"
> : "STRING", "mode": "nullable"}]} (type <class 'str'>)
>  Traceback for above exception (most recent call last):
>   File "/opt/conda/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 206, in wrapper
>     return fun(*args, **kwargs)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py",
>  line 344, in _insert_load_job
>     **additional_load_parameters
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 791, in __init__
>     setattr(self, name, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 973, in __setattr__
>     object.__setattr__(self, name, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1652, in __set__
>     super(MessageField, self).__set__(message_instance, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1293, in __set__
>     value = self.validate(value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1400, in validate
>     return self.__validate(value, self.validate_element)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1358, in __validate
>     return validate_element(value)   
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1340, in validate_element
>     (self.type, name, value, type(value)))
>  
> {code}
>  
> The triggering code looks like this:
>  
> options.view_as(DebugOptions).experiments = ['use_beam_bq_sink']
>         # Save main session state so pickled functions and classes
>         # defined in __main__ can be unpickled
>         options.view_as(SetupOptions).save_main_session = True
>         custom_options = options.view_as(LoadSqlToBqOptions)
>         with 

[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=340788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340788
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:37
Start Date: 08/Nov/19 22:37
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-552017665
 
 
   Could we perhaps get this in guarded by a flag so we can do performance 
testing while we wait for fixes and another release?
 



Issue Time Tracking
---

Worklog Id: (was: 340788)
Time Spent: 7h 50m  (was: 7h 40m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=340787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340787
 ]

ASF GitHub Bot logged work on BEAM-8452:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:35
Start Date: 08/Nov/19 22:35
Worklog Time Spent: 10m 
  Work Description: noah-goodrich commented on pull request #1: 
BEAM-8452 - TriggerLoadJobs.process in bigquery_file_loads schema is type str
URL: https://github.com/apache/beam/pull/1#discussion_r344393234
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -387,6 +387,14 @@ def process(self, element, load_job_name_prefix, 
*schema_side_inputs):
 else:
   schema = self.schema
 
+import unicode
+import json
+
+if isinstance(schema, (str, unicode)):
+  schema = bigquery_tools.parse_table_schema_from_json(schema)
+elif isinstance(schema, dict):
 
 Review comment:
   @chamikaramj I am working on it. Having a very difficult time trying to get 
the tests running locally.
 



Issue Time Tracking
---

Worklog Id: (was: 340787)
Time Spent: 1h 40m  (was: 1.5h)

> TriggerLoadJobs.process in bigquery_file_loads schema is type str
> -
>
> Key: BEAM-8452
> URL: https://issues.apache.org/jira/browse/BEAM-8452
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Noah Goodrich
>Assignee: Noah Goodrich
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  I've found a first issue with the BigQueryFileLoads Transform and the type 
> of the schema parameter.
> {code:java}
> Triggering job 
> beam_load_2019_10_11_140829_19_157670e4d458f0ff578fbe971a91b30a_1570802915 to 
> load data to BigQuery table <TableReference datasetId: 'pyr_monat_dev'
>  projectId: 'icentris-ml-dev'
>  tableId: 'tree_user_types'>.Schema: {"fields": [{"name": "id", "type": 
> "INTEGER", "mode": "required"}, {"name": "description", "type": "STRING", 
> "mode": "nullable"}]}. Additional parameters: {}
> Retry with exponential backoff: waiting for 4.875033410381894 seconds before 
> retrying _insert_load_job because we caught exception: 
> apitools.base.protorpclite.messages.ValidationError: Expected type <class 
> 'apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema'>
>  for field schema, found {"fields": [{"name": "id", "type": "INTEGER", 
> "mode": "required"}, {"name": "description", "type"
> : "STRING", "mode": "nullable"}]} (type <class 'str'>)
>  Traceback for above exception (most recent call last):
>   File "/opt/conda/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 206, in wrapper
>     return fun(*args, **kwargs)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py",
>  line 344, in _insert_load_job
>     **additional_load_parameters
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 791, in __init__
>     setattr(self, name, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 973, in __setattr__
>     object.__setattr__(self, name, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1652, in __set__
>     super(MessageField, self).__set__(message_instance, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1293, in __set__
>     value = self.validate(value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1400, in validate
>     return self.__validate(value, self.validate_element)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1358, in __validate
>     return validate_element(value)   
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1340, in validate_element
>     (self.type, name, value, type(value)))
>  
> {code}
>  
> The triggering code looks like this:
>  
> options.view_as(DebugOptions).experiments = ['use_beam_bq_sink']
>         # Save main session state so pickled functions and classes
>         # defined in __main__ can be unpickled
>         options.view_as(SetupOptions).save_main_session = True
>         custom_options = options.view_as(LoadSqlToBqOptions)
>     

[jira] [Work logged] (BEAM-8452) TriggerLoadJobs.process in bigquery_file_loads schema is type str

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8452?focusedWorklogId=340785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340785
 ]

ASF GitHub Bot logged work on BEAM-8452:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:25
Start Date: 08/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #1: BEAM-8452 
- TriggerLoadJobs.process in bigquery_file_loads schema is type str
URL: https://github.com/apache/beam/pull/1#discussion_r344390667
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -387,6 +387,14 @@ def process(self, element, load_job_name_prefix, 
*schema_side_inputs):
 else:
   schema = self.schema
 
+import unicode
+import json
+
+if isinstance(schema, (str, unicode)):
+  schema = bigquery_tools.parse_table_schema_from_json(schema)
+elif isinstance(schema, dict):
 
 Review comment:
   Thanks. Can you please add a unit test.
 



Issue Time Tracking
---

Worklog Id: (was: 340785)
Time Spent: 1.5h  (was: 1h 20m)

> TriggerLoadJobs.process in bigquery_file_loads schema is type str
> -
>
> Key: BEAM-8452
> URL: https://issues.apache.org/jira/browse/BEAM-8452
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Noah Goodrich
>Assignee: Noah Goodrich
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  I've found a first issue with the BigQueryFileLoads Transform and the type 
> of the schema parameter.
> {code:java}
> Triggering job 
> beam_load_2019_10_11_140829_19_157670e4d458f0ff578fbe971a91b30a_1570802915 to 
> load data to BigQuery table <TableReference datasetId: 'pyr_monat_dev'
>  projectId: 'icentris-ml-dev'
>  tableId: 'tree_user_types'>.Schema: {"fields": [{"name": "id", "type": 
> "INTEGER", "mode": "required"}, {"name": "description", "type": "STRING", 
> "mode": "nullable"}]}. Additional parameters: {}
> Retry with exponential backoff: waiting for 4.875033410381894 seconds before 
> retrying _insert_load_job because we caught exception: 
> apitools.base.protorpclite.messages.ValidationError: Expected type <class 
> 'apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema'>
>  for field schema, found {"fields": [{"name": "id", "type": "INTEGER", 
> "mode": "required"}, {"name": "description", "type"
> : "STRING", "mode": "nullable"}]} (type <class 'str'>)
>  Traceback for above exception (most recent call last):
>   File "/opt/conda/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 206, in wrapper
>     return fun(*args, **kwargs)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py",
>  line 344, in _insert_load_job
>     **additional_load_parameters
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 791, in __init__
>     setattr(self, name, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 973, in __setattr__
>     object.__setattr__(self, name, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1652, in __set__
>     super(MessageField, self).__set__(message_instance, value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1293, in __set__
>     value = self.validate(value)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1400, in validate
>     return self.__validate(value, self.validate_element)
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1358, in __validate
>     return validate_element(value)   
>   File 
> "/opt/conda/lib/python3.7/site-packages/apitools/base/protorpclite/messages.py",
>  line 1340, in validate_element
>     (self.type, name, value, type(value)))
>  
> {code}
>  
> The triggering code looks like this:
>  
> options.view_as(DebugOptions).experiments = ['use_beam_bq_sink']
>         # Save main session state so pickled functions and classes
>         # defined in __main__ can be unpickled
>         options.view_as(SetupOptions).save_main_session = True
>         custom_options = options.view_as(LoadSqlToBqOptions)
>         with beam.Pipeline(options=options) as p:
>             (p
>    

[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=340784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340784
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:24
Start Date: 08/Nov/19 22:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9974: [BEAM-8472] Get 
default GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 340784)
Time Spent: 2h 40m  (was: 2.5h)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if the --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
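That order of precedence can be sketched as follows (a minimal Python sketch with a helper name of our own; the actual Beam change is in Java and may resolve the default differently):

```python
import os
import subprocess

def get_default_gcp_region():
    """Resolve a default GCP region following gcloud's precedence:
    environment variable, then gcloud configuration, then the
    historical hard-coded fallback. Illustrative only."""
    # 1. Environment variable honored by gcloud tooling.
    region = os.environ.get("CLOUDSDK_COMPUTE_REGION")
    if region:
        return region
    # 2. Ask the gcloud CLI for its configured default.
    try:
        out = subprocess.check_output(
            ["gcloud", "config", "get-value", "compute/region"],
            stderr=subprocess.DEVNULL,
        ).decode().strip()
        if out and out != "(unset)":
            return out
    except (OSError, subprocess.CalledProcessError):
        pass
    # 3. Historical hard-coded fallback.
    return "us-central1"
```

With nothing configured, the function degrades gracefully to the old us-central1 behavior, so existing pipelines are unaffected.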



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8028) Simplify running of Beam Python on Spark

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8028?focusedWorklogId=340783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340783
 ]

ASF GitHub Bot logged work on BEAM-8028:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:22
Start Date: 08/Nov/19 22:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10047: [BEAM-8028] add 
spark_runner.py
URL: https://github.com/apache/beam/pull/10047
 
 
   Pretty straightforward adaptation of `flink_runner.py` and tests. Note that 
this PR only adds support for the java subprocess job server flavor. Next step 
is Spark jar creation/submission.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340780
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:18
Start Date: 08/Nov/19 22:18
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-552011987
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340780)
Time Spent: 9h  (was: 8h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).
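A rough sketch of item 2 (all names here, including the label key, are hypothetical and not the actual interactive-runner API):

```python
def run_pipeline(runner=None, extra_labels=None):
    # Hypothetical wrapper: when a pipeline built in a notebook is handed
    # to a non-interactive runner, implicitly attach a user label that
    # identifies the notebook environment as the job's source.
    labels = dict(extra_labels or {})
    if runner is not None and runner != "InteractiveRunner":
        labels.setdefault("goog-dataflow-notebook", "true")
    return {"runner": runner or "InteractiveRunner", "labels": labels}

print(run_pipeline(runner="DataflowRunner")["labels"])
# {'goog-dataflow-notebook': 'true'}
```

Because the label is added with `setdefault`, any labels the user set explicitly take precedence over the implicit instrumentation.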



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340776
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:16
Start Date: 08/Nov/19 22:16
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-552011304
 
 
   Jenkins node "too many open files".
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340776)
Time Spent: 4h 40m  (was: 4.5h)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There are multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]
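The continuation-trigger data loss described above can be simulated with a toy model (plain Python, not the Beam API; trigger behavior is reduced to list slicing under the stated "first N elements in pane" semantics):

```python
def first_n_pane(elements, n):
    # Toy model of a terminating "first N elements in pane" trigger:
    # one pane with the first n elements fires; the rest is dropped.
    return elements[:n]

per_key = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
# First GBK: each key fires a pane under a "first 3 elements" trigger.
stage1 = {k: first_n_pane(v, 3) for k, v in per_key.items()}
# Second GBK groups everything onto one coarse key. The continuation
# trigger of "first N" is "first 1 element", so only one upstream
# key's pane survives the second aggregation.
stage2 = first_n_pane(list(stage1.values()), 1)
print(stage2)  # [[1, 2, 3]] - key "b"'s data is silently dropped
```

The simulation shows why the "always-fire" continuation option is safer: it would let every upstream pane through instead of keeping only the first.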



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340775
 ]

ASF GitHub Bot logged work on BEAM-8512:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:16
Start Date: 08/Nov/19 22:16
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add 
integration tests for flink_runner.py
URL: https://github.com/apache/beam/pull/9998
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340775)
Time Spent: 2h 20m  (was: 2h 10m)

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}} which currently use the 
> PortableRunner and not the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340778
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:17
Start Date: 08/Nov/19 22:17
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-552011328
 
 
   run java precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340778)
Time Spent: 4h 50m  (was: 4h 40m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There are multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=340777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340777
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:17
Start Date: 08/Nov/19 22:17
Worklog Time Spent: 10m 
  Work Description: cmm08 commented on issue #10046: [BEAM-8579] Strip 
UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046#issuecomment-552011322
 
 
   R:@lukecwik
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340777)
Time Spent: 20m  (was: 10m)

> Strip UTF-8 BOM bytes (if present) in TextSource.
> -
>
> Key: BEAM-8579
> URL: https://issues.apache.org/jira/browse/BEAM-8579
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-text
>Affects Versions: 2.15.0
>Reporter: Changming Ma
>Assignee: Changming Ma
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TextSource in the org.apache.beam.sdk.io package can handle UTF-8 encoded 
> files, and when the file contains a byte order mark (BOM), it will preserve it 
> in the output. According to Unicode standard 
> ([http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf]):
>  "Use of a BOM is neither required nor recommended for UTF-8". UTF-8 with a 
> BOM will also be a potential problem for some Java implementations (e.g., 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058]).
>  As a general practice, it's suggested to use UTF-8 without BOM.
> Proposal: remove BOM bytes in the output from TextSource.
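The proposal amounts to the following (a Python sketch of the Java TextSource change; the helper name is ours, not Beam's):

```python
UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(record: bytes) -> bytes:
    # Drop a leading UTF-8 byte order mark, if present, so downstream
    # transforms never see the BOM bytes. Records without a BOM pass
    # through unchanged.
    if record.startswith(UTF8_BOM):
        return record[len(UTF8_BOM):]
    return record

print(strip_utf8_bom(b"\xef\xbb\xbfhello"))  # b'hello'
print(strip_utf8_bom(b"hello"))              # b'hello'
```

In the actual source this check only needs to run on the first record of a file, since a BOM can only appear at the start of the stream.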



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=340774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340774
 ]

ASF GitHub Bot logged work on BEAM-8554:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:16
Start Date: 08/Nov/19 22:16
Worklog Time Spent: 10m 
  Work Description: stevekoonce commented on issue #10013: [BEAM-8554] Use 
WorkItemCommitRequest protobuf fields to signal that …
URL: https://github.com/apache/beam/pull/10013#issuecomment-552010960
 
 
   Squashed the commits.  Passing to a committer for review
   
   R: @reuvenlax
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340774)
Time Spent: 2.5h  (was: 2h 20m)

> Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to 
> be broken up
> -
>
> Key: BEAM-8554
> URL: https://issues.apache.org/jira/browse/BEAM-8554
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Koonce
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +Background:+
> When a WorkItemCommitRequest is generated that's bigger than the permitted 
> size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and 
> the request is still sent to the service.  The service rejects the commit, 
> but breaks up input messages that were bundled together and adds them to new, 
> smaller work items that will later be pulled and re-tried - likely without 
> generating another commit that is too large.
> When a WorkItemCommitRequest is generated that's too large to be sent back to 
> the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is 
> sent back to the service.
>  
> +Proposed Improvement+
> In both cases, prevent the doomed, large commit item from being sent back to 
> the service.  Instead send flags in the commit request signaling that the 
> current work item led to a commit that is too large and the work item should 
> be broken up.  
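In sketch form (the field and function names here are hypothetical, not the actual Windmill protobuf):

```python
def prepare_commit(serialized_commit: bytes, max_bytes: int = 180 * 2**20):
    # Sketch of the proposed behavior: rather than sending a commit the
    # service will reject (or that cannot be sent at all), send a tiny
    # request whose flag asks the service to break up the work item.
    if len(serialized_commit) > max_bytes:
        return {"exceeds_max_commit_bytes": True, "payload": b""}
    return {"exceeds_max_commit_bytes": False, "payload": serialized_commit}

small = prepare_commit(b"ok")
big = prepare_commit(b"x" * 100, max_bytes=10)
print(small["exceeds_max_commit_bytes"], big["exceeds_max_commit_bytes"])
# False True
```

This keeps both failure modes (the ~180 MB service limit and the 2 GB send limit) on the same path: the oversized payload never leaves the worker.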



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8579) Strip UTF-8 BOM bytes (if present) in TextSource.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8579?focusedWorklogId=340773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340773
 ]

ASF GitHub Bot logged work on BEAM-8579:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:15
Start Date: 08/Nov/19 22:15
Worklog Time Spent: 10m 
  Work Description: cmm08 commented on pull request #10046: [BEAM-8579] 
Strip UTF-8 BOM from TextSource output.
URL: https://github.com/apache/beam/pull/10046
 
 
   Strip UTF-8 BOM from TextSource output, so downstream components do not have 
to handle it.
   
   R:@lukecwik
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340771
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:05
Start Date: 08/Nov/19 22:05
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-552007490
 
 
   > I definitely think we could remove OnceTrigger. I deliberately left it out 
of portability. However it would be backwards incompatible if a user has 
declared a variable of the type. 
   
   Just one small note: the class is marked `@Internal` - if that annotation 
should mean something, then removal of such a class would not be considered 
backwards incompatible. Another question (and related!) is whether it should be 
part of public APIs (e.g. `Trigger#orFinally`, but many more).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340771)
Time Spent: 4h 20m  (was: 4h 10m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There are multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating 

[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340772
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:05
Start Date: 08/Nov/19 22:05
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-552007490
 
 
   > I definitely think we could remove OnceTrigger. I deliberately left it out 
of portability. However it would be backwards incompatible if a user has 
declared a variable of the type. 
   
   Just one small note: the class is marked `@Internal` - if that annotation 
should mean something, then removal of such a class would not be considered 
backwards incompatible. Another question (and related!) is whether it should be 
part of public APIs (e.g. `Trigger#orFinally`, but many more).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340772)
Time Spent: 4.5h  (was: 4h 20m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340769
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:04
Start Date: 08/Nov/19 22:04
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9892: 
[BEAM-8427] [SQL] buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#discussion_r344384368
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/RowJson.java
 ##
 @@ -351,6 +351,9 @@ private void writeValue(JsonGenerator gen, FieldType type, Object value) throws
 case ROW:
   writeRow((Row) value, type.getRowSchema(), gen);
   break;
+case LOGICAL_TYPE:
+  writeValue(gen, type.getLogicalType().getBaseType(), value);
+  break;
 
 Review comment:
   Nice! For some reason I was thinking this would be a big pain, but this is 
nice and simple :+1: 
   
   Could you add some tests of logical types to `RowJsonTest`? You could just 
use `LogicalTypes.FixedSizeBytes` I think.
   
   Probably the easiest way is to add a `makeLogicalTypeTestCase` here: 
https://github.com/apache/beam/blob/35da90a94953597e9e5676e1e1e70f27d2a8f064/sdks/java/core/src/test/java/org/apache/beam/sdk/util/RowJsonTest.java#L68-L73
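The delegation pattern in the diff above — serializing a LOGICAL_TYPE field by recursing with the logical type's base type — can be sketched in plain Python. This is a toy model of the idea, not Beam's RowJson API: the tuple-based type encoding and the `write_value` helper are invented for illustration.

```python
# Toy sketch of the LOGICAL_TYPE delegation in the Java diff above.
# A field type is modeled as (kind, detail): ROW carries a list of
# (name, field_type) pairs, LOGICAL_TYPE carries its base field_type.
def write_value(field_type, value):
    kind, detail = field_type
    if kind == "ROW":
        # Recurse into each named field of the row.
        return {name: write_value(t, value[name]) for name, t in detail}
    if kind == "LOGICAL_TYPE":
        # The key idea: unwrap to the base type and serialize that.
        return write_value(detail, value)
    return value  # primitive: emit as-is

row_type = ("ROW", [("id", ("INT32", None)),
                    ("tag", ("LOGICAL_TYPE", ("STRING", None)))])
print(write_value(row_type, {"id": 7, "tag": "x"}))  # {'id': 7, 'tag': 'x'}
```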
 



Issue Time Tracking
---

Worklog Id: (was: 340769)
Time Spent: 6h 10m  (was: 6h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340770
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:04
Start Date: 08/Nov/19 22:04
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9892: 
[BEAM-8427] [SQL] buildIOWrite for MongoDb Table
URL: https://github.com/apache/beam/pull/9892#discussion_r344383786
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableTest.java
 ##
 @@ -99,6 +100,35 @@ public void testDocumentToRowConverter() {
 pipeline.run().waitUntilFinish();
   }
 
+  @Test
+  public void testRowToDocumentConverter() {
+    PCollection output =
+        pipeline
+            .apply(
+                "Create a row",
+                Create.of(
+                    row(
+                        SCHEMA,
+                        9223372036854775807L,
+                        2147483647,
+                        (short) 32767,
+                        (byte) 127,
+                        true,
+                        1.0,
+                        (float) 1.0,
+                        "string",
+                        row(
+                            Schema.builder().addNullableField("int32", INT32).build(),
+                            2147483645),
 
 Review comment:
   Would be good to test logical types here (and in testDocumentToRowConverter) 
too
 



Issue Time Tracking
---

Worklog Id: (was: 340770)
Time Spent: 6h 10m  (was: 6h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics





[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=340765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340765
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:58
Start Date: 08/Nov/19 21:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#issuecomment-552005195
 
 
   Please also fix the lint issues; note that you can run the linter locally 
with `./gradlew :sdks:python:test-suites:tox:py37:lintPy37` from where you 
checked out the Beam git repo.
 



Issue Time Tracking
---

Worklog Id: (was: 340765)
Time Spent: 0.5h  (was: 20m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=340763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340763
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:57
Start Date: 08/Nov/19 21:57
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344380010
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
 
 Review comment:
   ```suggestion
  @unittest.skip("The test setup currently matches this method and attempts to run it as a test which is why we explicitly mark this with skip.")
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 340763)
Time Spent: 20m  (was: 10m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-8587) Add TestStream support for Dataflow runner

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8587?focusedWorklogId=340764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340764
 ]

ASF GitHub Bot logged work on BEAM-8587:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:57
Start Date: 08/Nov/19 21:57
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10041: [BEAM-8587] 
TestStream for Dataflow runner
URL: https://github.com/apache/beam/pull/10041#discussion_r344380809
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -1195,6 +1196,49 @@ def run__NativeWrite(self, transform_node, options):
  PropertyNames.STEP_NAME: input_step.proto.name,
  PropertyNames.OUTPUT_NAME: input_step.get_output(input_tag)})
 
+  @unittest.skip("This is not a test, despite the name.")
+  def run_TestStream(self, transform_node, options):
+    from apache_beam.portability.api import beam_runner_api_pb2
+    from apache_beam.testing.test_stream import ElementEvent
+    from apache_beam.testing.test_stream import ProcessingTimeEvent
+    from apache_beam.testing.test_stream import WatermarkEvent
+    standard_options = options.view_as(StandardOptions)
+    if not standard_options.streaming:
+      raise ValueError('TestStream is currently available for use '
+                       'only in streaming pipelines.')
+
+    transform = transform_node.transform
+    step = self._add_step(TransformNames.READ, transform_node.full_label,
+                          transform_node)
+    step.add_property(PropertyNames.FORMAT, 'test_stream')
+    test_stream_payload = beam_runner_api_pb2.TestStreamPayload()
+    # TestStream source doesn't do any decoding of elements,
+    # so we won't set test_stream_payload.coder_id.
+    output_coder = transform._infer_output_coder()  # pylint: disable=protected-access
+    for event in transform.events:
+      new_event = test_stream_payload.events.add()
+      if isinstance(event, ElementEvent):
+        for tv in event.timestamped_values:
+          element = new_event.element_event.elements.add()
+          element.encoded_element = output_coder.encode(tv.value)
+          element.timestamp = tv.timestamp.micros
 
 Review comment:
   TestStreamPayload should really be using google.protobuf.Timestamp instead 
of int64 so we can get support for nanos. (here and below in processing time)
   
   @robertwb has been working to migrate to nanos everywhere, and we wouldn't 
want to add another place to migrate.
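For the unit mismatch the comment raises: Beam's `Timestamp.micros` is a single integer of microseconds, while `google.protobuf.Timestamp` carries whole seconds plus a nanosecond remainder. The conversion is a plain divmod; this sketch is stdlib-only (no protobuf dependency), and `micros_to_seconds_nanos` is an invented helper name.

```python
def micros_to_seconds_nanos(micros):
    # Split a microsecond count into (seconds, nanos) as carried by
    # google.protobuf.Timestamp. Floor division keeps nanos non-negative,
    # matching the proto3 Timestamp convention for pre-epoch times.
    seconds, rem_micros = divmod(micros, 1_000_000)
    return seconds, rem_micros * 1000

print(micros_to_seconds_nanos(1_500_000))  # (1, 500000000)
```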
 



Issue Time Tracking
---

Worklog Id: (was: 340764)
Time Spent: 20m  (was: 10m)

> Add TestStream support for Dataflow runner
> --
>
> Key: BEAM-8587
> URL: https://issues.apache.org/jira/browse/BEAM-8587
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, testing
>Reporter: Andrew Crites
>Assignee: Andrew Crites
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestStream support needed to test features like late data and processing time 
> triggers on local Dataflow runner.





[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340761
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:49
Start Date: 08/Nov/19 21:49
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-552002401
 
 
   Fixed commit history and I will merge once Java suite passes.
 



Issue Time Tracking
---

Worklog Id: (was: 340761)
Time Spent: 4h 10m  (was: 4h)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There are multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto a coarser key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]
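The data-loss pattern described above is easy to reproduce with a toy model of terminating count triggers. This is a stdlib-only illustration of the semantics, not Beam code: `group_with_count_trigger` is an invented helper standing in for a GBK whose trigger fires once after N elements per key and then drops the rest.

```python
from collections import defaultdict

def group_with_count_trigger(pairs, n):
    """Toy GBK with a terminating 'after count n' trigger: each key fires
    exactly one pane with its first n elements, then drops further data."""
    panes, seen = [], defaultdict(list)
    for key, value in pairs:
        if len(seen[key]) < n:
            seen[key].append(value)
            if len(seen[key]) == n:
                panes.append((key, list(seen[key])))  # the single pane fires
    return panes

# First GBK: fire after 2 elements per key.
stage1 = group_with_count_trigger(
    [("a", 1), ("a", 2), ("b", 3), ("b", 4)], n=2)

# Second GBK onto one coarse key. The continuation of "first N elements"
# is "first 1 element", so only one upstream pane survives -- the rest
# of the stage-1 results are silently dropped.
stage2 = group_with_count_trigger(
    [("all", pane) for pane in stage1], n=1)

print(stage1)  # both keys produce a pane
print(stage2)  # only one of the two stage-1 results makes it through
```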



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340760
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:48
Start Date: 08/Nov/19 21:48
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-552002022
 
 
   I definitely think we could remove OnceTrigger. I deliberately left it out 
of portability. However, it would be backwards incompatible if a user has 
declared a variable of that type. I do think that after this PR we can still 
proceed with even more than #9942 to clean up. That PR makes it so that a 
finishing trigger just stops firing until GC time. But we can simply stop 
tracking whether or not top level triggers are finished, entirely.
 



Issue Time Tracking
---

Worklog Id: (was: 340760)
Time Spent: 4h  (was: 3h 50m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340758
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:40
Start Date: 08/Nov/19 21:40
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9957: [BEAM-8575] 
Add validates runner tests for 1. Custom window fn: Test a customized window fn 
work as expected; 2. Windows idempotency: Applying the same window fn (or 
window fn + GBK) to the input multiple times will have the same effect as 
applying it once.
URL: https://github.com/apache/beam/pull/9957#discussion_r344375411
 
 

 ##
 File path: sdks/python/apache_beam/portability/common_urns.py
 ##
 @@ -80,6 +80,10 @@ def PropertiesFromPayloadType(payload_type):
 session_windows = PropertiesFromPayloadType(
 standard_window_fns_pb2.SessionsPayload)
 
+# Custom window for test only.
+test_custom_windows = PropertiesFromPayloadType(
 
 Review comment:
   should not be defined here but locally within the test.
 



Issue Time Tracking
---

Worklog Id: (was: 340758)
Time Spent: 1h 40m  (was: 1.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340757
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:40
Start Date: 08/Nov/19 21:40
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9957: [BEAM-8575] 
Add validates runner tests for 1. Custom window fn: Test a customized window fn 
work as expected; 2. Windows idempotency: Applying the same window fn (or 
window fn + GBK) to the input multiple times will have the same effect as 
applying it once.
URL: https://github.com/apache/beam/pull/9957#discussion_r344377337
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window_test.py
 ##
 @@ -65,6 +76,44 @@ def process(self, element, window=core.DoFn.WindowParam):
 
 reify_windows = core.ParDo(ReifyWindowsFn())
 
+class TestCustomWindows(NonMergingWindowFn):
+  """A custom non merging window fn which assigns elements into interval windows
+  based on the element timestamps.
+  """
+
+  def __init__(self, first_window_end, second_window_end):
+    self.first_window_end = Timestamp.of(first_window_end)
+    self.second_window_end = Timestamp.of(second_window_end)
+
+  def assign(self, context):
+    timestamp = context.timestamp
+    if timestamp < self.first_window_end:
+      return [IntervalWindow(0, self.first_window_end)]
+    elif timestamp < self.second_window_end:
+      return [IntervalWindow(self.first_window_end, self.second_window_end)]
+    else:
+      return [IntervalWindow(self.second_window_end, timestamp)]
+
+  def get_window_coder(self):
+    return IntervalWindowCoder()
+
+  def to_runner_api_parameter(self, context):
 
 Review comment:
   It doesn't seem like there is a good way for custom user window fns to 
return a Python dict or some other structure; you currently need to translate 
the data you want to store in the window fn into some proto. Instead of 
defining a test windowfn proto, use one of the well-known protobuf types such 
as 
[Struct](https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/struct)
 or 
[BytesValue](https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/bytes-value)
 and embed the information within those objects yourself, instead of defining 
your own within StandardWindowFns.
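One way to follow that suggestion without a dedicated test proto is to pack the window fn's two boundary parameters into a plain bytes payload, which could then be carried in a `BytesValue`. A minimal stdlib sketch, under the assumption that the boundaries fit in signed 64-bit integers; `encode_window_params` and `decode_window_params` are invented helper names:

```python
import struct

def encode_window_params(first_end, second_end):
    # Two big-endian signed 64-bit ints -> 16-byte payload.
    return struct.pack(">qq", first_end, second_end)

def decode_window_params(payload):
    # Inverse of encode_window_params.
    return struct.unpack(">qq", payload)

payload = encode_window_params(10, 20)
print(decode_window_params(payload))  # (10, 20)
```

Because the payload is opaque bytes, the round-trip stays entirely under the window fn's control and nothing test-specific leaks into the shared StandardWindowFns protos.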
 



Issue Time Tracking
---

Worklog Id: (was: 340757)
Time Spent: 1h 40m  (was: 1.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340756
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:30
Start Date: 08/Nov/19 21:30
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551996661
 
 
   maybe we should just make BigQueryIOIT actually run ;)
 



Issue Time Tracking
---

Worklog Id: (was: 340756)
Time Spent: 5h 40m  (was: 5.5h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is costly compared to an Avro alternative.
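The cost gap can be illustrated with stdlib tools alone (a toy comparison, not Beam's actual codecs): JSON repeats every field name in every record, while a schema-based binary layout of the kind Avro provides, with the schema stored once per file, carries only the values.

```python
import json
import struct

record = {"user_id": 123456, "score": 0.75}

# JSON text repeats the field names in every single record.
json_bytes = json.dumps(record).encode("utf-8")

# A fixed binary layout (int64 + float64) stores only the values;
# Avro achieves this by keeping the schema in the file header.
binary_bytes = struct.pack("<qd", record["user_id"], record["score"])
```

For millions of rows staged before a BigQuery load, that per-record overhead adds up, which is the motivation for the Avro path in this PR.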





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340754
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:28
Start Date: 08/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551996254
 
 
   Ok, fair enough.
   
   I'll go ahead and merge this. But please consider adding a regularly running 
integration test in a follow up PR to make sure that this codepath does not 
become stale/broken.
 



Issue Time Tracking
---

Worklog Id: (was: 340754)
Time Spent: 5h 20m  (was: 5h 10m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is costly compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340755
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:28
Start Date: 08/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9665: 
[BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 340755)
Time Spent: 5.5h  (was: 5h 20m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is costly compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340753
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:26
Start Date: 08/Nov/19 21:26
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551995495
 
 
   > @steveniemitz Will you be able to add a test that writes to BQ using Avro 
to the BigQueryTornadoesIT that is captured by the Beam Java PostCommit test 
suite ?
   
   honestly at this point I'm not going to prioritize doing so.
 



Issue Time Tracking
---

Worklog Id: (was: 340753)
Time Spent: 5h 10m  (was: 5h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is costly compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340752
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:19
Start Date: 08/Nov/19 21:19
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551993467
 
 
   Unfortunately, looks like BigQueryIOIT is a recently added test that is 
currently not captured by any of the test suites.
   
   @steveniemitz Will you be able to add a test that writes to BQ using Avro to 
the BigQueryTornadoesIT that is captured by the Beam Java PostCommit test suite 
?
   
https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java
   
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Java/lastCompletedBuild/testReport/org.apache.beam.examples.cookbook/BigQueryTornadoesIT/
   
   @mwalenia can you comment on the status of BigQueryIOIT ?
   
   cc: @pabloem 
 



Issue Time Tracking
---

Worklog Id: (was: 340752)
Time Spent: 5h  (was: 4h 50m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is costly compared to an Avro alternative.





[jira] [Work logged] (BEAM-5600) Splitting for SplittableDoFn should be exposed within runner shared libraries

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5600?focusedWorklogId=340749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340749
 ]

ASF GitHub Bot logged work on BEAM-5600:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:06
Start Date: 08/Nov/19 21:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10045: [BEAM-5600, 
BEAM-2939] Add SplittableParDo expansion logic to runner's core.
URL: https://github.com/apache/beam/pull/10045
 
 
   Update Flink, Spark, and Samza to use it, removing the unnecessary expansion 
within the Python portable runner.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=340747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340747
 ]

ASF GitHub Bot logged work on BEAM-8554:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:50
Start Date: 08/Nov/19 20:50
Worklog Time Spent: 10m 
  Work Description: stevekoonce commented on pull request #10013: 
[BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that …
URL: https://github.com/apache/beam/pull/10013#discussion_r344361066
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java
 ##
 @@ -949,63 +949,14 @@ public void testKeyCommitTooLargeException() throws 
Exception {
 assertEquals(2, result.size());
 assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), 
result.get(2L));
 assertTrue(result.containsKey(1L));
-assertEquals("large_key", result.get(1L).getKey().toStringUtf8());
-assertTrue(result.get(1L).getSerializedSize() > 1000);
 
-// Spam worker updates a few times.
-int maxTries = 10;
-while (--maxTries > 0) {
-  worker.reportPeriodicWorkerUpdates();
-  Uninterruptibles.sleepUninterruptibly(1000, TimeUnit.MILLISECONDS);
-}
+WorkItemCommitRequest largeCommit = result.get(1L);
+assertEquals("large_key", largeCommit.getKey().toStringUtf8());
 
-// We should see an exception reported for the large commit but not the 
small one.
-ArgumentCaptor<WorkItemStatus> workItemStatusCaptor =
-ArgumentCaptor.forClass(WorkItemStatus.class);
-verify(mockWorkUnitClient, 
atLeast(2)).reportWorkItemStatus(workItemStatusCaptor.capture());
-List<WorkItemStatus> capturedStatuses = 
workItemStatusCaptor.getAllValues();
-boolean foundErrors = false;
-for (WorkItemStatus status : capturedStatuses) {
-  if (!status.getErrors().isEmpty()) {
-assertFalse(foundErrors);
-foundErrors = true;
-String errorMessage = status.getErrors().get(0).getMessage();
-assertThat(errorMessage, 
Matchers.containsString("KeyCommitTooLargeException"));
-  }
-}
-assertTrue(foundErrors);
-  }
-
-  @Test
-  public void testKeyCommitTooLargeException_StreamingEngine() throws 
Exception {
-KvCoder<String, String> kvCoder = KvCoder.of(StringUtf8Coder.of(), 
StringUtf8Coder.of());
-
-List<ParallelInstruction> instructions =
-Arrays.asList(
-makeSourceInstruction(kvCoder),
-makeDoFnInstruction(new LargeCommitFn(), 0, kvCoder),
-makeSinkInstruction(kvCoder, 1));
-
-FakeWindmillServer server = new FakeWindmillServer(errorCollector);
-server.setExpectedExceptionCount(1);
-
-StreamingDataflowWorkerOptions options =
-createTestingPipelineOptions(server, 
"--experiments=enable_streaming_engine");
-StreamingDataflowWorker worker = makeWorker(instructions, options, true /* 
publishCounters */);
-worker.setMaxWorkItemCommitBytes(1000);
-worker.start();
-
-server.addWorkToOffer(makeInput(1, 0, "large_key"));
-server.addWorkToOffer(makeInput(2, 0, "key"));
-server.waitForEmptyWorkQueue();
-
-Map<Long, Windmill.WorkItemCommitRequest> result = 
server.waitForAndGetCommits(1);
-
-assertEquals(2, result.size());
-assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), 
result.get(2L));
-assertTrue(result.containsKey(1L));
-assertEquals("large_key", result.get(1L).getKey().toStringUtf8());
-assertTrue(result.get(1L).getSerializedSize() > 1000);
+// The large commit should have its flags set marking it for truncation
+assertTrue(largeCommit.getExceedsMaxWorkItemCommitBytes());
+assertTrue(largeCommit.getSerializedSize() < 100);
 
 Review comment:
   Yes.  I needed to keep the estimated bytes field check given the method for 
generating the expected truncated commit, but should have cleaned up the 
others.  Doing that now.
 



Issue Time Tracking
---

Worklog Id: (was: 340747)
Time Spent: 2h 20m  (was: 2h 10m)

> Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to 
> be broken up
> -
>
> Key: BEAM-8554
> URL: https://issues.apache.org/jira/browse/BEAM-8554
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Koonce
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> +Background:+
> When a WorkItemCommitRequest is generated that's 

[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=340746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340746
 ]

ASF GitHub Bot logged work on BEAM-8554:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:46
Start Date: 08/Nov/19 20:46
Worklog Time Spent: 10m 
  Work Description: stevekoonce commented on pull request #10013: 
[BEAM-8554] Use WorkItemCommitRequest protobuf fields to signal that …
URL: https://github.com/apache/beam/pull/10013#discussion_r344359785
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto
 ##
 @@ -293,19 +293,16 @@ message WorkItemCommitRequest {
   optional SourceState source_state_updates = 12;
   optional int64 source_watermark = 13 [default=-0x8000];
   optional int64 source_backlog_bytes = 17 [default=-1];
-  optional int64 source_bytes_processed = 22 [default = 0];
+  optional int64 source_bytes_processed = 22;
 
   repeated WatermarkHold watermark_holds = 14;
 
-  repeated int64 finalize_ids = 19 [packed = true];
-
-  optional int64 testonly_fake_clock_time_usec = 23;
-
   // DEPRECATED
   repeated GlobalDataId global_data_id_requests = 9;
 
   reserved 6;
 
 Review comment:
   Ok, I'll make the change
 



Issue Time Tracking
---

Worklog Id: (was: 340746)
Time Spent: 2h 10m  (was: 2h)

> Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to 
> be broken up
> -
>
> Key: BEAM-8554
> URL: https://issues.apache.org/jira/browse/BEAM-8554
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Koonce
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> +Background:+
> When a WorkItemCommitRequest is generated that's bigger than the permitted 
> size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and 
> the request is still sent to the service.  The service rejects the commit, 
> but breaks up input messages that were bundled together and adds them to new, 
> smaller work items that will later be pulled and re-tried - likely without 
> generating another commit that is too large.
> When a WorkItemCommitRequest is generated that's too large to be sent back to 
> the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is 
> sent back to the service.
>  
> +Proposed Improvement+
> In both cases, prevent the doomed, large commit item from being sent back to 
> the service.  Instead send flags in the commit request signaling that the 
> current work item led to a commit that is too large and the work item should 
> be broken up.  
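A minimal model of the proposed behavior (names such as `exceeds_max_commit_bytes` are illustrative, not the actual Windmill protobuf fields):

```python
def prepare_commit(key, serialized_commit, max_commit_bytes=180 * 1024 * 1024):
    """Instead of sending a doomed oversized commit, send a small request
    that only signals the service to break the work item up."""
    if len(serialized_commit) > max_commit_bytes:
        # Drop the payload entirely; the signal request stays tiny.
        return {"key": key, "exceeds_max_commit_bytes": True}
    return {"key": key, "commit": serialized_commit}

# With an artificially small 1000-byte limit, as in the unit test above:
small = prepare_commit(b"key", b"x" * 10, max_commit_bytes=1000)
large = prepare_commit(b"large_key", b"x" * 2000, max_commit_bytes=1000)
```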





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340744
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:43
Start Date: 08/Nov/19 20:43
Worklog Time Spent: 10m 
  Work Description: liumomo315 commented on issue #9957: [BEAM-8575] Add 
validates runner tests for 1. Custom window fn: Test a customized window fn 
work as expected; 2. Windows idempotency: Applying the same window fn (or 
window fn + GBK) to the input multiple times will have the same effect as 
applying it once.
URL: https://github.com/apache/beam/pull/9957#issuecomment-551982970
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 340744)
Time Spent: 1.5h  (was: 1h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=340742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340742
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:42
Start Date: 08/Nov/19 20:42
Worklog Time Spent: 10m 
  Work Description: regfaker commented on issue #10025: [BEAM-8338] Support 
ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#issuecomment-551982080
 
 
   @echauchot At the REST level (I mean the requests) there are no breaking 
changes. I found a couple of them at the RestClient level itself when I upgraded 
the ES client dependency, and that is what I'm working on now (almost finished).
   But overall I don't think this concept of if-ing over several versions in one 
class is good. Wouldn't it be better to have multiple ElasticsearchIO clients 
(per version), with ElasticsearchIO using the one it needs? This would need some 
shading, or perhaps some other mechanism in Gradle (I'm not familiar with it). I 
think the current state is not sustainable (for example, if some big breaking 
change in RestClient comes along, some of the older versions (v2) will stop 
working).
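A rough sketch of the per-version-client idea under discussion (class and method names are hypothetical, not part of ElasticsearchIO, and the endpoint shapes are only illustrative):

```python
class EsClientV6:
    def bulk_endpoint(self, index):
        # v6-era bulk path still includes a mapping type.
        return "/%s/_doc/_bulk" % index

class EsClientV7:
    def bulk_endpoint(self, index):
        # v7 removed mapping types from the path.
        return "/%s/_bulk" % index

def client_for(major_version):
    """Dispatch to a version-specific client instead of if-ing over
    versions inside a single class."""
    clients = {6: EsClientV6, 7: EsClientV7}
    return clients[major_version]()
```

Each version's quirks stay isolated in one class, so a future breaking RestClient change only touches the client for the affected major version.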
 



Issue Time Tracking
---

Worklog Id: (was: 340742)
Time Spent: 2h 40m  (was: 2.5h)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4, but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]





[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=340743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340743
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:42
Start Date: 08/Nov/19 20:42
Worklog Time Spent: 10m 
  Work Description: regfaker commented on issue #10025: [BEAM-8338] Support 
ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#issuecomment-551982080
 
 
   @echauchot At the REST level (I mean the requests) there are no breaking 
changes. I found a couple of them at the RestClient level itself when I upgraded 
the ES client dependency, and that is what I'm working on now (almost finished).
   But overall I don't think this concept of if-ing over several versions in one 
class is good. Wouldn't it be better to have multiple ElasticsearchIO clients 
(per version), with ElasticsearchIO using the one it needs? This would need some 
shading, or perhaps there is some other mechanism in Gradle (I'm not familiar 
with it). I think the current state is not sustainable (for example, if some big 
breaking change in RestClient comes along, some of the older versions (v2) will 
stop working).
 



Issue Time Tracking
---

Worklog Id: (was: 340743)
Time Spent: 2h 50m  (was: 2h 40m)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4, but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]





[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=340741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340741
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:42
Start Date: 08/Nov/19 20:42
Worklog Time Spent: 10m 
  Work Description: regfaker commented on issue #10025: [BEAM-8338] Support 
ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#issuecomment-551982080
 
 
   @echauchot At the REST level (I mean the requests) there are no breaking 
changes. I found a couple of them at the RestClient level itself when I upgraded 
the ES client dependency, and that is what I'm working on now (almost finished).
   But overall I don't think this concept of if-ing over several versions in one 
class is good. Wouldn't it be better to have multiple ElasticsearchIO clients 
(per version), with ElasticsearchIO using the one it needs? This would need some 
shading, or perhaps some other mechanism in Gradle (I'm not familiar with it). I 
think the current state is not sustainable (for example, if some big breaking 
change in RestClient comes along, some of the older versions (v2) will stop 
working).
 



Issue Time Tracking
---

Worklog Id: (was: 340741)
Time Spent: 2.5h  (was: 2h 20m)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4, but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]





[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=340739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340739
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:41
Start Date: 08/Nov/19 20:41
Worklog Time Spent: 10m 
  Work Description: regfaker commented on issue #10025: [BEAM-8338] Support 
ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#issuecomment-551982080
 
 
   @echauchot At the REST level (I mean the requests) there are no breaking 
changes. I found a couple of them at the RestClient level itself when I upgraded 
the ES client dependency, and that is what I'm working on now (almost finished).
   But overall I don't like this concept of if-ing over several versions in one 
class. Wouldn't it be better to have multiple ElasticsearchIO clients, with 
ElasticsearchIO using the one it needs? This would need some shading, or perhaps 
some other mechanism in Gradle (I'm not familiar with it). I think the current 
state is not sustainable (for example, if some big breaking change in RestClient 
comes along, some of the older versions (v2) will stop working).
 



Issue Time Tracking
---

Worklog Id: (was: 340739)
Time Spent: 2h 10m  (was: 2h)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4, but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]





[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=340740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340740
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:41
Start Date: 08/Nov/19 20:41
Worklog Time Spent: 10m 
  Work Description: regfaker commented on issue #10025: [BEAM-8338] Support 
ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#issuecomment-551982080
 
 
   @echauchot At the REST level (I mean the requests) there are no breaking 
changes. I found a couple of them in the RestClient itself when I upgraded the 
es client dependency, and that is what I'm working on now (almost finished).
   But overall I don't like this concept of if-branching over several versions 
in one class. Wouldn't it be better to have multiple ElasticsearchIO clients, 
with ElasticsearchIO using the one it needs (this would need some shading, or 
whatever other mechanism Gradle offers; I'm not familiar with it)? I think the 
current state is not sustainable (for example, if some big breaking change 
lands in RestClient, some of the older versions (v2) will stop working).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340740)
Time Spent: 2h 20m  (was: 2h 10m)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4 but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=340738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340738
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:40
Start Date: 08/Nov/19 20:40
Worklog Time Spent: 10m 
  Work Description: regfaker commented on issue #10025: [BEAM-8338] Support 
ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#issuecomment-551982080
 
 
   @echauchot At the REST level (I mean the requests) there are no breaking 
changes. I found a couple of them in the RestClient itself when I upgraded the 
es client dependency, and that is what I'm working on now (almost finished).
   But overall I don't like this concept of if-branching over several versions 
in one class. Wouldn't it be better to have multiple ElasticsearchIO clients, 
with ElasticsearchIO using the one it needs (this would need some shading, or 
whatever other mechanism Gradle offers; I'm not familiar with it)? I think the 
current state is not sustainable (for example, if some big breaking change 
lands in RestClient, some of the older versions (v2) will stop working).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340738)
Time Spent: 2h  (was: 1h 50m)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4 but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=340736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340736
 ]

ASF GitHub Bot logged work on BEAM-8592:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:37
Start Date: 08/Nov/19 20:37
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10021: [BEAM-8592] 
Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-551981093
 
 
   I took another look and I think all the functionality tested by that file is 
gone.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340736)
Time Spent: 0.5h  (was: 20m)

> DataCatalogTableProvider should not squash table components together into a 
> string
> --
>
> Key: BEAM-8592
> URL: https://issues.apache.org/jira/browse/BEAM-8592
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like \{{foo.`baz.bar`.bizzle}} 
> representing the components \{{"foo", "baz.bar", "bizzle"}} the 
> DataCatalogTableProvider will concatenate the components into a string and 
> resolve the identifier as if it represented \{{"foo", "baz", "bar", 
> "bizzle"}}.
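
The bug described above is easy to demonstrate with plain strings: once the components are joined into one dotted string, the quoting that protected `baz.bar` is gone and re-splitting produces the wrong components. A minimal sketch, not the actual DataCatalogTableProvider code:

```python
# Demonstrates why squashing identifier components into a dotted string is
# lossy: re-splitting cannot recover the original component boundaries.

def squash(components):
    # What the provider effectively does today.
    return ".".join(components)

def resplit(name):
    # Any later split-on-dot sees four components instead of three.
    return name.split(".")

components = ["foo", "baz.bar", "bizzle"]   # foo.`baz.bar`.bizzle
assert resplit(squash(components)) == ["foo", "baz", "bar", "bizzle"]
assert resplit(squash(components)) != components
```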



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340735
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:36
Start Date: 08/Nov/19 20:36
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #9960: [BEAM-3288] 
Guard against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#discussion_r344356496
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java
 ##
 @@ -162,6 +164,45 @@ public static void applicableTo(PCollection input) {
   throw new IllegalStateException(
   "GroupByKey must have a valid Window merge function.  " + "Invalid 
because: " + cause);
 }
+
+// Validate that the trigger does not finish before garbage collection time
+if (!triggerIsSafe(windowingStrategy)) {
+  throw new IllegalArgumentException(
+  String.format(
+  "Unsafe trigger may lose data, see"
+  + " https://s.apache.org/finishing-triggers-drop-data: %s",
+  windowingStrategy.getTrigger()));
+}
+  }
+
+  // Note that Never trigger finishes *at* GC time so it is OK, and
+  // AfterWatermark.fromEndOfWindow() finishes at end-of-window time so it is
+  // OK if there is no allowed lateness.
+  private static boolean triggerIsSafe(WindowingStrategy 
windowingStrategy) {
+if (!windowingStrategy.getTrigger().mayFinish()) {
 
 Review comment:
   Understood. Cool. :+1:
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340735)
Time Spent: 3h 50m  (was: 3h 40m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark 

[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=340734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340734
 ]

ASF GitHub Bot logged work on BEAM-8592:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:34
Start Date: 08/Nov/19 20:34
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10021: [BEAM-8592] 
Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-551980244
 
 
   The "join compound identifiers" table resolver is removed. In fact, all 
implementations of `TableResolver` are now unused; there is just one form of 
table resolution. There is a lot of dead code that I did not get all the way 
to deleting because I was focused on getting it working first. I will now go 
and clean it up a bit more.
   
   I will take another look at the test file and see if there are tests that 
should be in unit tests of some other class.
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340734)
Time Spent: 20m  (was: 10m)

> DataCatalogTableProvider should not squash table components together into a 
> string
> --
>
> Key: BEAM-8592
> URL: https://issues.apache.org/jira/browse/BEAM-8592
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like \{{foo.`baz.bar`.bizzle}} 
> representing the components \{{"foo", "baz.bar", "bizzle"}} the 
> DataCatalogTableProvider will concatenate the components into a string and 
> resolve the identifier as if it represented \{{"foo", "baz", "bar", 
> "bizzle"}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=340733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340733
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:29
Start Date: 08/Nov/19 20:29
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9960: 
[BEAM-3288] Guard against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#discussion_r344354054
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java
 ##
 @@ -162,6 +164,45 @@ public static void applicableTo(PCollection input) {
   throw new IllegalStateException(
   "GroupByKey must have a valid Window merge function.  " + "Invalid 
because: " + cause);
 }
+
+// Validate that the trigger does not finish before garbage collection time
+if (!triggerIsSafe(windowingStrategy)) {
+  throw new IllegalArgumentException(
+  String.format(
+  "Unsafe trigger may lose data, see"
+  + " https://s.apache.org/finishing-triggers-drop-data: %s",
+  windowingStrategy.getTrigger()));
+}
+  }
+
+  // Note that Never trigger finishes *at* GC time so it is OK, and
+  // AfterWatermark.fromEndOfWindow() finishes at end-of-window time so it is
+  // OK if there is no allowed lateness.
+  private static boolean triggerIsSafe(WindowingStrategy 
windowingStrategy) {
+if (!windowingStrategy.getTrigger().mayFinish()) {
 
 Review comment:
   Yes, I agree that *top level* triggers should not finish. Subtriggers can 
finish.
   
- Consider `Repeatedly.forever(AfterPane.elementCountAtLeast(7))`: The 
inner trigger finishes, and the outer repeat starts it over.
- Consider `AfterEach.inOrder(a, b, c)`: Each of `a`, `b`, `c`, finishes to 
move to the next one.
   
   So it is fine and necessary for a subtrigger to be able to finish.
   
   The question we have to answer is: what do we do when a user tries to set a 
finishing trigger to be the top level trigger? The two possibilities discussed 
are:
   
- Reject the pipeline, force the user to choose a safe trigger (this PR)
- Accept the pipeline, but do not actually drop data. Save the data and 
emit it in the GC pane as we always do. That is #9942 
   
   How do you know whether or not to reject the pipeline? You need a method 
like `mayFinish`. Any trigger that may finish before GC time must be rejected.
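
The rule in the comment above — reject any top-level trigger that may finish before GC time, while still allowing subtriggers to finish — can be modeled in a few lines. This is a toy model for illustration, not Beam's actual Trigger API:

```python
# Toy model of the mayFinish check: subtriggers may finish, but a top-level
# trigger that may finish before GC time is rejected at construction.

class Trigger:
    def may_finish(self):
        raise NotImplementedError

class AfterCount(Trigger):
    """Fires once after n elements, then finishes (unsafe at top level)."""
    def __init__(self, n):
        self.n = n
    def may_finish(self):
        return True

class Repeatedly(Trigger):
    """Restarts its subtrigger forever, so it never finishes (safe)."""
    def __init__(self, subtrigger):
        self.subtrigger = subtrigger
    def may_finish(self):
        return False

def validate_top_level(trigger):
    """Construction-time guard: reject triggers that may drop data."""
    if trigger.may_finish():
        raise ValueError("Unsafe trigger may lose data: %r" % trigger)

validate_top_level(Repeatedly(AfterCount(7)))  # accepted: the repeat is safe
```

Note how the same `AfterCount(7)` is fine as a subtrigger of `Repeatedly` but would be rejected on its own, which is exactly the distinction the review thread draws.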
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340733)
Time Spent: 3h 40m  (was: 3.5h)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation 

[jira] [Work logged] (BEAM-8597) Allow TestStream trigger tests to run on other runners.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8597?focusedWorklogId=340729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340729
 ]

ASF GitHub Bot logged work on BEAM-8597:


Author: ASF GitHub Bot
Created on: 08/Nov/19 20:06
Start Date: 08/Nov/19 20:06
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10043: [BEAM-8597] Allow 
TestStream trigger tests to run on other runners.
URL: https://github.com/apache/beam/pull/10043#issuecomment-551972062
 
 
   R: @liumomo315 
   
   This will also allow testing via the dataflow runner. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340729)
Time Spent: 20m  (was: 10m)

> Allow TestStream trigger tests to run on other runners.
> ---
>
> Key: BEAM-8597
> URL: https://issues.apache.org/jira/browse/BEAM-8597
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8601) Options passed to TestStream should augment, not replace, options specified by --test-pipeline-options.

2019-11-08 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8601:
-

 Summary: Options passed to TestStream should augment, not replace, 
options specified by --test-pipeline-options.
 Key: BEAM-8601
 URL: https://issues.apache.org/jira/browse/BEAM-8601
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8600) Make Python TestStream more cross-language friendly

2019-11-08 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8600:
-

 Summary: Make Python TestStream more cross-language friendly
 Key: BEAM-8600
 URL: https://issues.apache.org/jira/browse/BEAM-8600
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Robert Bradshaw


If the Coder is not a universally-known coder, set the coder to bytes and wrap 
TestStream in a composite operation that consists of the TestStream primitive + 
beam.Map(coder.decode). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8494) Python 3.8 Support

2019-11-08 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8494:
--
Issue Type: Improvement  (was: Bug)

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6983) Python 3.6 Support

2019-11-08 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-6983.
---
Fix Version/s: 2.14.0
   Resolution: Fixed

> Python 3.6 Support
> --
>
> Key: BEAM-6983
> URL: https://issues.apache.org/jira/browse/BEAM-6983
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Juta Staes
>Priority: Major
> Fix For: 2.14.0
>
>
> The first step of adding Python 3 support focused only on Python 3.5. Support 
> for Python 3.6 should be added as a next step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6984) Python 3.7 Support

2019-11-08 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-6984.
---
Fix Version/s: 2.14.0
   Resolution: Fixed

> Python 3.7 Support
> --
>
> Key: BEAM-6984
> URL: https://issues.apache.org/jira/browse/BEAM-6984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.14.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The first step of adding Python 3 support focused only on Python 3.5. Support 
> for Python 3.7 should be added as a next step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8599) Establish consensus around how many concurrent minor versions of Python Beam should support, and deprecation policy for older versions.

2019-11-08 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8599:
--
Status: Open  (was: Triage Needed)

> Establish consensus around how many concurrent minor versions of Python Beam 
> should support, and deprecation policy for older versions. 
> 
>
> Key: BEAM-8599
> URL: https://issues.apache.org/jira/browse/BEAM-8599
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8599) Establish consensus around how many concurrent minor versions of Python Beam should support, and deprecation policy for older versions.

2019-11-08 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8599:
--
Parent: BEAM-1251
Issue Type: Sub-task  (was: Bug)

> Establish consensus around how many concurrent minor versions of Python Beam 
> should support, and deprecation policy for older versions. 
> 
>
> Key: BEAM-8599
> URL: https://issues.apache.org/jira/browse/BEAM-8599
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8599) Establish consensus around how many concurrent minor versions of Python Beam should support, and deprecation policy for older versions.

2019-11-08 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8599:
-

 Summary: Establish consensus around how many concurrent minor 
versions of Python Beam should support, and deprecation policy for older 
versions. 
 Key: BEAM-8599
 URL: https://issues.apache.org/jira/browse/BEAM-8599
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8597) Allow TestStream trigger tests to run on other runners.

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8597?focusedWorklogId=340708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340708
 ]

ASF GitHub Bot logged work on BEAM-8597:


Author: ASF GitHub Bot
Created on: 08/Nov/19 19:40
Start Date: 08/Nov/19 19:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10043: [BEAM-8597] 
Allow TestStream trigger tests to run on other runners.
URL: https://github.com/apache/beam/pull/10043
 
 
   These can be run with, e.g.
   
   python setup.py nosetests \
   --tests apache_beam.transforms.trigger_test:TestStreamTranscriptTest \
   --test-pipeline-options='--runner=FlinkRunner --environment_type=LOOPBACK ...'
   
   A gradle target 
:sdks:python:test-suites:portable:py36:flinkTriggerTranscript has
   been added, but fails due to [BEAM-8598].
   
   
   
   

[jira] [Created] (BEAM-8598) TestStream broken across multiple stages in Flink

2019-11-08 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8598:
-

 Summary: TestStream broken across multiple stages in Flink
 Key: BEAM-8598
 URL: https://issues.apache.org/jira/browse/BEAM-8598
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=340701&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340701
 ]

ASF GitHub Bot logged work on BEAM-8554:


Author: ASF GitHub Bot
Created on: 08/Nov/19 19:27
Start Date: 08/Nov/19 19:27
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on pull request #10013: [BEAM-8554] 
Use WorkItemCommitRequest protobuf fields to signal that …
URL: https://github.com/apache/beam/pull/10013#discussion_r344331362
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/windmill/src/main/proto/windmill.proto
 ##
 @@ -293,19 +293,16 @@ message WorkItemCommitRequest {
   optional SourceState source_state_updates = 12;
   optional int64 source_watermark = 13 [default=-0x8000];
   optional int64 source_backlog_bytes = 17 [default=-1];
-  optional int64 source_bytes_processed = 22 [default = 0];
+  optional int64 source_bytes_processed = 22;
 
   repeated WatermarkHold watermark_holds = 14;
 
-  repeated int64 finalize_ids = 19 [packed = true];
-
-  optional int64 testonly_fake_clock_time_usec = 23;
-
   // DEPRECATED
   repeated GlobalDataId global_data_id_requests = 9;
 
   reserved 6;
 
 Review comment:
   Nit but I think you can do
   reserved 6, 19, 23;
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340701)
Time Spent: 1h 50m  (was: 1h 40m)

> Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to 
> be broken up
> -
>
> Key: BEAM-8554
> URL: https://issues.apache.org/jira/browse/BEAM-8554
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Koonce
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> +Background:+
> When a WorkItemCommitRequest is generated that's bigger than the permitted 
> size (> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and 
> the request is still sent to the service.  The service rejects the commit, 
> but breaks up input messages that were bundled together and adds them to new, 
> smaller work items that will later be pulled and re-tried - likely without 
> generating another commit that is too large.
> When a WorkItemCommitRequest is generated that's too large to be sent back to 
> the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is 
> sent back to the service.
>  
> +Proposed Improvement+
> In both cases, prevent the doomed, large commit item from being sent back to 
> the service.  Instead send flags in the commit request signaling that the 
> current work item led to a commit that is too large and the work item should 
> be broken up.  
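The proposed flag-based signaling can be sketched with plain Java. This is a hypothetical stand-in, not Beam's actual `WorkItemCommitRequest` protobuf or worker code; only the flag name (`exceedsMaxWorkItemCommitBytes`) is taken from the test changes discussed in this thread, and the size limit and serialization are simplified for illustration:

```java
import java.nio.charset.StandardCharsets;

public class CommitTruncationSketch {
  // Minimal stand-in for WorkItemCommitRequest (hypothetical, not Beam's class).
  static class CommitRequest {
    final byte[] key;
    final byte[] payload;
    final boolean exceedsMaxWorkItemCommitBytes;

    CommitRequest(byte[] key, byte[] payload, boolean exceeds) {
      this.key = key;
      this.payload = payload;
      this.exceedsMaxWorkItemCommitBytes = exceeds;
    }

    // Toy size accounting; a real protobuf computes this from its fields.
    int serializedSize() {
      return key.length + payload.length + 1;
    }
  }

  // If the commit exceeds the limit, replace it with a tiny commit carrying
  // only the key and the "break this work item up" flag, instead of sending
  // an oversized commit the service would reject anyway.
  static CommitRequest maybeTruncate(CommitRequest commit, int maxCommitBytes) {
    if (commit.serializedSize() <= maxCommitBytes) {
      return commit;
    }
    return new CommitRequest(commit.key, new byte[0], true);
  }

  public static void main(String[] args) {
    byte[] key = "large_key".getBytes(StandardCharsets.UTF_8);
    CommitRequest large = new CommitRequest(key, new byte[2000], false);
    CommitRequest sent = maybeTruncate(large, 1000);
    System.out.println(sent.exceedsMaxWorkItemCommitBytes); // prints "true"
    System.out.println(sent.serializedSize() < 100);        // prints "true"
  }
}
```

The key design point is that the doomed commit never leaves the worker; the service sees only a small, valid commit whose flag asks it to re-split the work item.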



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8554?focusedWorklogId=340702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340702
 ]

ASF GitHub Bot logged work on BEAM-8554:


Author: ASF GitHub Bot
Created on: 08/Nov/19 19:27
Start Date: 08/Nov/19 19:27
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on pull request #10013: [BEAM-8554] 
Use WorkItemCommitRequest protobuf fields to signal that …
URL: https://github.com/apache/beam/pull/10013#discussion_r344331200
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java
 ##
 @@ -949,63 +949,14 @@ public void testKeyCommitTooLargeException() throws Exception {
     assertEquals(2, result.size());
     assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L));
     assertTrue(result.containsKey(1L));
-    assertEquals("large_key", result.get(1L).getKey().toStringUtf8());
-    assertTrue(result.get(1L).getSerializedSize() > 1000);
 
-    // Spam worker updates a few times.
-    int maxTries = 10;
-    while (--maxTries > 0) {
-      worker.reportPeriodicWorkerUpdates();
-      Uninterruptibles.sleepUninterruptibly(1000, TimeUnit.MILLISECONDS);
-    }
+    WorkItemCommitRequest largeCommit = result.get(1L);
+    assertEquals("large_key", largeCommit.getKey().toStringUtf8());
 
-    // We should see an exception reported for the large commit but not the small one.
-    ArgumentCaptor<WorkItemStatus> workItemStatusCaptor =
-        ArgumentCaptor.forClass(WorkItemStatus.class);
-    verify(mockWorkUnitClient, atLeast(2)).reportWorkItemStatus(workItemStatusCaptor.capture());
-    List<WorkItemStatus> capturedStatuses = workItemStatusCaptor.getAllValues();
-    boolean foundErrors = false;
-    for (WorkItemStatus status : capturedStatuses) {
-      if (!status.getErrors().isEmpty()) {
-        assertFalse(foundErrors);
-        foundErrors = true;
-        String errorMessage = status.getErrors().get(0).getMessage();
-        assertThat(errorMessage, Matchers.containsString("KeyCommitTooLargeException"));
-      }
-    }
-    assertTrue(foundErrors);
-  }
-
-  @Test
-  public void testKeyCommitTooLargeException_StreamingEngine() throws Exception {
-    KvCoder<String, String> kvCoder = KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of());
-
-    List<ParallelInstruction> instructions =
-        Arrays.asList(
-            makeSourceInstruction(kvCoder),
-            makeDoFnInstruction(new LargeCommitFn(), 0, kvCoder),
-            makeSinkInstruction(kvCoder, 1));
-
-    FakeWindmillServer server = new FakeWindmillServer(errorCollector);
-    server.setExpectedExceptionCount(1);
-
-    StreamingDataflowWorkerOptions options =
-        createTestingPipelineOptions(server, "--experiments=enable_streaming_engine");
-    StreamingDataflowWorker worker = makeWorker(instructions, options, true /* publishCounters */);
-    worker.setMaxWorkItemCommitBytes(1000);
-    worker.start();
-
-    server.addWorkToOffer(makeInput(1, 0, "large_key"));
-    server.addWorkToOffer(makeInput(2, 0, "key"));
-    server.waitForEmptyWorkQueue();
-
-    Map<Long, WorkItemCommitRequest> result = server.waitForAndGetCommits(1);
-
-    assertEquals(2, result.size());
-    assertEquals(makeExpectedOutput(2, 0, "key", "key").build(), result.get(2L));
-    assertTrue(result.containsKey(1L));
-    assertEquals("large_key", result.get(1L).getKey().toStringUtf8());
-    assertTrue(result.get(1L).getSerializedSize() > 1000);
+    // The large commit should have its flags set marking it for truncation.
+    assertTrue(largeCommit.getExceedsMaxWorkItemCommitBytes());
+    assertTrue(largeCommit.getSerializedSize() < 100);
 
 Review comment:
   Now that you are making a fully expected truncated commit, you could remove the field-specific checks, as they are redundant.
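The reviewer's point above, asserting against one fully built expected message rather than repeating per-field checks, can be sketched with a hypothetical value class (names here are stand-ins, not Beam's or Windmill's API):

```java
import java.util.Objects;

public class ExpectedCommitSketch {
  // Minimal stand-in for a commit message with value-based equality,
  // the property that lets one equals() check replace per-field asserts.
  static final class Commit {
    final String key;
    final boolean exceedsMaxWorkItemCommitBytes;

    Commit(String key, boolean exceeds) {
      this.key = key;
      this.exceedsMaxWorkItemCommitBytes = exceeds;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Commit)) {
        return false;
      }
      Commit c = (Commit) o;
      return key.equals(c.key)
          && exceedsMaxWorkItemCommitBytes == c.exceedsMaxWorkItemCommitBytes;
    }

    @Override
    public int hashCode() {
      return Objects.hash(key, exceedsMaxWorkItemCommitBytes);
    }
  }

  public static void main(String[] args) {
    Commit actual = new Commit("large_key", true);
    // One equality check on the fully built expected commit subsumes the
    // separate asserts on the key and the truncation flag.
    Commit expected = new Commit("large_key", true);
    System.out.println(expected.equals(actual)); // prints "true"
  }
}
```

Protobuf-generated messages already provide this value-based `equals`, which is why a single comparison against the expected commit makes the individual field checks redundant.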
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340702)
Time Spent: 2h  (was: 1h 50m)

> Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to 
> be broken up
> -
>
> Key: BEAM-8554
> URL: https://issues.apache.org/jira/browse/BEAM-8554
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Koonce
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> +Background:+
> When a WorkItemCommitRequest is generated that's bigger than the permitted 
> size (> ~180 MB), a 
