[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=441676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441676 ] ASF GitHub Bot logged work on BEAM-9679: Author: ASF GitHub Bot Created on: 05/Jun/20 04:44 Start Date: 05/Jun/20 04:44 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11883: URL: https://github.com/apache/beam/pull/11883#issuecomment-639255195 @lostluck and @henryken I updated the [stepik course](https://stepik.org/course/70387) and committed the updated `*-remote.yaml` files to this PR. It is ready to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441676) Time Spent: 9h (was: 8h 50m) > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go > Reporter: Damon Douglas > Assignee: Damon Douglas > Priority: P2 > Time Spent: 9h > Remaining Estimate: 0h > > A kata devoted to core Beam transform patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms], > where the takeaway is an individual's ability to master the following in > an Apache Beam pipeline using the Go SDK. 
> > ||Transform||Pull Request||Status|| > |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| > |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| > |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed| > |Combine Simple > Function|[11866|https://github.com/apache/beam/pull/11866]|Closed| > |CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open| > |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed| > |Partition| | | > |Side Input| | | > |Side Output| | | > |Branching| | | > |Composite Transform| | | > |DoFn Additional Parameters| | | -- This message was sent by Atlassian Jira (v8.3.4#803005)
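For readers new to the transforms the katas above exercise, here is a minimal pure-Python model of the GroupByKey and CoGroupByKey semantics. This is an illustrative sketch only, not the Beam Go SDK API that the katas themselves teach, and the function names are hypothetical:

```python
from collections import defaultdict

def group_by_key(pairs):
    """Model of Beam's GroupByKey: (key, value) pairs -> key -> [values]."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def co_group_by_key(left, right):
    """Model of CoGroupByKey: join two keyed collections on their keys,
    yielding per-key tuples of value lists (empty list when a side is absent)."""
    l, r = group_by_key(left), group_by_key(right)
    return {k: (l.get(k, []), r.get(k, [])) for k in set(l) | set(r)}

# Example: group_by_key([("a", 1), ("a", 2), ("b", 3)])
# groups both "a" values together and "b" on its own.
```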
[jira] [Work logged] (BEAM-10201) Enhance JsonToRow to add Deadletter Support
[ https://issues.apache.org/jira/browse/BEAM-10201?focusedWorklogId=441675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441675 ] ASF GitHub Bot logged work on BEAM-10201: - Author: ASF GitHub Bot Created on: 05/Jun/20 04:35 Start Date: 05/Jun/20 04:35 Worklog Time Spent: 10m Work Description: rezarokni opened a new pull request #11929: URL: https://github.com/apache/beam/pull/11929 Add deadletter support to JsonToRow. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): Jenkins build-status badge table for the Go and Java SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (standard PR template, truncated in the original message).
[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF
[ https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=441674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441674 ] ASF GitHub Bot logged work on BEAM-9363: Author: ASF GitHub Bot Created on: 05/Jun/20 04:27 Start Date: 05/Jun/20 04:27 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11868: URL: https://github.com/apache/beam/pull/11868#issuecomment-639251081 Thank you Andrew! Will address your comments soon! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441674) Time Spent: 8h 50m (was: 8h 40m) > BeamSQL windowing as TVF > > > Key: BEAM-9363 > URL: https://issues.apache.org/jira/browse/BEAM-9363 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, dsl-sql-zetasql > Reporter: Rui Wang > Assignee: Rui Wang > Priority: P2 > Labels: stale-assigned > Time Spent: 8h 50m > Remaining Estimate: 0h > > This Jira tracks the implementation for > https://s.apache.org/streaming-beam-sql > TVF stands for table-valued function, a SQL feature that produces a table as > the function's output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=441665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441665 ] ASF GitHub Bot logged work on BEAM-8511: Author: ASF GitHub Bot Created on: 05/Jun/20 04:09 Start Date: 05/Jun/20 04:09 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #9899: URL: https://github.com/apache/beam/pull/9899#issuecomment-639246523 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441665) Time Spent: 5h 10m (was: 5h) > Support for enhanced fan-out in KinesisIO.Read > -- > > Key: BEAM-8511 > URL: https://issues.apache.org/jira/browse/BEAM-8511 > Project: Beam > Issue Type: New Feature > Components: io-java-kinesis > Reporter: Jonothan Farr > Assignee: Jonothan Farr > Priority: P2 > Labels: stale-assigned > Time Spent: 5h 10m > Remaining Estimate: 0h > > Add support for reading from an enhanced fan-out consumer using KinesisIO. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10194) Could not get unknown property 'println'
[ https://issues.apache.org/jira/browse/BEAM-10194?focusedWorklogId=441661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441661 ] ASF GitHub Bot logged work on BEAM-10194: - Author: ASF GitHub Bot Created on: 05/Jun/20 03:32 Start Date: 05/Jun/20 03:32 Worklog Time Spent: 10m Work Description: henryken commented on pull request #11921: URL: https://github.com/apache/beam/pull/11921#issuecomment-639237442 @pabloem, can help to merge? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441661) Time Spent: 40m (was: 0.5h) > Could not get unknown property 'println' > > > Key: BEAM-10194 > URL: https://issues.apache.org/jira/browse/BEAM-10194 > Project: Beam > Issue Type: Bug > Components: katas > Reporter: Kyle Weaver > Assignee: Kyle Weaver > Priority: P4 > Time Spent: 40m > Remaining Estimate: 0h > > When a test fails, in addition to printing the expected error, the console > logs an additional (unexpected) error: > Could not get unknown property 'println' for task > ':Core_Transforms-Map-MapElements:test' of type > org.gradle.api.tasks.testing.Test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=441656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441656 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 05/Jun/20 03:14 Start Date: 05/Jun/20 03:14 Worklog Time Spent: 10m Work Description: suztomo commented on pull request #11586: URL: https://github.com/apache/beam/pull/11586#issuecomment-639232697 This idea was a bad idea; a hack on top of another hack. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441656) Time Spent: 14.5h (was: 14h 20m) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system > Reporter: Tomo Suzuki > Assignee: Tomo Suzuki > Priority: P2 > Labels: stale-assigned > Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot > 2020-03-17 at 16.01.16.png > > Time Spent: 14.5h > Remaining Estimate: 0h > > Shall we use GCP Libraries BOM to specify Google-related library versions? > > I've been working on Beam's dependency upgrades in the past few months. I > think it's time to consider a long-term solution to keep the libraries > up-to-date with small maintenance effort. To achieve that, I propose Beam to > use GCP Libraries BOM to set the Google-related library versions, rather than > trying to make changes in each of ~30 Google libraries. > > h1. Background > A BOM is a pom.xml that provides dependencyManagement to importing projects. > > GCP Libraries BOM is a BOM that includes many Google Cloud related libraries > + gRPC + protobuf. 
We (Google Cloud Java Diamond Dependency team) maintain > the BOM so that the set of the libraries is compatible with each other. > > h1. Implementation > Notes on obstacles. > h2. BeamModulePlugin's "force" does not take BOM into account (thus fails) > {{forcedModules}} via version resolution strategy is playing badly. This causes > {noformat} > A problem occurred evaluating project ':sdks:java:extensions:sql'. > Could not resolve all dependencies for configuration > ':sdks:java:extensions:sql:fmppTemplates'. > Invalid format: 'com.google.cloud:google-cloud-core'. Group, name and version > cannot be empty. Correct example: 'org.gradle:gradle-core:1.0'{noformat} > !Screen Shot 2020-03-13 at 13.33.01.png|width=489,height=287! > > h2. :sdks:java:maven-archetypes:examples needs the version of > google-http-client > The task requires the version for the library: > {code:java} > 'google-http-client.version': > dependencies.create(project.library.java.google_http_client).getVersion(), > {code} > This would generate a NullPointerException. Running gradlew without the > subproject: > > {code:java} > ./gradlew -p sdks/java check -x :sdks:java:maven-archetypes:examples:check > {code} > h1. Problem in Gradle-generated pom files > The generated Maven artifact POM has invalid data due to the BOM change. For > example, my locally installed > {{~/.m2/repository/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.21.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.pom}} > had the following problems. > h2. The GCP Libraries BOM showing up in the dependencies section: > {noformat} > <dependency> > <groupId>com.google.cloud</groupId> > <artifactId>libraries-bom</artifactId> > <version>4.2.0</version> > <scope>compile</scope> > <exclusions> > <exclusion> > <groupId>com.google.guava</groupId> > <artifactId>guava-jdk5</artifactId> > </exclusion> > ... > </exclusions> > </dependency> > {noformat} > h2. The artifact that uses the BOM in Gradle is missing the version in the > dependency: > {noformat} > <dependency> > <groupId>com.google.api</groupId> > <artifactId>gax</artifactId> > <version></version> > <scope>compile</scope> > ... > </dependency> > {noformat} > h1. DependencyManagement section in generated pom.xml > How can I check whether an entry in dependencies is "platform"? > !Screen Shot 2020-03-17 at 16.01.16.png|width=504,height=344! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=441657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441657 ] ASF GitHub Bot logged work on BEAM-9444: Author: ASF GitHub Bot Created on: 05/Jun/20 03:14 Start Date: 05/Jun/20 03:14 Worklog Time Spent: 10m Work Description: suztomo closed pull request #11586: URL: https://github.com/apache/beam/pull/11586 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441657) Time Spent: 14h 40m (was: 14.5h) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system > Reporter: Tomo Suzuki > Assignee: Tomo Suzuki > Priority: P2 > Labels: stale-assigned > Time Spent: 14h 40m > Remaining Estimate: 0h > > (Issue description quoted in full in the preceding BEAM-9444 message.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10112) Add python sdk state and timer examples to website
[ https://issues.apache.org/jira/browse/BEAM-10112?focusedWorklogId=441654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441654 ] ASF GitHub Bot logged work on BEAM-10112: - Author: ASF GitHub Bot Created on: 05/Jun/20 03:10 Start Date: 05/Jun/20 03:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11882: URL: https://github.com/apache/beam/pull/11882#issuecomment-639231673 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441654) Time Spent: 0.5h (was: 20m) > Add python sdk state and timer examples to website > -- > > Key: BEAM-10112 > URL: https://issues.apache.org/jira/browse/BEAM-10112 > Project: Beam > Issue Type: Improvement > Components: website > Reporter: Yichi Zhang > Assignee: Yichi Zhang > Priority: P2 > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow
[ https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=441651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441651 ] ASF GitHub Bot logged work on BEAM-8451: Author: ASF GitHub Bot Created on: 05/Jun/20 03:07 Start Date: 05/Jun/20 03:07 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11706: URL: https://github.com/apache/beam/pull/11706#issuecomment-639230979 @rosetn - did you have a chance to review this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441651) Time Spent: 2.5h (was: 2h 20m) > Interactive Beam example failing from stack overflow > > > Key: BEAM-8451 > URL: https://issues.apache.org/jira/browse/BEAM-8451 > Project: Beam > Issue Type: Bug > Components: examples-python, runner-py-interactive > Reporter: Igor Durovic > Assignee: Chun Yang > Priority: P2 > Fix For: 2.18.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > > RecursionError: maximum recursion depth exceeded in __instancecheck__ > at > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405] > > This occurred after the execution of the last cell in > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10201) Enhance JsonToRow to add Deadletter Support
Reza ardeshir rokni created BEAM-10201: -- Summary: Enhance JsonToRow to add Deadletter Support Key: BEAM-10201 URL: https://issues.apache.org/jira/browse/BEAM-10201 Project: Beam Issue Type: New Feature Components: sdk-java-core Reporter: Reza ardeshir rokni The current JsonToRow transform does not support the dead-letter pattern on parse failures. -- This message was sent by Atlassian Jira (v8.3.4#803005)
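The dead-letter pattern requested here routes unparseable records to a separate output instead of failing the whole pipeline. A language-neutral sketch of the idea in plain Python (illustrative only; the actual enhancement targets the Java JsonToRow transform, and the function name below is hypothetical):

```python
import json

def json_to_rows_with_deadletter(lines):
    """Parse JSON lines, routing failures to a 'dead letter' collection
    (raw input plus the parse error) rather than raising."""
    rows, dead_letters = [], []
    for line in lines:
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as err:
            dead_letters.append({"raw": line, "error": str(err)})
    return rows, dead_letters
```

In Beam itself this shape is usually realized with a multi-output ParDo, where the failures land in a tagged side output that can be written to storage for later inspection.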
[jira] [Updated] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects
[ https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza ardeshir rokni updated BEAM-10199: --- Priority: P3 (was: P2) > AutoValue Schema unable to work with auto-value-gson-annotations objects > > > Key: BEAM-10199 > URL: https://issues.apache.org/jira/browse/BEAM-10199 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.21.0 >Reporter: Reza ardeshir rokni >Priority: P3 > > Objects created with the extension : > com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0 > Fail with NPE in Java pipelines. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects
[ https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza ardeshir rokni updated BEAM-10199: --- Issue Type: Bug (was: New Feature) > AutoValue Schema unable to work with auto-value-gson-annotations objects > > > Key: BEAM-10199 > URL: https://issues.apache.org/jira/browse/BEAM-10199 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.21.0 >Reporter: Reza ardeshir rokni >Priority: P2 > > Objects created with the extension : > com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0 > Fail with NPE in Java pipelines. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10200) Improve memory profiling for users of Portable Beam Python
[ https://issues.apache.org/jira/browse/BEAM-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-10200: --- Labels: starter (was: ) > Improve memory profiling for users of Portable Beam Python > -- > > Key: BEAM-10200 > URL: https://issues.apache.org/jira/browse/BEAM-10200 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Priority: P2 > Labels: starter > > We have a Profiler[1] that is integrated with SDK worker[1a], however it only > saves CPU metrics [1b]. > We have a MemoryReporter util[2] which can log heap dumps, however it is not > documented on Beam Website and does not respect the --profile_memory and > --profile_location options[3]. The profile_memory flag currently works only > for Dataflow Runner users who run non-portable batch pipelines; profiles > are saved only if memory usage between samples exceeds 1000G. > We should improve memory profiling experience for Portable Python users and > consider making a guide on how users can investigate OOMing pipelines on Beam > website. > > [1] > https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46 > [1a] > https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157 > [1b] > https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112 > [2] > https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124 > [3] > https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10200) Improve memory profiling for users of Portable Beam Python
Valentyn Tymofieiev created BEAM-10200: -- Summary: Improve memory profiling for users of Portable Beam Python Key: BEAM-10200 URL: https://issues.apache.org/jira/browse/BEAM-10200 Project: Beam Issue Type: Bug Components: sdk-py-harness Reporter: Valentyn Tymofieiev We have a Profiler[1] that is integrated with SDK worker[1a], however it only saves CPU metrics [1b]. We have a MemoryReporter util[2] which can log heap dumps, however it is not documented on Beam Website and does not respect the --profile_memory and --profile_location options[3]. The profile_memory flag currently works only for Dataflow Runner users who run non-portable batch pipelines; profiles are saved only if memory usage between samples exceeds 1000G. We should improve memory profiling experience for Portable Python users and consider making a guide on how users can investigate OOMing pipelines on Beam website. [1] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46 [1a] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157 [1b] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112 [2] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124 [3] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846 -- This message was sent by Atlassian Jira (v8.3.4#803005)
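As a starting point for the kind of heap reporting discussed above, Python's stdlib `tracemalloc` can capture allocation snapshots. This is a minimal sketch of such reporting, not the SDK harness's actual MemoryReporter; the helper name is hypothetical:

```python
import tracemalloc

def snapshot_top_allocations(limit=5):
    """Take a heap snapshot and return the top allocation sites as strings,
    grouped by source line (the kind of summary a memory profile could save)."""
    snapshot = tracemalloc.take_snapshot()
    return [str(stat) for stat in snapshot.statistics("lineno")[:limit]]

tracemalloc.start()
data = [b"x" * 1024 for _ in range(100)]  # allocate ~100 KiB to have something to report
top = snapshot_top_allocations()
tracemalloc.stop()
```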
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441624 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 05/Jun/20 00:59 Start Date: 05/Jun/20 00:59 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11927: URL: https://github.com/apache/beam/pull/11927#issuecomment-639196604 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441624) Time Spent: 1h 10m (was: 1h) > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go > Reporter: Robert Burke > Assignee: Robert Burke > Priority: P2 > Time Spent: 1h 10m > Remaining Estimate: 0h > > Schema support is required for advanced cross-language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed-out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys, and a simple warning at construction time won't cut it. 
However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a > workaround. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
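The proposed fail-loudly behavior can be modeled in plain Python: refuse to encode a value with non-exportable fields unless the caller explicitly opts out. This is a sketch under the assumption that underscore-prefixed attributes play the role of Go's unexported fields; the names are hypothetical, not the Go SDK's API:

```python
def schema_encode(obj, ignore_private=False):
    """Encode an object's public fields as a dict, failing loudly if the
    object carries 'private' (underscore-prefixed) fields, unless the
    caller opts in to ignoring them -- modeling the proposed Go behavior."""
    fields = vars(obj)
    private = [name for name in fields if name.startswith("_")]
    if private and not ignore_private:
        raise ValueError("cannot schema-encode private fields: %s" % private)
    return {name: value for name, value in fields.items()
            if not name.startswith("_")}

class Accum:
    """Example value type with one public and one 'unexported' field."""
    def __init__(self):
        self.total = 0
        self._scratch = []
```

Failing by default, with an explicit opt-out, mirrors the issue's point that a construction-time warning alone has led to silently dropped data.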
[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.
[ https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=441623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441623 ] ASF GitHub Bot logged work on BEAM-10036: - Author: ASF GitHub Bot Created on: 05/Jun/20 00:56 Start Date: 05/Jun/20 00:56 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11766: URL: https://github.com/apache/beam/pull/11766#discussion_r435632933 ## File path: sdks/python/apache_beam/dataframe/expressions.py ## @@ -85,16 +87,10 @@ def evaluate_at(self, session): # type: (Session) -> T """Returns the result of self with the bindings given in session.""" raise NotImplementedError(type(self)) - def requires_partition_by_index(self): # type: () -> bool -"""Whether this expression requires its argument(s) to be partitioned -by index.""" -# TODO: It might be necessary to support partitioning by part of the index, -# for some args, which would require returning more than a boolean here. + def requires_partition_by(self): # type: () -> Partitioning raise NotImplementedError(type(self)) - def preserves_partition_by_index(self): # type: () -> bool -"""Whether the result of this expression will be partitioned by index -whenever all of its inputs are partitioned by index.""" + def preserves_partition_by(self): # type: () -> Partitioning Review comment: Docstring comments added. ## File path: sdks/python/apache_beam/dataframe/partitionings.py ## @@ -0,0 +1,136 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import absolute_import + +from typing import Any +from typing import Iterable +from typing import Tuple +from typing import TypeVar + +import pandas as pd + +Frame = TypeVar('Frame', bound=pd.core.generic.NDFrame) + + +class Partitioning(object): + """A class representing a (consistent) partitioning of dataframe objects. + """ + def is_subpartitioning_of(self, other): +# type: (Partitioning) -> bool + +"""Returns whether self is a sub-partition of other. + +Specifically, returns whether something partitioned by self is necessarily +also partitioned by other. +""" +raise NotImplementedError + + def partition_fn(self, df): +# type: (Frame) -> Iterable[Tuple[Any, Frame]] + +"""A callable that actually performs the partitioning of a Frame df. + +This will be invoked via a FlatMap in conjunction with a GroupKey to +achieve the desired partitioning. +""" +raise NotImplementedError + + +class Index(Partitioning): + """A partitioning by index (either fully or partially). + + If the set of "levels" of the index to consider is not specified, the entire + index is used. + + These form a partial order, given by + + Nothing() < Index([i]) < Index([i, j]) < ... < Index() < Singleton() + + The ordering is implemented via the is_subpartitioning_of method, where the + examples on the right are subpartitionings of the examples on the left above. 
+ """ + + _INDEX_PARTITIONS = 10 + + def __init__(self, levels=None): +self._levels = levels + + def __eq__(self, other): +return type(self) == type(other) and self._levels == other._levels + + def __hash__(self): +if self._levels: + return hash(tuple(sorted(self._levels))) +else: + return hash(type(self)) + + def is_subpartitioning_of(self, other): +if isinstance(other, Nothing): + return True +elif isinstance(other, Index): + if self._levels is None: +return True + elif other._levels is None: +return False + else: +return all(level in other._levels for level in self._levels) +else: + return False + + def partition_fn(self, df): +if self._levels is None: + levels = list(range(df.index.nlevels)) +else: + levels = self._levels +hashes = sum( +pd.util.hash_array(df.index.get_level_values(level)) +for level in levels) +for key in range(self._INDEX_PARTITIONS): + yield key, df[hashes % self._INDEX_PARTITIONS == key] + + +class Singleton(Partitioning): + """A partitioning
[jira] [Updated] (BEAM-10061) ReadAllFromTextWithFilename
[ https://issues.apache.org/jira/browse/BEAM-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob Ferriero updated BEAM-10061: -- Component/s: (was: io-py-gcp) sdk-java-core io-py-files io-java-files > ReadAllFromTextWithFilename > --- > > Key: BEAM-10061 > URL: https://issues.apache.org/jira/browse/BEAM-10061 > Project: Beam > Issue Type: New Feature > Components: io-java-files, io-py-files, sdk-java-core, sdk-py-core > Environment: Dataflow with Python >Reporter: Ryan Canty >Priority: P2 > > I am trying to create a job that reads from GCS, executes some code against > each line, and creates a PCollection with the line and the file. So basically > what I'd like is a combination of textio.ReadTextWithFilename and > textio.ReadAllFromText -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10061) ReadAllFromTextWithFilename
[ https://issues.apache.org/jira/browse/BEAM-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126295#comment-17126295 ] Jacob Ferriero commented on BEAM-10061: --- This would also be useful in the Java SDK, as that's what I typically see customers using. (added Java components to this issue) > ReadAllFromTextWithFilename > --- > > Key: BEAM-10061 > URL: https://issues.apache.org/jira/browse/BEAM-10061 > Project: Beam > Issue Type: New Feature > Components: io-java-files, io-py-files, sdk-java-core, sdk-py-core > Environment: Dataflow with Python >Reporter: Ryan Canty >Priority: P2 > > I am trying to create a job that reads from GCS, executes some code against > each line, and creates a PCollection with the line and the file. So basically > what I'd like is a combination of textio.ReadTextWithFilename and > textio.ReadAllFromText -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10184) Build python wheels on GitHub Actions for Linux/MacOS
[ https://issues.apache.org/jira/browse/BEAM-10184?focusedWorklogId=441616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441616 ] ASF GitHub Bot logged work on BEAM-10184: - Author: ASF GitHub Bot Created on: 05/Jun/20 00:42 Start Date: 05/Jun/20 00:42 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #11877: URL: https://github.com/apache/beam/pull/11877#discussion_r435631141 ## File path: .github/workflows/build_wheels.yml ## @@ -0,0 +1,141 @@ +name: Build python wheels + +on: + push: +branches: + - master + - release-* +tags: + - v* + +jobs: + + build_source: +runs-on: ubuntu-18.04 +steps: + - name: Checkout code +uses: actions/checkout@v2 + - name: Install python +uses: actions/setup-python@v2 +with: + python-version: 3.7 + - name: Get build dependencies +working-directory: ./sdks/python +run: python3 -m pip install cython && python3 -m pip install -r build-requirements.txt + - name: Install wheels +run: python3 -m pip install wheel + - name: Buld source +working-directory: ./sdks/python +run: python3 setup.py sdist --formats=gztar,zip + - name: Unzip source +working-directory: ./sdks/python +run: unzip dist/$(ls dist | grep .zip | head -n 1) + - name: Rename source directory +working-directory: ./sdks/python +run: mv $(ls | grep apache-beam) apache-beam-source + - name: Upload source +uses: actions/upload-artifact@v2 +with: + name: source + path: sdks/python/apache-beam-source + - name: Upload compressed sources +uses: actions/upload-artifact@v2 +with: + name: source_gztar_zip + path: sdks/python/dist + + prepare_gcs: +name: Prepare GCS +needs: build_source +runs-on: ubuntu-18.04 +steps: + - name: Authenticate on GCP +uses: GoogleCloudPlatform/github-actions/setup-gcloud@master +with: + service_account_email: ${{ secrets.CCP_SA_EMAIL }} + service_account_key: ${{ secrets.CCP_SA_KEY }} + - name: Remove existing files on GCS bucket +run: gsutil rm -r "gs://${{ secrets.CCP_BUCKET 
}}/${GITHUB_REF##*/}/" || true + + upload_source_to_gcs: +name: Upload source to GCS bucket +needs: prepare_gcs +runs-on: ubuntu-18.04 +steps: + - name: Download wheels +uses: actions/download-artifact@v2 +with: + name: source_gztar_zip + path: source/ + - name: Authenticate on GCP +uses: GoogleCloudPlatform/github-actions/setup-gcloud@master +with: + service_account_email: ${{ secrets.CCP_SA_EMAIL }} + service_account_key: ${{ secrets.CCP_SA_KEY }} + - name: Copy sources to GCS bucket +run: gsutil cp -r -a public-read source/* gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/ + - name: List sources on GCS bucket +run: | + gsutil ls "gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/*.tar.gz" + gsutil ls "gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/*.zip" + + build_wheels: +name: Build wheels on ${{ matrix.os }} +needs: prepare_gcs +runs-on: ${{ matrix.os }} +strategy: + matrix: +os : [ubuntu-18.04, macos-10.15] Review comment: Perfect. (just to clarify my understanding: we will have a single github ci config, this is correct, right?) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441616) Time Spent: 3h 20m (was: 3h 10m) > Build python wheels on GitHub Actions for Linux/MacOS > - > > Key: BEAM-10184 > URL: https://issues.apache.org/jira/browse/BEAM-10184 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Tobiasz Kedzierski >Priority: P2 > Labels: build, python, python-packages, python-wheel > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
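The gsutil destinations in the workflow above are keyed by `${GITHUB_REF##*/}`, a shell parameter expansion that strips the longest prefix ending in `/` from the ref, leaving just the branch or tag name (e.g. `refs/heads/master` becomes `master`). A Python equivalent of that expansion, for illustration only:

```python
def ref_basename(github_ref: str) -> str:
    """Mimic the shell expansion ${GITHUB_REF##*/}: drop the longest
    prefix ending in '/', leaving the branch or tag name."""
    return github_ref.rsplit("/", 1)[-1]

assert ref_basename("refs/heads/master") == "master"
assert ref_basename("refs/tags/v2.22.0") == "v2.22.0"
```

This is why pushes to `master`, `release-*` branches, and `v*` tags each get their own folder in the GCS bucket.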
[jira] [Work logged] (BEAM-10184) Build python wheels on GitHub Actions for Linux/MacOS
[ https://issues.apache.org/jira/browse/BEAM-10184?focusedWorklogId=441615=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441615 ] ASF GitHub Bot logged work on BEAM-10184: - Author: ASF GitHub Bot Created on: 05/Jun/20 00:40 Start Date: 05/Jun/20 00:40 Worklog Time Spent: 10m Work Description: aaltay commented on a change in pull request #11877: URL: https://github.com/apache/beam/pull/11877#discussion_r435630823 ## File path: .github/workflows/build_wheels.yml ## @@ -0,0 +1,141 @@ +name: Build python wheels + +on: + push: +branches: + - master + - release-* +tags: + - v* + +jobs: + + build_source: +runs-on: ubuntu-18.04 +steps: + - name: Checkout code +uses: actions/checkout@v2 + - name: Install python +uses: actions/setup-python@v2 +with: + python-version: 3.7 + - name: Get build dependencies +working-directory: ./sdks/python +run: python3 -m pip install cython && python3 -m pip install -r build-requirements.txt + - name: Install wheels +run: python3 -m pip install wheel + - name: Buld source +working-directory: ./sdks/python +run: python3 setup.py sdist --formats=gztar,zip + - name: Unzip source +working-directory: ./sdks/python +run: unzip dist/$(ls dist | grep .zip | head -n 1) + - name: Rename source directory +working-directory: ./sdks/python +run: mv $(ls | grep apache-beam) apache-beam-source + - name: Upload source +uses: actions/upload-artifact@v2 +with: + name: source + path: sdks/python/apache-beam-source + - name: Upload compressed sources +uses: actions/upload-artifact@v2 +with: + name: source_gztar_zip Review comment: Very nice. 2 follow up questions: 1. Can we add a very last stage that can run: gsutil ls "gs://***/$***GITHUB_REF##*/***/*" -> so that we can get the whole output of the gcs folder all at once. (btw, do we have a mechanism to clean up these gcs locations?) 2. What is the difference between "Build python wheels / Build wheels on ..." job executing "Upload wheels" step and "Upload wheels to GCS bucket ..." 
job? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441615) Time Spent: 3h 10m (was: 3h) > Build python wheels on GitHub Actions for Linux/MacOS > - > > Key: BEAM-10184 > URL: https://issues.apache.org/jira/browse/BEAM-10184 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Tobiasz Kedzierski >Priority: P2 > Labels: build, python, python-packages, python-wheel > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?focusedWorklogId=441613=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441613 ] ASF GitHub Bot logged work on BEAM-3788: Author: ASF GitHub Bot Created on: 05/Jun/20 00:35 Start Date: 05/Jun/20 00:35 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11928: URL: https://github.com/apache/beam/pull/11928#issuecomment-639190896 R: @robertwb CC: @mxm This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441613) Time Spent: 20m (was: 10m) > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10184) Build python wheels on GitHub Actions for Linux/MacOS
[ https://issues.apache.org/jira/browse/BEAM-10184?focusedWorklogId=441610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441610 ] ASF GitHub Bot logged work on BEAM-10184: - Author: ASF GitHub Bot Created on: 05/Jun/20 00:34 Start Date: 05/Jun/20 00:34 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11877: URL: https://github.com/apache/beam/pull/11877#issuecomment-639190586 > > How can I preview the action on the fork: [TobKed#3](https://github.com/TobKed/beam/pull/3) ? > > @aaltay https://github.com/TobKed/beam/actions?query=branch%3Agithub-actions-build-wheels-linux-macos Super nice! I have not reviewed the steps, I think PR is better for doing the most initial review. We can do a final quick review of this once we settle on this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441610) Time Spent: 3h (was: 2h 50m) > Build python wheels on GitHub Actions for Linux/MacOS > - > > Key: BEAM-10184 > URL: https://issues.apache.org/jira/browse/BEAM-10184 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Tobiasz Kedzierski >Priority: P2 > Labels: build, python, python-packages, python-wheel > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?focusedWorklogId=441609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441609 ] ASF GitHub Bot logged work on BEAM-3788: Author: ASF GitHub Bot Created on: 05/Jun/20 00:33 Start Date: 05/Jun/20 00:33 Worklog Time Spent: 10m Work Description: chamikaramj opened a new pull request #11928: URL: https://github.com/apache/beam/pull/11928 Also removes it from in-progress IO connectors list now that it's supported by portable runners as well as Dataflow. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441599 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 05/Jun/20 00:10 Start Date: 05/Jun/20 00:10 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11927: URL: https://github.com/apache/beam/pull/11927#issuecomment-639183557 R: @tysonjh cc: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441599) Time Spent: 1h (was: 50m) > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Time Spent: 1h > Remaining Estimate: 0h > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simply > warning at construction time won't cut it. 
However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a work > around. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
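The issue above proposes failing pipelines whose element types have unexported fields, since reflection-based encoding silently drops such fields, losing data or keys. That failure mode can be illustrated in Python by treating underscore-prefixed attributes as the analog of Go's unexported struct fields; this is a hypothetical sketch of the check's intent, not the Go SDK's implementation:

```python
def check_schema_fields(obj) -> None:
    """Raise if obj carries attributes a naive field-based encoder would
    silently drop -- the analog of a Go struct with unexported fields."""
    hidden = [name for name in vars(obj) if name.startswith("_")]
    if hidden:
        raise TypeError(
            "fields %s would be dropped by schema encoding; register a "
            "custom coder or an ignore-unexported schema instead" % hidden)

class Good:
    def __init__(self):
        self.value = 1

class Bad:
    def __init__(self):
        self.value = 1
        self._secret = 2  # would be silently lost by field-based encoding

check_schema_fields(Good())  # passes: all fields are visible to the encoder
```

Failing loudly at construction time, rather than warning, matches the issue's reasoning: silent data loss is worse than a rejected pipeline, and the proposed ignore-unexported registration remains as an explicit opt-out.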
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441598 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 05/Jun/20 00:09 Start Date: 05/Jun/20 00:09 Worklog Time Spent: 10m Work Description: lostluck opened a new pull request #11927: URL: https://github.com/apache/beam/pull/11927 I noticed that there were several stranglers that were not following the import name conventions, and this PR completes that work. pipepb, fnpb, jobpb for the Pipeline protos, Fn Execution protos, and Job Management protos respectively, and v1pb for the internal serialization representation for types the Go SDK uses. This PR is a rote find/replace. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): standard PR-template table of Jenkins build-status badges for the Go and Java SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners.
[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant
[ https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441597 ] ASF GitHub Bot logged work on BEAM-10093: - Author: ASF GitHub Bot Created on: 05/Jun/20 00:07 Start Date: 05/Jun/20 00:07 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #11820: URL: https://github.com/apache/beam/pull/11820#discussion_r435622797 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.nexmark.queries.zetasql; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.extensions.sql.SqlTransform; +import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.nexmark.model.Bid; +import org.apache.beam.sdk.nexmark.model.Event; +import org.apache.beam.sdk.nexmark.model.Event.Type; +import org.apache.beam.sdk.nexmark.model.sql.SelectEvent; +import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform; +import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil; +import org.apache.beam.sdk.schemas.transforms.Convert; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** + * Query 0: Pass events through unchanged. + * + * This measures the overhead of the Beam ZetaSql implementation and test harness like conversion + * from Java model classes to Beam records. + * + * {@link Bid} events are used here at the moment, as they are most numerous with default + * configuration. 
+ */ +public class ZetaSqlQuery0 extends NexmarkQueryTransform { + + public ZetaSqlQuery0() { +super("ZetaSqlQuery0"); + } + + @Override + public PCollection expand(PCollection allEvents) { +PCollection rows = +allEvents +.apply(Filter.by(NexmarkQueryUtil.IS_BID)) +.apply(getName() + ".SelectEvent", new SelectEvent(Type.BID)); + +return rows.apply(getName() + ".Serialize", logBytesMetric(rows.getCoder())) +.setRowSchema(rows.getSchema()) +.apply( +SqlTransform.query("SELECT * FROM PCOLLECTION") +.withQueryPlannerClass(ZetaSQLQueryPlanner.class)) Review comment: OK that was a total misadventure. Going to back to just totally coupling the SQL dialect suites. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441597) Time Spent: 2h (was: 1h 50m) > Add ZetaSQL Nexmark variant > --- > > Key: BEAM-10093 > URL: https://issues.apache.org/jira/browse/BEAM-10093 > Project: Beam > Issue Type: New Feature > Components: testing-nexmark >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: P2 > Time Spent: 2h > Remaining Estimate: 0h > > Most queries will be identical, but best to simply stay decoupled, so this is > a copy/paste/modify job. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-9615: --- Labels: (was: stale-assigned) > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simply > warning at construction time won't cut it. However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a work > around. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126280#comment-17126280 ] Robert Burke commented on BEAM-9615: State of the world slowed down progress on this, but I'm now rolling out PRs for review. > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simply > warning at construction time won't cut it. However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a work > around. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects
[ https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126261#comment-17126261 ] Reza ardeshir rokni commented on BEAM-10199: FYI [~reuvenlax], > AutoValue Schema unable to work with auto-value-gson-annotations objects > > > Key: BEAM-10199 > URL: https://issues.apache.org/jira/browse/BEAM-10199 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Affects Versions: 2.21.0 >Reporter: Reza ardeshir rokni >Priority: P2 > > Objects created with the extension > com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0 > fail with an NPE in Java pipelines.
[jira] [Updated] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects
[ https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza ardeshir rokni updated BEAM-10199: --- Status: Open (was: Triage Needed)
[jira] [Commented] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects
[ https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126260#comment-17126260 ] Reza ardeshir rokni commented on BEAM-10199: The GSON autovalue object actually extends a base object, and the base object looks like the normal AutoValue generated object. From the ErrorMsg class we get: abstract class $AutoValue_ErrorMsg extends ErrorMsg And then: final class AutoValue_ErrorMsg extends $AutoValue_ErrorMsg So this line of code in the Beam utils won't work: return baseName.startsWith("AutoValue_") ? clazz.getSuperclass() : clazz;
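The two-layer hierarchy Reza describes is why a single `getSuperclass()` hop lands on `$AutoValue_ErrorMsg` rather than the user's `ErrorMsg`. A minimal, self-contained sketch of the failure mode and one possible fix (the class names mirror the generated ones above, but `singleHop` and `walkToBase` are hypothetical helpers written for illustration, not Beam's actual utility code):

```java
// Hypothetical stand-ins for the auto-value-gson generated hierarchy:
// AutoValue_ErrorMsg extends $AutoValue_ErrorMsg extends ErrorMsg.
class ErrorMsg {}
class $AutoValue_ErrorMsg extends ErrorMsg {}
class AutoValue_ErrorMsg extends $AutoValue_ErrorMsg {}

class AutoValueBaseSketch {
  // Single-hop resolution, like the Beam line quoted above: with an intermediate
  // $AutoValue_ class present, it stops one level short of the user's base class.
  static Class<?> singleHop(Class<?> clazz) {
    return clazz.getSimpleName().startsWith("AutoValue_") ? clazz.getSuperclass() : clazz;
  }

  // One possible fix: keep climbing while the simple name still looks generated.
  static Class<?> walkToBase(Class<?> clazz) {
    Class<?> current = clazz;
    while (current.getSimpleName().startsWith("AutoValue_")
        || current.getSimpleName().startsWith("$AutoValue_")) {
      current = current.getSuperclass();
    }
    return current;
  }

  public static void main(String[] args) {
    System.out.println(singleHop(AutoValue_ErrorMsg.class).getSimpleName());  // $AutoValue_ErrorMsg
    System.out.println(walkToBase(AutoValue_ErrorMsg.class).getSimpleName()); // ErrorMsg
  }
}
```

The first call reproduces the reported behavior, the second shows the superclass-chain walk that would reach the real base class.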
[jira] [Created] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects
Reza ardeshir rokni created BEAM-10199: -- Summary: AutoValue Schema unable to work with auto-value-gson-annotations objects Key: BEAM-10199 URL: https://issues.apache.org/jira/browse/BEAM-10199 Project: Beam Issue Type: New Feature Components: sdk-java-core Affects Versions: 2.21.0 Reporter: Reza ardeshir rokni Objects created with the extension com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0 fail with an NPE in Java pipelines.
[jira] [Work logged] (BEAM-10189) Add ValueState to python sdk
[ https://issues.apache.org/jira/browse/BEAM-10189?focusedWorklogId=441570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441570 ] ASF GitHub Bot logged work on BEAM-10189: - Author: ASF GitHub Bot Created on: 04/Jun/20 22:50 Start Date: 04/Jun/20 22:50 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11916: URL: https://github.com/apache/beam/pull/11916#issuecomment-639157754 Yes, the plan was to consider changing Java too, though that's harder due to backwards compatibility issues. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441570) Time Spent: 50m (was: 40m) > Add ValueState to python sdk > > > Key: BEAM-10189 > URL: https://issues.apache.org/jira/browse/BEAM-10189 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: P2 > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant
[ https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441580=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441580 ] ASF GitHub Bot logged work on BEAM-10093: - Author: ASF GitHub Bot Created on: 04/Jun/20 23:02 Start Date: 04/Jun/20 23:02 Worklog Time Spent: 10m Work Description: apilloud commented on a change in pull request #11820: URL: https://github.com/apache/beam/pull/11820#discussion_r435598706 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.nexmark.queries.zetasql; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.extensions.sql.SqlTransform; +import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner; +import org.apache.beam.sdk.metrics.Counter; +import org.apache.beam.sdk.metrics.Metrics; +import org.apache.beam.sdk.nexmark.model.Bid; +import org.apache.beam.sdk.nexmark.model.Event; +import org.apache.beam.sdk.nexmark.model.Event.Type; +import org.apache.beam.sdk.nexmark.model.sql.SelectEvent; +import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform; +import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil; +import org.apache.beam.sdk.schemas.transforms.Convert; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.Filter; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; + +/** + * Query 0: Pass events through unchanged. + * + * This measures the overhead of the Beam ZetaSql implementation and test harness like conversion + * from Java model classes to Beam records. + * + * {@link Bid} events are used here at the moment, as they are most numerous with default + * configuration.
+ */ +public class ZetaSqlQuery0 extends NexmarkQueryTransform { + + public ZetaSqlQuery0() { +super("ZetaSqlQuery0"); + } + + @Override + public PCollection expand(PCollection allEvents) { +PCollection rows = +allEvents +.apply(Filter.by(NexmarkQueryUtil.IS_BID)) +.apply(getName() + ".SelectEvent", new SelectEvent(Type.BID)); + +return rows.apply(getName() + ".Serialize", logBytesMetric(rows.getCoder())) +.setRowSchema(rows.getSchema()) +.apply( +SqlTransform.query("SELECT * FROM PCOLLECTION") +.withQueryPlannerClass(ZetaSQLQueryPlanner.class)) Review comment: That sounds good to me. Then we can overload the methods that need customization. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441580) Time Spent: 1h 50m (was: 1h 40m) > Add ZetaSQL Nexmark variant > --- > > Key: BEAM-10093 > URL: https://issues.apache.org/jira/browse/BEAM-10093 > Project: Beam > Issue Type: New Feature > Components: testing-nexmark >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: P2 > Time Spent: 1h 50m > Remaining Estimate: 0h > > Most queries will be identical, but best to simply stay decoupled, so this is > a copy/paste/modify job. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant
[ https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441567 ] ASF GitHub Bot logged work on BEAM-10093: - Author: ASF GitHub Bot Created on: 04/Jun/20 22:48 Start Date: 04/Jun/20 22:48 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #11820: URL: https://github.com/apache/beam/pull/11820#discussion_r435594096 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java ## Review comment: Yea, fair enough.
When I started, I did not know how much things would differ. It was suggested that the dialects would not really be compatible, but actually they are. I stopped short of a real refactor but I'll go ahead with something now that it is working. Issue Time Tracking --- Worklog Id: (was: 441567) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-7923) Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=441578=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441578 ] ASF GitHub Bot logged work on BEAM-7923: Author: ASF GitHub Bot Created on: 04/Jun/20 23:01 Start Date: 04/Jun/20 23:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11884: URL: https://github.com/apache/beam/pull/11884#issuecomment-639161004 There's a lot of files there that don't seem relevant; I think we should go through and figure out what's needed for the actual plugin vs. what's "extras" that just got copied from a template. We also need to figure out the distribution story. Will this be released with Beam? As another PyPI package (and npm package)? Issue Time Tracking --- Worklog Id: (was: 441578) Time Spent: 23h 40m (was: 23.5h) > Interactive Beam > > > Key: BEAM-7923 > URL: https://issues.apache.org/jira/browse/BEAM-7923 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: P2 > Labels: stale-assigned > Time Spent: 23h 40m > Remaining Estimate: 0h > > This is the top level ticket for all efforts leveraging [interactive > Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive] > As the development goes, blocking tickets will be added to this one.
[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant
[ https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441577 ] ASF GitHub Bot logged work on BEAM-10093: - Author: ASF GitHub Bot Created on: 04/Jun/20 22:58 Start Date: 04/Jun/20 22:58 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #11820: URL: https://github.com/apache/beam/pull/11820#discussion_r435597049 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java ##
Review comment: In my mental model, there are _n_ Nexmark queries with common setup and teardown, and then there are _m_ suites that implement the queries, filling in the middle part. I do want to keep them slightly decoupled in case some of them need hackery. They turned out not to require tweaking, but I'm not super convinced that isn't coincidence. So the refactor I have in mind is to provide the scaffolding for each query but still have pure Java, Calcite, and ZetaSQL suites of classes. The Calcite and ZetaSQL bits would share a lot and/or be one-liners. Issue Time Tracking --- Worklog Id: (was: 441577) Time Spent: 1h 40m (was: 1.5h)
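The "n queries with m suites filling in the middle" layout described in the comment above can be sketched roughly as follows. Everything here is illustrative only (the class and method names are hypothetical, and strings stand in for the real PTransforms), but it shows the shape: a shared scaffold per query supplies the common front and back halves, and each dialect suite supplies only the middle.

```java
// Illustrative sketch: one scaffold per Nexmark query, dialect suites fill in the middle.
abstract class Query0Scaffold {
  String run(String input) {
    String filtered = "filter(" + input + ")";  // common setup shared by all suites
    String queried = applyDialect(filtered);    // suite-specific middle part
    return "serialize(" + queried + ")";        // common teardown shared by all suites
  }

  abstract String applyDialect(String rows);
}

// With the scaffold in place, the per-dialect suites become near one-liners.
class CalciteQuery0Sketch extends Query0Scaffold {
  String applyDialect(String rows) { return "calciteSql(" + rows + ")"; }
}

class ZetaSqlQuery0Sketch extends Query0Scaffold {
  String applyDialect(String rows) { return "zetaSql(" + rows + ")"; }
}

class ScaffoldDemo {
  public static void main(String[] args) {
    System.out.println(new CalciteQuery0Sketch().run("events")); // serialize(calciteSql(filter(events)))
    System.out.println(new ZetaSqlQuery0Sketch().run("events")); // serialize(zetaSql(filter(events)))
  }
}
```

A real refactor would share the PTransform setup/teardown rather than strings, but the suites stay decoupled the same way: any query needing dialect-specific hackery can override more than the middle step.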
[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant
[ https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441576=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441576 ] ASF GitHub Bot logged work on BEAM-10093: - Author: ASF GitHub Bot Created on: 04/Jun/20 22:57 Start Date: 04/Jun/20 22:57 Worklog Time Spent: 10m Work Description: kennknowles commented on a change in pull request #11820: URL: https://github.com/apache/beam/pull/11820#discussion_r435597049 ## File path: sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java ##
Review comment: In my mental model, there are Nexmark queries with common setup and teardown, and then there are suites that do some processing on them. I do want to keep them slightly decoupled in case some of them need hackery. They turned out not to require tweaking, but I'm not super convinced that isn't coincidence. Issue Time Tracking --- Worklog Id: (was: 441576) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-7074) FnApiRunner fails to wire multiple timer specs in single pardo
[ https://issues.apache.org/jira/browse/BEAM-7074?focusedWorklogId=441572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441572 ] ASF GitHub Bot logged work on BEAM-7074: Author: ASF GitHub Bot Created on: 04/Jun/20 22:52 Start Date: 04/Jun/20 22:52 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11894: URL: https://github.com/apache/beam/pull/11894#discussion_r435595283 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py ## @@ -377,11 +377,6 @@ def process_timer( assert_that(actual, equal_to(expected)) def test_pardo_timers_clear(self): -if type(self).__name__ != 'FlinkRunnerTest': Review comment: I think it would be cleaner to add an override in the subclasses than to test on `__name__` here. Issue Time Tracking --- Worklog Id: (was: 441572) Time Spent: 1h 50m (was: 1h 40m) > FnApiRunner fails to wire multiple timer specs in single pardo > -- > > Key: BEAM-7074 > URL: https://issues.apache.org/jira/browse/BEAM-7074 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Affects Versions: 2.11.0 >Reporter: Thomas Weise >Priority: P2 > Labels: stale-P2 > Fix For: 2.21.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Multiple timer specs in a ParDo yield "NotImplementedError: Timers and side > inputs."
[jira] [Updated] (BEAM-10198) IOException for distinct bytes SQL query
[ https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-10198: Labels: beam-fixit (was: ) > IOException for distinct bytes SQL query > > > Key: BEAM-10198 > URL: https://issues.apache.org/jira/browse/BEAM-10198 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: P2 > Labels: beam-fixit > > This query breaks Java code import. Need to diagnose the root cause and fix > it properly. > {code:java} > public void test_distinct_bytes() { > String sql = "SELECT DISTINCT val.BYTES\n" > + "from (select b\"1\" BYTES union all\n" > + " select cast(NULL as bytes) union all\n" > + " select b\"-1\" union all\n" > + " select b\"1\" union all\n" > + " select cast(NULL as bytes)) val"; > ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); > BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); > PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, > beamRelNode); > Schema singleField = Schema.builder().addByteArrayField("field1").build(); > PAssert.that(stream) > .containsInAnyOrder( > > Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); > > pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); > } > {code} > This query throws > {code:java} > Forbidden IOException when reading from InputStream > java.lang.IllegalArgumentException: Forbidden IOException when reading from > InputStream > at > org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118) > at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98) > at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92) > at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141) > {code}
[jira] [Updated] (BEAM-10198) IOException for distinct bytes SQL query
[ https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-10198: Description: This query breaks Java code import. Need to diagnose the root cause and have a fix properly. {code:java} public void test_distinct_bytes() { String sql = "SELECT DISTINCT val.BYTES\n" + "from (select b\"1\" BYTES union all\n" + " select cast(NULL as bytes) union all\n" + " select b\"-1\" union all\n" + " select b\"1\" union all\n" + " select cast(NULL as bytes)) val"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); Schema singleField = Schema.builder().addByteArrayField("field1").build(); PAssert.that(stream) .containsInAnyOrder( Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } {code} This query will throws {code:java} Forbidden IOException when reading from InputStream java.lang.IllegalArgumentException: Forbidden IOException when reading from InputStream at org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118) at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98) at org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92) at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141) {code} was: {code:java} public void test_distinct_bytes() { String sql = "SELECT DISTINCT val.BYTES\n" + "from (select b\"1\" BYTES union all\n" + " select cast(NULL as bytes) union all\n" + " select b\"-1\" union all\n" + " select b\"1\" union all\n" + " select cast(NULL as bytes)) val"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); PCollection stream = 
BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); Schema singleField = Schema.builder().addByteArrayField("field1").build(); PAssert.that(stream) .containsInAnyOrder( Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } {code} This query breaks Java code import. Need to diagnose the root cause and have a fix properly. > IOException for distinct bytes SQL query > > > Key: BEAM-10198 > URL: https://issues.apache.org/jira/browse/BEAM-10198 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: P2 > > This query breaks Java code import. Need to diagnose the root cause and have > a fix properly. > {code:java} > public void test_distinct_bytes() { > String sql = "SELECT DISTINCT val.BYTES\n" > + "from (select b\"1\" BYTES union all\n" > + " select cast(NULL as bytes) union all\n" > + " select b\"-1\" union all\n" > + " select b\"1\" union all\n" > + " select cast(NULL as bytes)) val"; > ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); > BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); > PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, > beamRelNode); > Schema singleField = Schema.builder().addByteArrayField("field1").build(); > PAssert.that(stream) > .containsInAnyOrder( > > Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); > > pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); > } > {code} > This query will throws > {code:java} > Forbidden IOException when reading from InputStream > java.lang.IllegalArgumentException: Forbidden IOException when reading from > InputStream > at > org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118) > at > org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98) > at > 
org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92) > at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10198) Program crashes for distinct bytes SQL query
[ https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-10198: --- Status: Open (was: Triage Needed) > Program crashes for distinct bytes SQL query > > > Key: BEAM-10198 > URL: https://issues.apache.org/jira/browse/BEAM-10198 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: P2 > > {code:java} > public void test_distinct_bytes() { > String sql = "SELECT DISTINCT val.BYTES\n" > + "from (select b\"1\" BYTES union all\n" > + " select cast(NULL as bytes) union all\n" > + " select b\"-1\" union all\n" > + " select b\"1\" union all\n" > + " select cast(NULL as bytes)) val"; > ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); > BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); > PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, > beamRelNode); > Schema singleField = Schema.builder().addByteArrayField("field1").build(); > PAssert.that(stream) > .containsInAnyOrder( > > Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); > > pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); > } > {code} > This query breaks Java code import. Need to diagnose the root cause and have > a fix properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10198) Program crashes for distinct bytes SQL query
Rui Wang created BEAM-10198: --- Summary: Program crashes for distinct bytes SQL query Key: BEAM-10198 URL: https://issues.apache.org/jira/browse/BEAM-10198 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Rui Wang Assignee: Rui Wang {code:java} public void test_distinct_bytes() { String sql = "SELECT DISTINCT val.BYTES\n" + "from (select b\"1\" BYTES union all\n" + " select cast(NULL as bytes) union all\n" + " select b\"-1\" union all\n" + " select b\"1\" union all\n" + " select cast(NULL as bytes)) val"; ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode); Schema singleField = Schema.builder().addByteArrayField("field1").build(); PAssert.that(stream) .containsInAnyOrder( Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); } {code} This query breaks Java code import. We need to diagnose the root cause and fix it properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
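Independent of Beam, the DISTINCT semantics at issue in the query above can be sketched with plain Java collections. In this illustrative sketch, strings stand in for the byte literals (since `byte[]` does not compare by content in Java collections), and all SQL NULLs collapse into a single distinct row:

```java
import java.util.*;

public class DistinctDemo {
    // SQL DISTINCT treats all NULLs as equal, so the five input rows
    // b"1", NULL, b"-1", b"1", NULL reduce to three distinct rows.
    static List<String> distinct(List<String> rows) {
        // LinkedHashSet deduplicates while preserving first-seen order,
        // and (like SQL DISTINCT) keeps at most one null.
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1", null, "-1", "1", null);
        System.out.println(distinct(rows)); // prints: [1, null, -1]
    }
}
```

This only models the expected result set; the reported failure itself happens inside Beam's coder machinery (`CoderUtils.decodeFromSafeStream`), presumably when cloning null byte-array values.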
[jira] [Updated] (BEAM-10198) IOException for distinct bytes SQL query
[ https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-10198: Summary: IOException for distinct bytes SQL query (was: Program crashes for distinct bytes SQL query) > IOException for distinct bytes SQL query > > > Key: BEAM-10198 > URL: https://issues.apache.org/jira/browse/BEAM-10198 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: P2 > > {code:java} > public void test_distinct_bytes() { > String sql = "SELECT DISTINCT val.BYTES\n" > + "from (select b\"1\" BYTES union all\n" > + " select cast(NULL as bytes) union all\n" > + " select b\"-1\" union all\n" > + " select b\"1\" union all\n" > + " select cast(NULL as bytes)) val"; > ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); > BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql); > PCollection stream = BeamSqlRelUtils.toPCollection(pipeline, > beamRelNode); > Schema singleField = Schema.builder().addByteArrayField("field1").build(); > PAssert.that(stream) > .containsInAnyOrder( > > Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build()); > > pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES)); > } > {code} > This query breaks Java code import. Need to diagnose the root cause and have > a fix properly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10197) Support type hints for frozenset
Saavan Nanavati created BEAM-10197: -- Summary: Support type hints for frozenset Key: BEAM-10197 URL: https://issues.apache.org/jira/browse/BEAM-10197 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Saavan Nanavati Beam's internal typing system currently supports type hints for set but not frozenset. This Jira ticket will add type annotation support for both frozenset and typing.FrozenSet. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9953) Beam ZetaSQL supports multiple statements in a query
[ https://issues.apache.org/jira/browse/BEAM-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126232#comment-17126232 ] Rui Wang commented on BEAM-9953: It is better to check the open-sourced Java Analyzer: https://github.com/google/zetasql/blob/master/java/com/google/zetasql/Analyzer.java I am not seeing an API that returns a ResolvedStatement but does not really analyze. Also, Kyle added an API discussed above (which we need) at: https://github.com/google/zetasql/blob/master/java/com/google/zetasql/Analyzer.java#L214. So it's no longer a problem. > Beam ZetaSQL supports multiple statements in a query > > > Key: BEAM-9953 > URL: https://issues.apache.org/jira/browse/BEAM-9953 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kyle Weaver >Priority: P2 > > One example of a multi-statement query: > {code:java} > CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
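The statement-classification question discussed above can be illustrated with a deliberately naive sketch: split the script on semicolons and classify each statement by its leading keyword. This is not the ZetaSQL Analyzer API (that lives in the Analyzer.java linked above) and would break on semicolons inside string literals; it is only meant to show the shape of the problem.

```java
import java.util.*;

public class StatementKind {
    // Naive classification: split on semicolons, take the first keyword.
    // Semicolons inside string literals or comments would defeat this,
    // which is why a real parse-but-don't-analyze API is preferable.
    static List<String> kinds(String script) {
        List<String> out = new ArrayList<>();
        for (String stmt : script.split(";")) {
            String trimmed = stmt.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed.split("\\s+")[0].toUpperCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sql = "CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10);";
        System.out.println(kinds(sql)); // prints: [CREATE, SELECT]
    }
}
```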
[jira] [Commented] (BEAM-9953) Beam ZetaSQL supports multiple statements in a query
[ https://issues.apache.org/jira/browse/BEAM-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126229#comment-17126229 ] Kenneth Knowles commented on BEAM-9953: --- Is there a parse-but-don't-analyze API that makes it easy to identify the statement type? (Obviously, internal to the library, there is.) > Beam ZetaSQL supports multiple statements in a query > > > Key: BEAM-9953 > URL: https://issues.apache.org/jira/browse/BEAM-9953 > Project: Beam > Issue Type: Task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kyle Weaver >Priority: P2 > > One example of a multi-statement query: > {code:java} > CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition
[ https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned BEAM-8828: --- Assignee: Scott Lukas > BigQueryTableProvider should allow configuration of write disposition > - > > Key: BEAM-8828 > URL: https://issues.apache.org/jira/browse/BEAM-8828 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Scott Lukas >Priority: P2 > > It should be possible to set BigQueryIO's > [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125] > in a Beam SQL BigQuery table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
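A write disposition controls what happens when the destination already holds data. BigQuery's three dispositions (WRITE_EMPTY, WRITE_TRUNCATE, WRITE_APPEND) can be modeled with plain collections; this is only a behavioral sketch with a List standing in for the destination table, not BigQueryIO itself:

```java
import java.util.*;

public class WriteDispositionDemo {
    enum WriteDisposition { WRITE_EMPTY, WRITE_TRUNCATE, WRITE_APPEND }

    // Behavioral model: returns the table contents after the write.
    static List<String> write(List<String> table, List<String> rows, WriteDisposition d) {
        switch (d) {
            case WRITE_EMPTY:
                if (!table.isEmpty()) {
                    throw new IllegalStateException("destination table is not empty");
                }
                // an empty table behaves like an append target
                // falls through
            case WRITE_APPEND:
                List<String> appended = new ArrayList<>(table);
                appended.addAll(rows);
                return appended;
            case WRITE_TRUNCATE:
                return new ArrayList<>(rows); // existing rows are replaced
            default:
                throw new AssertionError(d);
        }
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("a", "b");
        System.out.println(write(table, Arrays.asList("c"), WriteDisposition.WRITE_APPEND));   // [a, b, c]
        System.out.println(write(table, Arrays.asList("c"), WriteDisposition.WRITE_TRUNCATE)); // [c]
    }
}
```

Exposing this choice through the SQL table provider is exactly what the issue asks for; the enum names here mirror the real BigQuery dispositions.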
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441527 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 04/Jun/20 21:20 Start Date: 04/Jun/20 21:20 Worklog Time Spent: 10m Work Description: lostluck opened a new pull request #11926: URL: https://github.com/apache/beam/pull/11926 Adds a unit test for the "user side" type encoders/decoders. This will help us avoid a broken state as we change the string coder and default custom type coders. Also adjusts the coder printout a little for improved readability (and reduced redundancy). Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441529 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 04/Jun/20 21:20 Start Date: 04/Jun/20 21:20 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11926: URL: https://github.com/apache/beam/pull/11926#issuecomment-639122665 R: @tysonjh cc: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441529) Time Spent: 40m (was: 0.5h) > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Labels: stale-assigned > Time Spent: 40m > Remaining Estimate: 0h > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed-out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simple > warning at construction time won't cut it. 
However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a work > around. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441523 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 04/Jun/20 21:13 Start Date: 04/Jun/20 21:13 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11925: URL: https://github.com/apache/beam/pull/11925#issuecomment-639119430 R: @tysonjh cc: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441523) Time Spent: 20m (was: 10m) > [Go SDK] Beam Schemas > - > > Key: BEAM-9615 > URL: https://issues.apache.org/jira/browse/BEAM-9615 > Project: Beam > Issue Type: New Feature > Components: sdk-go >Reporter: Robert Burke >Assignee: Robert Burke >Priority: P2 > Labels: stale-assigned > Time Spent: 20m > Remaining Estimate: 0h > > Schema support is required for advanced cross language features in Beam, and > has the opportunity to replace the current default JSON encoding of elements. > Some quick notes, though a better fleshed-out doc with details will be > forthcoming: > * All base coders should be implemented, and listed as coder capabilities. I > think only stringutf8 is missing presently. > * Should support fairly arbitrary user types, seamlessly. That is, users > should be able to rely on it "just working" if their type is compatible. > * Should support schema metadata tagging. > In particular, one breaking shift in the default will be to explicitly fail > pipelines if elements have unexported fields, when no other custom coder has > been added. This has been a source of errors/dropped data/keys and a simple > warning at construction time won't cut it. 
However, we could provide a manual > "use beam schemas, but ignore unexported fields" registration as a work > around. > Edit: Doc is now at https://s.apache.org/beam-go-schemas -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas
[ https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441522 ] ASF GitHub Bot logged work on BEAM-9615: Author: ASF GitHub Bot Created on: 04/Jun/20 21:13 Start Date: 04/Jun/20 21:13 Worklog Time Spent: 10m Work Description: lostluck opened a new pull request #11925: URL: https://github.com/apache/beam/pull/11925 This adds initial utility functions for encoding and decoding utf8 strings in the Go SDK. Doesn't make use of them yet. In practice this is already how strings are encoded in the Go SDK, but marked as "custom" coders rather than the built in URN. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
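The PR above adds length-prefixed UTF-8 string coder utilities to the Go SDK. The general wire shape of such a coder can be sketched in Java; note this is a dependency-free illustration, not Beam's implementation, and Beam's `beam:coder:string_utf8:v1` uses a varint length prefix in nested contexts, which is simplified here to a fixed 4-byte int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Coder {
    // Encode as [length][utf-8 bytes]; the length prefix lets a decoder
    // read a string out of a larger concatenated stream.
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);
            out.write(utf8);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "héllo, 世界";
        System.out.println(decode(encode(s)).equals(s)); // prints: true
    }
}
```

A unit test asserting exactly this round-trip property is what the companion PR (#11926) adds on the Go side.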
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441518 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 04/Jun/20 21:02 Start Date: 04/Jun/20 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on a change in pull request #11922: URL: https://github.com/apache/beam/pull/11922#discussion_r435546832 ## File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java ## @@ -1593,6 +1601,391 @@ public void testProcessElementForSizedElementAndRestriction() throws Exception { assertEquals(stateData, fakeClient.getData()); } + @Test + public void testProcessElementForWindowedSizedElementAndRestriction() throws Exception { +Pipeline p = Pipeline.create(); +PCollection valuePCollection = p.apply(Create.of("unused")); +PCollectionView singletonSideInputView = valuePCollection.apply(View.asSingleton()); +TestSplittableDoFn doFn = new TestSplittableDoFn(singletonSideInputView); + +valuePCollection +.apply(Window.into(SlidingWindows.of(Duration.standardSeconds(1 +.apply(TEST_TRANSFORM_ID, ParDo.of(doFn).withSideInputs(singletonSideInputView)); + +RunnerApi.Pipeline pProto = +ProtoOverrides.updateTransform( +PTransformTranslation.PAR_DO_TRANSFORM_URN, +PipelineTranslation.toProto(p, SdkComponents.create(p.getOptions()), true), +SplittableParDoExpander.createSizedReplacement()); +String expandedTransformId = +Iterables.find( +pProto.getComponents().getTransformsMap().entrySet(), +entry -> +entry +.getValue() +.getSpec() +.getUrn() +.equals( +PTransformTranslation + .SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN) +&& entry.getValue().getUniqueName().contains(TEST_TRANSFORM_ID)) +.getKey(); +RunnerApi.PTransform pTransform = +pProto.getComponents().getTransformsOrThrow(expandedTransformId); +String inputPCollectionId = + pTransform.getInputsOrThrow(ParDoTranslation.getMainInputName(pTransform)); 
+RunnerApi.PCollection inputPCollection = +pProto.getComponents().getPcollectionsOrThrow(inputPCollectionId); +RehydratedComponents rehydratedComponents = +RehydratedComponents.forComponents(pProto.getComponents()); +Coder inputCoder = +WindowedValue.getFullCoder( +CoderTranslation.fromProto( + pProto.getComponents().getCodersOrThrow(inputPCollection.getCoderId()), +rehydratedComponents, +TranslationContext.DEFAULT), +(Coder) +CoderTranslation.fromProto( +pProto +.getComponents() +.getCodersOrThrow( +pProto +.getComponents() +.getWindowingStrategiesOrThrow( +inputPCollection.getWindowingStrategyId()) +.getWindowCoderId()), +rehydratedComponents, +TranslationContext.DEFAULT)); +String outputPCollectionId = pTransform.getOutputsOrThrow("output"); + +ImmutableMap stateData = +ImmutableMap.of( + multimapSideInputKey(singletonSideInputView.getTagInternal().getId(), ByteString.EMPTY), +encode("8")); + +FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData); + +List> mainOutputValues = new ArrayList<>(); +MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); +PCollectionConsumerRegistry consumers = +new PCollectionConsumerRegistry( +metricsContainerRegistry, mock(ExecutionStateTracker.class)); +consumers.register( +outputPCollectionId, +TEST_TRANSFORM_ID, +(FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); +PTransformFunctionRegistry startFunctionRegistry = +new PTransformFunctionRegistry( +mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); +PTransformFunctionRegistry finishFunctionRegistry = +new PTransformFunctionRegistry( +mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); +List teardownFunctions = new ArrayList<>(); +List progressRequestCallbacks = new ArrayList<>(); +BundleSplitListener.InMemory splitListener = BundleSplitListener.InMemory.create(); + +new
[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functionality
[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126186#comment-17126186 ] Kenneth Knowles commented on BEAM-9198: --- Sounds good. By the way, that message was automated. I messed up the automation so it looked like it was from me. If it happens again, it will be clear that it is a bot. Carry on! > BeamSQL aggregation analytics functionality > > > Key: BEAM-9198 > URL: https://issues.apache.org/jira/browse/BEAM-9198 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: P2 > Labels: gsoc, gsoc2020, mentor > Time Spent: 50m > Remaining Estimate: 0h > > Mentor email: ruw...@google.com. Feel free to send emails with your questions. > Project Information > - > BeamSQL has a long list of aggregation/aggregation analytics > functionalities to support. > To begin with, you will need to support this syntax: > {code:sql} > analytic_function_name ( [ argument_list ] ) > OVER ( > [ PARTITION BY partition_expression_list ] > [ ORDER BY expression [{ ASC | DESC }] [, ...] ] > [ window_frame_clause ] > ) > {code} > As there is a long list of analytics functions, a good starting point is to support > rank() first. > This will require touching core components of BeamSQL: > 1. SQL parser to support the syntax above. > 2. SQL core to implement physical relational operator. > 3. Distributed algorithms to implement a list of functions in a distributed > manner. > 4. Enable in ZetaSQL dialect. > To understand what SQL analytics functionality is, you could check this great > explanation doc: > https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts. > To know about Beam's programming model, check: > https://beam.apache.org/documentation/programming-guide/#overview -- This message was sent by Atlassian Jira (v8.3.4#803005)
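As background for the rank() starting point named in the issue above, RANK() semantics (tied rows share a rank; the rank after a tie is skipped) can be shown in plain Java, independent of any SQL engine or of how Beam would distribute the computation:

```java
import java.util.*;

public class RankDemo {
    // RANK() of a row is 1 + the number of rows ordered strictly ahead
    // of it, so ties share a rank and the following rank is skipped.
    static int rank(List<Integer> scores, int value) {
        int ahead = 0;
        for (int s : scores) {
            if (s > value) ahead++; // ORDER BY score DESC: higher ranks first
        }
        return ahead + 1;
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(90, 80, 80, 70);
        for (int s : scores) {
            System.out.println(s + " -> rank " + rank(scores, s));
        }
        // 90 -> rank 1, 80 -> rank 2 (twice), 70 -> rank 4
    }
}
```

The counting formulation is also why the function can be distributed: the count of strictly-greater rows within a partition is itself an aggregate.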
[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF
[ https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=441511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441511 ] ASF GitHub Bot logged work on BEAM-9363: Author: ASF GitHub Bot Created on: 04/Jun/20 20:54 Start Date: 04/Jun/20 20:54 Worklog Time Spent: 10m Work Description: apilloud commented on a change in pull request #11868: URL: https://github.com/apache/beam/pull/11868#discussion_r435497089 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TVFSlidingWindowFn.java ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.impl; + +import java.util.Arrays; +import java.util.Collection; +import java.util.Objects; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.extensions.sql.impl.utils.TVFStreamingUtils; +import org.apache.beam.sdk.transforms.windowing.IntervalWindow; +import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn; +import org.apache.beam.sdk.transforms.windowing.WindowFn; +import org.apache.beam.sdk.transforms.windowing.WindowFn.AssignContext; +import org.apache.beam.sdk.transforms.windowing.WindowMappingFn; +import org.apache.beam.sdk.values.Row; +import org.joda.time.Duration; + +/** + * TVFSlidingWindowFn assigns window based on input row's "window_start" and "window_end" + * timestamps. + */ +public class TVFSlidingWindowFn extends NonMergingWindowFn { + /** Amount of time between generated windows. */ + private final Duration period; + + /** Size of the generated windows. */ + private final Duration size; + + public static TVFSlidingWindowFn of(Duration size, Duration period) { +return new TVFSlidingWindowFn(size, period); + } + + private TVFSlidingWindowFn(Duration size, Duration period) { +this.period = period; +this.size = size; + } + + @Override + public Collection assignWindows(AssignContext c) throws Exception { +Row curRow = (Row) c.element(); +// In sliding window as TVF syntax, each row contains's its window's start and end as metadata, +// thus we can assign a window directly based on window's start and end metadata. 
+return Arrays.asList( +new IntervalWindow( +curRow.getDateTime(TVFStreamingUtils.WINDOW_START).toInstant(), +curRow.getDateTime(TVFStreamingUtils.WINDOW_END).toInstant())); + } + + @Override + public boolean isCompatible(WindowFn other) { +return equals(other); + } + + @Override + public Coder windowCoder() { +return IntervalWindow.getCoder(); + } + + @Override + public WindowMappingFn getDefaultWindowMappingFn() { +throw new UnsupportedOperationException( +"TVFSlidingWindow does not support side input windows."); + } + + @Override Review comment: nit: From here to the end of the file is boilerplate that `AutoValue` does for you. Could you use that instead? ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java ## @@ -85,6 +97,85 @@ public TableFunctionScan copy( } private class Transform extends PTransform, PCollection> { +private TVFToPTransform tumbleToPTransform = +(call, upstream) -> { + RexInputRef wmCol = (RexInputRef) call.getOperands().get(1); + Schema outputSchema = CalciteUtils.toSchema(getRowType()); + FixedWindows windowFn = FixedWindows.of(durationParameter(call.getOperands().get(2))); + PCollection streamWithWindowMetadata = + upstream + .apply(ParDo.of(new FixedWindowDoFn(windowFn, wmCol.getIndex(), outputSchema))) + .setRowSchema(outputSchema); + + PCollection windowedStream = + assignTimestampsAndWindow( + streamWithWindowMetadata, wmCol.getIndex(), (WindowFn) windowFn); + + return windowedStream; +}; + +private TVFToPTransform hopToPTransform = +(call, upstream) -> { + RexInputRef wmCol = (RexInputRef)
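For context on the review above: a conventional sliding-window WindowFn computes each element's windows from its timestamp, whereas the TVF variant simply reads the precomputed window_start/window_end columns off the row. The timestamp-based assignment the TVF approach avoids looks roughly like this sketch (abstract integer time units; the real Beam SlidingWindows works on Instants and also supports an offset):

```java
import java.util.*;

public class SlidingAssign {
    // A timestamp t belongs to every window [s, s + size) where s is a
    // multiple of the period and s <= t < s + size.
    static List<String> assign(long t, long size, long period) {
        List<String> windows = new ArrayList<>();
        long lastStart = t - Math.floorMod(t, period); // latest start <= t
        for (long start = lastStart; start > t - size; start -= period) {
            windows.add("[" + start + ", " + (start + size) + ")");
        }
        return windows;
    }

    public static void main(String[] args) {
        // size=10, period=5: timestamp 12 falls in two overlapping windows
        System.out.println(assign(12, 10, 5)); // prints: [[10, 20), [5, 15)]
    }
}
```

With the window bounds already materialized as row metadata, the TVF WindowFn reduces to constructing a single IntervalWindow per row, which is what the patch under review does.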
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=441510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441510 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 04/Jun/20 20:49 Start Date: 04/Jun/20 20:49 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-639108631 R: @Ardagan @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441510) Time Spent: 4h 40m (was: 4.5h) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
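The mapping this issue describes can be sketched concretely: given the primitive materializations (an iterable of values, or a key-to-values multimap), the higher-level views are all derivable. The following Python model is illustrative only; the function names are hypothetical and this is not the SDK's actual view machinery:

```python
def as_multimap(pairs):
    """Primitive multimap materialization: group KV pairs into key -> [values]."""
    mm = {}
    for k, v in pairs:
        mm.setdefault(k, []).append(v)
    return mm

def as_iterable(pairs):
    # Iterable view: every value, keys ignored.
    return [v for _, v in pairs]

def as_singleton(pairs):
    # Singleton view: derived from the iterable view, must hold exactly one element.
    values = as_iterable(pairs)
    if len(values) != 1:
        raise ValueError("singleton view requires exactly one element")
    return values[0]

pairs = [("k", 1), ("k", 2), ("j", 3)]
print(as_multimap(pairs))   # {'k': [1, 2], 'j': [3]}
print(as_iterable(pairs))   # [1, 2, 3]
```

The point of the issue is to serve views from the two primitives directly rather than round-tripping everything through a grouped `KV` shape first.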
[jira] [Commented] (BEAM-10192) Duplicate PubSub subscriptions with python direct runner in Jupyter/Colab environment
[ https://issues.apache.org/jira/browse/BEAM-10192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126174#comment-17126174 ] Harrison commented on BEAM-10192: - This may be related to BEAM-2988 > Duplicate PubSub subscriptions with python direct runner in Jupyter/Colab > environment > - > > Key: BEAM-10192 > URL: https://issues.apache.org/jira/browse/BEAM-10192 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Affects Versions: 2.22.0 > Environment: Ubuntu 18 (Colab notebook), Python SDK >Reporter: Harrison >Priority: P2 > > When running a streaming pipeline on Colab with direct runner, ReadFromPubSub > can retain old subscriptions and cause message duplication. For example, > manually killing a cell that is running a streaming pubsub pipeline does not > delete the pubsub subscription. If the cell is rerun, the ReadFromPubSub > component will actually be subscribed twice which results in duplicate > messages. > Manually deleting old subscriptions (e.g. via the GCP dashboard) temporarily > fixes the problem. > This Colab notebook: > [https://gist.github.com/hgarrereyn/64ce87cbbcbe9c34ccdd13eafe49e3fb] > contains a runnable example of the bug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
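The duplication mechanism reported above can be illustrated without GCP: Pub/Sub delivers each published message once per subscription, so if a rerun creates a second subscription on the topic while the first is never deleted, a reader attached to the leftovers sees every message twice. A toy in-memory model (the `Topic` class is illustrative, not the Pub/Sub or Beam API):

```python
class Topic:
    """Toy model of a Pub/Sub topic fanning out to its subscriptions."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        # Each subscription gets its own delivery queue.
        self.subscriptions.setdefault(name, [])

    def publish(self, message):
        # A message is delivered once to *every* subscription.
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull_all(self):
        # Draining all surviving subscriptions yields duplicates.
        received = []
        for queue in self.subscriptions.values():
            received.extend(queue)
            queue.clear()
        return received


topic = Topic()
topic.subscribe("run-1")   # first pipeline run
topic.subscribe("run-2")   # rerun after killing the cell; run-1 was never deleted
topic.publish("event-a")
received = topic.pull_all()
print(received)  # ['event-a', 'event-a']
```

Manually deleting the stale subscription (as the report notes) removes the second queue and restores single delivery.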
[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=441508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441508 ] ASF GitHub Bot logged work on BEAM-9869: Author: ASF GitHub Bot Created on: 04/Jun/20 20:27 Start Date: 04/Jun/20 20:27 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-639098591 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441508) Time Spent: 3h 20m (was: 3h 10m) > adding self-contained Kafka service jar for testing > --- > > Key: BEAM-9869 > URL: https://issues.apache.org/jira/browse/BEAM-9869 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Labels: stale-assigned > Time Spent: 3h 20m > Remaining Estimate: 0h > > adding self-contained Kafka service jar for testing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette closed BEAM-10188. Fix Version/s: Not applicable Resolution: Fixed > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Priority: P2 > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned BEAM-10188: Assignee: (was: Brian Hulette) > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441507 ] ASF GitHub Bot logged work on BEAM-10188: - Author: ASF GitHub Bot Created on: 04/Jun/20 20:21 Start Date: 04/Jun/20 20:21 Worklog Time Spent: 10m Work Description: TheNeuralBit merged pull request #11918: URL: https://github.com/apache/beam/pull/11918 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441507) Time Spent: 40m (was: 0.5h) > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette reassigned BEAM-10188: Assignee: Brian Hulette > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Assignee: Brian Hulette >Priority: P2 > Time Spent: 40m > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10196) [beam_PostCommit_Java_ValidatesRunner_Spark] [TimerTests.testOutputTimestampDefault]
Rui Wang created BEAM-10196: --- Summary: [beam_PostCommit_Java_ValidatesRunner_Spark] [TimerTests.testOutputTimestampDefault] Key: BEAM-10196 URL: https://issues.apache.org/jira/browse/BEAM-10196 Project: Beam Issue Type: Bug Components: test-failures Reporter: Rui Wang * https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7516/testReport/ Initial investigation: Timer is not fully supported by Spark runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functionality
[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126168#comment-17126168 ] John Mora commented on BEAM-9198: - Hi [~kenn] I am still working on this issue. I am on stage 2, "SQL core to implement physical relational operator", of my GSoC proposal. I have been reporting my progress to my mentor [~amaliujia] . Additionally, a few days ago I sent a PR with an experiment in order to receive feedback. Regards, John > BeamSQL aggregation analytics functionality > > > Key: BEAM-9198 > URL: https://issues.apache.org/jira/browse/BEAM-9198 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: P2 > Labels: gsoc, gsoc2020, mentor > Time Spent: 50m > Remaining Estimate: 0h > > Mentor email: ruw...@google.com. Feel free to send emails for your questions. > Project Information > - > BeamSQL has a long list of aggregation/aggregation analytics > functionalities to support. > To begin with, you will need to support this syntax: > {code:sql} > analytic_function_name ( [ argument_list ] ) > OVER ( > [ PARTITION BY partition_expression_list ] > [ ORDER BY expression [{ ASC | DESC }] [, ...] ] > [ window_frame_clause ] > ) > {code} > As there is a long list of analytics functions, a good starting point is to support > rank() first. > This will require touching core components of BeamSQL: > 1. SQL parser to support the syntax above. > 2. SQL core to implement physical relational operator. > 3. Distributed algorithms to implement a list of functions in a distributed > manner. > 4. Enable in ZetaSQL dialect. > To understand what SQL analytics functionality is, you could check this great > explanation doc: > https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts. 
> To know about Beam's programming model, check: > https://beam.apache.org/documentation/programming-guide/#overview -- This message was sent by Atlassian Jira (v8.3.4#803005)
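As background for the rank() starting point suggested above, the semantics of `RANK() OVER (PARTITION BY ... ORDER BY ...)` can be sketched in a few lines: rows are numbered within each partition in sort order, ties share a rank, and the next distinct value skips ranks. This is an illustration of the expected behavior, not BeamSQL's distributed implementation:

```python
from itertools import groupby
from operator import itemgetter

def rank_over(rows, partition_key, order_key):
    """Model of RANK() OVER (PARTITION BY partition_key ORDER BY order_key).

    Ties receive the same rank and the following distinct value skips ranks,
    e.g. 1, 2, 2 then 4 (DENSE_RANK would give 1, 2, 2 then 3).
    """
    ranked = []
    rows = sorted(rows, key=itemgetter(partition_key))
    for _, part in groupby(rows, key=itemgetter(partition_key)):
        part = sorted(part, key=itemgetter(order_key))
        prev_value, prev_rank = object(), 0
        for i, row in enumerate(part, start=1):
            value = row[order_key]
            # A tie keeps the previous rank; otherwise the rank is the row number.
            rank = prev_rank if value == prev_value else i
            ranked.append({**row, "rank": rank})
            prev_value, prev_rank = value, rank
    return ranked

scores = [
    {"team": "a", "score": 10},
    {"team": "a", "score": 10},
    {"team": "a", "score": 7},
    {"team": "b", "score": 3},
]
result = rank_over(scores, "team", "score")
print([r["rank"] for r in result])  # [1, 2, 2, 1]
```

A distributed version would shuffle by the PARTITION BY key and run the per-partition loop on each worker, which is roughly why this touches the physical relational operators.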
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441505 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 04/Jun/20 20:12 Start Date: 04/Jun/20 20:12 Worklog Time Spent: 10m Work Description: boyuanzz commented on a change in pull request #11922: URL: https://github.com/apache/beam/pull/11922#discussion_r435521657 ## File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java ## @@ -1593,6 +1601,391 @@ public void testProcessElementForSizedElementAndRestriction() throws Exception { assertEquals(stateData, fakeClient.getData()); } + @Test + public void testProcessElementForWindowedSizedElementAndRestriction() throws Exception { +Pipeline p = Pipeline.create(); +PCollection valuePCollection = p.apply(Create.of("unused")); +PCollectionView singletonSideInputView = valuePCollection.apply(View.asSingleton()); +TestSplittableDoFn doFn = new TestSplittableDoFn(singletonSideInputView); + +valuePCollection +.apply(Window.into(SlidingWindows.of(Duration.standardSeconds(1 +.apply(TEST_TRANSFORM_ID, ParDo.of(doFn).withSideInputs(singletonSideInputView)); + +RunnerApi.Pipeline pProto = +ProtoOverrides.updateTransform( +PTransformTranslation.PAR_DO_TRANSFORM_URN, +PipelineTranslation.toProto(p, SdkComponents.create(p.getOptions()), true), +SplittableParDoExpander.createSizedReplacement()); +String expandedTransformId = +Iterables.find( +pProto.getComponents().getTransformsMap().entrySet(), +entry -> +entry +.getValue() +.getSpec() +.getUrn() +.equals( +PTransformTranslation + .SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN) +&& entry.getValue().getUniqueName().contains(TEST_TRANSFORM_ID)) +.getKey(); +RunnerApi.PTransform pTransform = +pProto.getComponents().getTransformsOrThrow(expandedTransformId); +String inputPCollectionId = + pTransform.getInputsOrThrow(ParDoTranslation.getMainInputName(pTransform)); 
+RunnerApi.PCollection inputPCollection = +pProto.getComponents().getPcollectionsOrThrow(inputPCollectionId); +RehydratedComponents rehydratedComponents = +RehydratedComponents.forComponents(pProto.getComponents()); +Coder inputCoder = +WindowedValue.getFullCoder( +CoderTranslation.fromProto( + pProto.getComponents().getCodersOrThrow(inputPCollection.getCoderId()), +rehydratedComponents, +TranslationContext.DEFAULT), +(Coder) +CoderTranslation.fromProto( +pProto +.getComponents() +.getCodersOrThrow( +pProto +.getComponents() +.getWindowingStrategiesOrThrow( +inputPCollection.getWindowingStrategyId()) +.getWindowCoderId()), +rehydratedComponents, +TranslationContext.DEFAULT)); +String outputPCollectionId = pTransform.getOutputsOrThrow("output"); + +ImmutableMap stateData = +ImmutableMap.of( + multimapSideInputKey(singletonSideInputView.getTagInternal().getId(), ByteString.EMPTY), +encode("8")); + +FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData); + +List> mainOutputValues = new ArrayList<>(); +MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap(); +PCollectionConsumerRegistry consumers = +new PCollectionConsumerRegistry( +metricsContainerRegistry, mock(ExecutionStateTracker.class)); +consumers.register( +outputPCollectionId, +TEST_TRANSFORM_ID, +(FnDataReceiver) (FnDataReceiver>) mainOutputValues::add); +PTransformFunctionRegistry startFunctionRegistry = +new PTransformFunctionRegistry( +mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start"); +PTransformFunctionRegistry finishFunctionRegistry = +new PTransformFunctionRegistry( +mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish"); +List teardownFunctions = new ArrayList<>(); +List progressRequestCallbacks = new ArrayList<>(); +BundleSplitListener.InMemory splitListener = BundleSplitListener.InMemory.create(); + +new
[jira] [Assigned] (BEAM-10195) [beam_PostCommit_Java_ValidatesRunner_Samza] [ org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault]
[ https://issues.apache.org/jira/browse/BEAM-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang reassigned BEAM-10195: --- Assignee: Xinyu Liu > [beam_PostCommit_Java_ValidatesRunner_Samza] [ > org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault] > -- > > Key: BEAM-10195 > URL: https://issues.apache.org/jira/browse/BEAM-10195 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Rui Wang >Assignee: Xinyu Liu >Priority: P0 > Labels: currently-failing > > https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/6698/testReport/ > Initial investigation: > TimerTests is not fully supported by Samza runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=441503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441503 ] ASF GitHub Bot logged work on BEAM-9008: Author: ASF GitHub Bot Created on: 04/Jun/20 20:11 Start Date: 04/Jun/20 20:11 Worklog Time Spent: 10m Work Description: vmarquez commented on a change in pull request #10546: URL: https://github.com/apache/beam/pull/10546#discussion_r434870508 ## File path: sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java ## @@ -1170,4 +898,44 @@ private void waitForFuturesToFinish() throws ExecutionException, InterruptedExce } } } + + /** + * A {@link PTransform} to read data from Apache Cassandra. See {@link CassandraIO} for more + * information on usage and configuration. + */ + @AutoValue + public abstract static class ReadAll extends PTransform>, PCollection> { + +@Nullable +abstract Coder coder(); + +abstract ReadAll.Builder builder(); + +/** Specify the {@link Coder} used to serialize the entity in the {@link PCollection}. */ +public ReadAll withCoder(Coder coder) { Review comment: I think we do, but am not sure. You have to call `setCoder` on the PCollection itself, so we don't have access to the `PCollection>` at a point when we also have access to a single `Read` (they are only supplied in the `ReadFn` which can't call `setCoder` on the PTransform). Is my thinking correct? I could be unaware of a different way to do this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441503) Time Spent: 13.5h (was: 13h 20m) > Add readAll() method to CassandraIO > --- > > Key: BEAM-9008 > URL: https://issues.apache.org/jira/browse/BEAM-9008 > Project: Beam > Issue Type: New Feature > Components: io-java-cassandra >Affects Versions: 2.16.0 >Reporter: vincent marquez >Assignee: vincent marquez >Priority: P3 > Time Spent: 13.5h > Remaining Estimate: 0h > > When querying a large Cassandra database, it's often *much* more useful to > programmatically generate the queries needed to be run rather than reading > all partitions and attempting some filtering. > As an example: > {code:java} > public class Event { >@PartitionKey(0) public UUID accountId; >@PartitionKey(1)public String yearMonthDay; >@ClusteringKey public UUID eventId; >//other data... > }{code} > If there are ten years' worth of data, you may want to only query one year's > worth. Here each token range would represent one 'token' but all events for > the day. > {code:java} > Set accounts = getRelevantAccounts(); > Set dateRange = generateDateRange("2018-01-01", "2019-01-01"); > PCollection tokens = generateTokens(accounts, dateRange); > {code} > > I propose an additional _readAll()_ PTransform that can take a PCollection > of token ranges and can return a PCollection of what the query would > return. > *Question: How much code should be in common between both methods?* > Currently the read connector already groups all partitions into a List of > Token Ranges, so it would be simple to refactor the current read() based > method to a 'ParDo' based one and have them both share the same function. 
> Reasons against sharing code between read and readAll > * Not having the read-based method return a BoundedSource connector would > mean losing the ability to know the size of the data returned > * Currently the CassandraReader executes all the grouped TokenRange queries > *asynchronously* which is (maybe?) fine when all that's happening is > splitting up all the partition ranges but terrible for executing potentially > millions of queries. > Reasons _for_ sharing code would be a simplified code base and that both of > the above issues would most likely have a negligible performance impact. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
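The `generateTokens(accounts, dateRange)` step in the example above is essentially a cross product of partition-key components, one pair per query that readAll() would execute. A hedged Python sketch of that idea; the function names come from the example, not from CassandraIO's actual API:

```python
from datetime import date, timedelta
from itertools import product

def generate_date_range(start, end):
    """Yield yearMonthDay strings in [start, end), matching the partition key."""
    day = start
    while day < end:
        yield day.isoformat()
        day += timedelta(days=1)

def generate_tokens(accounts, days):
    # One (accountId, yearMonthDay) pair per query to issue in readAll().
    return list(product(accounts, days))

days = list(generate_date_range(date(2018, 1, 1), date(2018, 1, 4)))
tokens = generate_tokens(["acct-1", "acct-2"], days)
print(len(tokens))  # 2 accounts x 3 days = 6 queries
```

With a year of days and many accounts this easily reaches the "potentially millions of queries" the comment warns about, which is why issuing them all asynchronously at once is a concern.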
[jira] [Created] (BEAM-10195) [beam_PostCommit_Java_ValidatesRunner_Samza] [ org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault]
Rui Wang created BEAM-10195: --- Summary: [beam_PostCommit_Java_ValidatesRunner_Samza] [ org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault] Key: BEAM-10195 URL: https://issues.apache.org/jira/browse/BEAM-10195 Project: Beam Issue Type: Bug Components: test-failures Reporter: Rui Wang https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/6698/testReport/ Initial investigation: TimerTests is not fully supported by Samza runner. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10087) Flink 1.10 failing [Java 11]
[ https://issues.apache.org/jira/browse/BEAM-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126167#comment-17126167 ] Kenneth Knowles commented on BEAM-10087: Because we have Java11-incompatible deps, the idea is that a user would use Java 11 for their pipeline, but then it could have a dependency on Beam built with Java 8 still. > Flink 1.10 failing [Java 11] > > > Key: BEAM-10087 > URL: https://issues.apache.org/jira/browse/BEAM-10087 > Project: Beam > Issue Type: Sub-task > Components: runner-flink >Reporter: Pawel Pasterz >Priority: P2 > > Gradle task _*:runners:flink:1.10:generatePipelineOptionsTableJava*_ fails > during Java 11 Precommit job > Example stack trace > {code:java} > > Task :runners:flink:1.10:generatePipelineOptionsTableJava FAILED > ... > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.UnsupportedClassVersionError: > org/apache/beam/runners/flink/website/PipelineOptionsTableGenerator has been > compiled by a more recent version of the Java Runtime (class file version > 55.0), this version of the Java Runtime only recognizes class file versions > up to 52.0 > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:756) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369) > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:362) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at 
sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495) > :runners:flink:1.10:generatePipelineOptionsTableJava (Thread[Execution worker > for ':' Thread 11,5,main]) completed. Took 1.744 secs. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
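The `UnsupportedClassVersionError` above is mechanical: every `.class` file records a major version in its header (52 corresponds to Java 8, 55 to Java 11), and an older runtime refuses files with a newer major version. A small sketch that reads the version straight from the class-file header bytes:

```python
import struct

def class_file_major_version(data: bytes) -> int:
    """Return the major version from a .class file header.

    Layout: 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major, big-endian.
    """
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file")
    return major

# Synthetic header of a class compiled for Java 11 (major version 55).
header = bytes.fromhex("cafebabe0000") + (55).to_bytes(2, "big")
print(class_file_major_version(header))  # 55
```

The error message in the stack trace is exactly this check failing: a Java 8 runtime (which "recognizes class file versions up to 52.0") loading a class with major version 55.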
[jira] [Work logged] (BEAM-8543) Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle
[ https://issues.apache.org/jira/browse/BEAM-8543?focusedWorklogId=441501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441501 ] ASF GitHub Bot logged work on BEAM-8543: Author: ASF GitHub Bot Created on: 04/Jun/20 20:08 Start Date: 04/Jun/20 20:08 Worklog Time Spent: 10m Work Description: rehmanmuradali commented on pull request #11924: URL: https://github.com/apache/beam/pull/11924#issuecomment-639088647 R: @reuvenlax Could you please take a look that I am on right track? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441501) Time Spent: 20m (was: 10m) > Dataflow streaming timers are not strictly time ordered when set earlier > mid-bundle > --- > > Key: BEAM-8543 > URL: https://issues.apache.org/jira/browse/BEAM-8543 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.13.0 >Reporter: Jan Lukavský >Assignee: Rehman Murad Ali >Priority: P2 > Time Spent: 20m > Remaining Estimate: 0h > > Let's suppose we have the following situation: > - statful ParDo with two timers - timerA and timerB > - timerA is set for window.maxTimestamp() + 1 > - timerB is set anywhere between timerB.timestamp > - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE > Then the order of timers is as follows (correct): > - timerB > - timerA > But, if timerB sets another timer (say for timerB.timestamp + 1), then the > order of timers will be: > - timerB (timerB.timestamp) > - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE) > - timerB (timerB.timestamp + 1) > Which is not ordered by timestamp. The reason for this is that when the input > watermark update is evaluated, the WatermarkManager,extractFiredTimers() will > produce both timerA and timerB. 
That would be correct, but when timerB sets > another timer, that breaks this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
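The reordering described above can be reproduced with a toy model: extract every timer fired by a watermark advance as one batch, process that batch in timestamp order, and let a firing timer enqueue a new one mid-bundle. The new timer can only run after the already-extracted batch, even when its timestamp is smaller than a timer still in the batch. A sketch, illustrative rather than runner code:

```python
MAX_TS = 10**9  # stand-in for BoundedWindow.TIMESTAMP_MAX_VALUE

def run_batches(timers, set_timer_during):
    """Fire timers batch by batch, as extractFiredTimers() would.

    `timers` is a list of (timestamp, name); `set_timer_during` maps a firing
    timer's name to a new (timestamp, name) it sets while processing.
    """
    fired = []
    pending = sorted(timers)
    while pending:
        # The whole batch is extracted up front; timers set during the batch
        # can only join the *next* batch.
        batch, pending = pending, []
        for ts, name in batch:
            fired.append((ts, name))
            if name in set_timer_during:
                pending.append(set_timer_during.pop(name))
        pending.sort()
    return fired

order = run_batches(
    [(5, "timerB"), (MAX_TS, "timerA")],
    {"timerB": (6, "timerB2")},  # timerB sets another timer for its timestamp + 1
)
print(order)  # timerB, then timerA, then timerB2 -- not timestamp order
```

This matches the sequence in the report: timerB, timerA (at TIMESTAMP_MAX_VALUE), then the re-set timerB, with the last two out of timestamp order.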
[jira] [Work logged] (BEAM-8543) Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle
[ https://issues.apache.org/jira/browse/BEAM-8543?focusedWorklogId=441499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441499 ] ASF GitHub Bot logged work on BEAM-8543: Author: ASF GitHub Bot Created on: 04/Jun/20 19:59 Start Date: 04/Jun/20 19:59 Worklog Time Spent: 10m Work Description: rehmanmuradali opened a new pull request #11924: URL: https://github.com/apache/beam/pull/11924 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Updated] (BEAM-9710) Got current time instead of timestamp value
[ https://issues.apache.org/jira/browse/BEAM-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9710: -- Labels: zetasql-compliance (was: stale-assigned zetasql-compliance) > Got current time instead of timestamp value > --- > > Key: BEAM-9710 > URL: https://issues.apache.org/jira/browse/BEAM-9710 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Robin Qiu >Priority: P4 > Labels: zetasql-compliance > > one failure in shard 13 > {code} > Expected: ARRAY>[{2014-12-01 00:00:00+00}] > Actual: ARRAY>[{2020-04-06 > 00:20:40.052+00}], > {code} > {code} > [prepare_database] > CREATE TABLE Table1 AS > SELECT timestamp '2014-12-01' as timestamp_val > -- > ARRAY>[{2014-12-01 00:00:00+00}] > == > [name=timestamp_type_2] > SELECT timestamp_val > FROM Table1 > -- > ARRAY>[{2014-12-01 00:00:00+00}] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9709) timezone off by 8 hours
[ https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-9709: -- Labels: zetasql-compliance (was: stale-assigned zetasql-compliance) > timezone off by 8 hours > --- > > Key: BEAM-9709 > URL: https://issues.apache.org/jira/browse/BEAM-9709 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Andrew Pilloud >Assignee: Robin Qiu >Priority: P4 > Labels: zetasql-compliance > > two failures in shard 13, one failure in shard 19 > {code} > Expected: ARRAY>[{2014-01-31 00:00:00+00}] > Actual: ARRAY>[{2014-01-31 08:00:00+00}], > {code} > {code} > select timestamp(date '2014-01-31') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
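An eight-hour discrepancy is the classic signature of a date being interpreted at midnight in US Pacific time (UTC-8 in January, no DST in effect) and then rendered in UTC. A stdlib sketch of the arithmetic, using a fixed UTC-8 offset as an assumed stand-in for the engine's default timezone:

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PST = timezone(timedelta(hours=-8))  # assumed default zone: Pacific Standard Time

# DATE '2014-01-31' interpreted as midnight in the default (Pacific) zone...
local_midnight = datetime(2014, 1, 31, tzinfo=PST)
# ...displays as 08:00 once converted to UTC, matching the failure above.
as_utc = local_midnight.astimezone(UTC)
print(as_utc.isoformat())  # 2014-01-31T08:00:00+00:00
```

The fix direction implied by the test expectation is to treat the bare date as UTC midnight rather than local midnight.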
[jira] [Work logged] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441488 ] ASF GitHub Bot logged work on BEAM-10188: - Author: ASF GitHub Bot Created on: 04/Jun/20 19:20 Start Date: 04/Jun/20 19:20 Worklog Time Spent: 10m Work Description: jphalip commented on pull request #11918: URL: https://github.com/apache/beam/pull/11918#issuecomment-639064988 Ok, I've loaded `script.config`. I've also added a confirmation message before submitting the API request. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441488) Time Spent: 0.5h (was: 20m) > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Priority: P2 > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441486 ] ASF GitHub Bot logged work on BEAM-10188: - Author: ASF GitHub Bot Created on: 04/Jun/20 19:13 Start Date: 04/Jun/20 19:13 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11918: URL: https://github.com/apache/beam/pull/11918#issuecomment-639062174 > All the variables used should already be set in script.config. Does that work? That's probably fine, but if that's the expectation we need to `source script.config`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441486) Time Spent: 20m (was: 10m) > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Priority: P2 > Time Spent: 20m > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
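The distinction ibzib raises can be sketched in a few lines of bash. The variable name `RELEASE_VERSION` below is illustrative, not necessarily what the real `script.config` contains: a config file executed as a child process cannot set variables in the calling shell, so the release script must `source` it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative stand-in for the real script.config.
cat > script.config <<'EOF'
RELEASE_VERSION=2.22.0
EOF

bash script.config                 # child shell: assignments vanish with it
echo "${RELEASE_VERSION:-unset}"   # prints: unset

source script.config               # current shell: assignments persist
echo "${RELEASE_VERSION}"          # prints: 2.22.0
```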
[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param
[ https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441484 ] ASF GitHub Bot logged work on BEAM-10175: - Author: ASF GitHub Bot Created on: 04/Jun/20 19:07 Start Date: 04/Jun/20 19:07 Worklog Time Spent: 10m Work Description: pabloem merged pull request #11893: URL: https://github.com/apache/beam/pull/11893 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441484) Time Spent: 2h 20m (was: 2h 10m) > FhirIO execute bundle uses deprecated auth uri param > > > Key: BEAM-10175 > URL: https://issues.apache.org/jira/browse/BEAM-10175 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.22.0 >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P2 > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param
[ https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441483 ] ASF GitHub Bot logged work on BEAM-10175: - Author: ASF GitHub Bot Created on: 04/Jun/20 19:07 Start Date: 04/Jun/20 19:07 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11893: URL: https://github.com/apache/beam/pull/11893#issuecomment-639059270 Ah great observation! I'll merge. I am not aware of it being flaky otherwise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441483) Time Spent: 2h 10m (was: 2h) > FhirIO execute bundle uses deprecated auth uri param > > > Key: BEAM-10175 > URL: https://issues.apache.org/jira/browse/BEAM-10175 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.22.0 >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P2 > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10188) Automate Github release
[ https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441482 ] ASF GitHub Bot logged work on BEAM-10188: - Author: ASF GitHub Bot Created on: 04/Jun/20 19:06 Start Date: 04/Jun/20 19:06 Worklog Time Spent: 10m Work Description: jphalip commented on pull request #11918: URL: https://github.com/apache/beam/pull/11918#issuecomment-639058954 @ibzib @TheNeuralBit Thank you both for the feedback. I've moved the code to a separate script. All the variables used should already be set in `script.config`. Does that work? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441482) Remaining Estimate: 0h Time Spent: 10m > Automate Github release > --- > > Key: BEAM-10188 > URL: https://issues.apache.org/jira/browse/BEAM-10188 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Kyle Weaver >Priority: P2 > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we push the tag to Github and fill in the release notes in > separate steps. For feeds consuming these updates, it would be better to do > both in the same step using the Github API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param
[ https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441477 ] ASF GitHub Bot logged work on BEAM-10175: - Author: ASF GitHub Bot Created on: 04/Jun/20 19:00 Start Date: 04/Jun/20 19:00 Worklog Time Spent: 10m Work Description: jaketf edited a comment on pull request #11893: URL: https://github.com/apache/beam/pull/11893#issuecomment-639055068 Hmmm my latest commit was just `spotlessApply`, and this entire PR's change (how we pass the auth token for HCAPI requests for FHIR) seems unrelated to the GCS object moving logic in FHIR import. It does seem like a flake. Note that only the R4 variant of a parameterized test failed; the other two FHIR versions (DSTU2, STU3) passed. The failure seems to be due to trying to move a GCS object that no longer exists (because another thread cleaned it up?), as evidenced by the 404 on the source object in the stacktrace (this logic is independent of FHIR version). The relevant logic where this error is raised is in `Filesystems::copy` in [FhirIO.ImportFn::importBatch](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1046). @pabloem Have we seen other instances of this test being flaky? [This dashboard](https://builds.apache.org/job/beam_PostCommit_Java_PR/394/testReport/junit/org.apache.beam.sdk.io.gcp.healthcare/FhirIOWriteIT/testFhirIO_Import_R4_/history/) suggests this was the only failure, though the history is short. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441477) Time Spent: 2h (was: 1h 50m) > FhirIO execute bundle uses deprecated auth uri param > > > Key: BEAM-10175 > URL: https://issues.apache.org/jira/browse/BEAM-10175 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.22.0 >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P2 > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param
[ https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441476 ] ASF GitHub Bot logged work on BEAM-10175: - Author: ASF GitHub Bot Created on: 04/Jun/20 18:58 Start Date: 04/Jun/20 18:58 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11893: URL: https://github.com/apache/beam/pull/11893#issuecomment-639055068 Hmmm my latest commit was just `spotlessApply`, and this entire PR's change (how we pass the auth token for HCAPI requests for FHIR) seems unrelated to the GCS object moving logic in FHIR import. It does seem like a flake. Note that only the R4 variant of a parameterized test failed; the other two FHIR versions (DSTU2, STU3) passed. The failure seems to be due to trying to move a GCS object that no longer exists (because another thread cleaned it up?), as evidenced by the 404 on the source object in the stacktrace (this logic is independent of FHIR version). The relevant logic where this error is raised is in `Filesystems::copy` in [FhirIO.ImportFn::importBatch](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1046). @pabloem Have we seen other instances of this test being flaky? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441476) Time Spent: 1h 50m (was: 1h 40m) > FhirIO execute bundle uses deprecated auth uri param > > > Key: BEAM-10175 > URL: https://issues.apache.org/jira/browse/BEAM-10175 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.22.0 >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: P2 > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format
[ https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=441474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441474 ] ASF GitHub Bot logged work on BEAM-10176: - Author: ASF GitHub Bot Created on: 04/Jun/20 18:47 Start Date: 04/Jun/20 18:47 Worklog Time Spent: 10m Work Description: chunyang commented on pull request #11923: URL: https://github.com/apache/beam/pull/11923#issuecomment-639047436 R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441474) Remaining Estimate: 2h 40m (was: 2h 50m) Time Spent: 20m (was: 10m) > Support BigQuery type aliases in WriteToBigQuery with Avro format > - > > Key: BEAM-10176 > URL: https://issues.apache.org/jira/browse/BEAM-10176 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: P3 > Fix For: 2.23.0 > > Original Estimate: 3h > Time Spent: 20m > Remaining Estimate: 2h 40m > > Support BigQuery type aliases in WriteToBigQuery with avro temp file format. > The following aliases are missing: > * {{STRUCT}} ({{==RECORD}}) > * {{FLOAT64}} ({{==FLOAT}}) > * {{INT64}} ({{==INTEGER}}) > Reference: > https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableFieldSchema.html#setType-java.lang.String- -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format
[ https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=441473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441473 ] ASF GitHub Bot logged work on BEAM-10176: - Author: ASF GitHub Bot Created on: 04/Jun/20 18:46 Start Date: 04/Jun/20 18:46 Worklog Time Spent: 10m Work Description: chunyang opened a new pull request #11923: URL: https://github.com/apache/beam/pull/11923 Support STRUCT, FLOAT64, INT64 BigQuery types in BigQuery to Avro schema conversion. We already support RECORD, FLOAT, and INTEGER, which are aliases of the three types being added. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] ~Update `CHANGES.md` with noteworthy changes.~ - [ ] ~If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).~ See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): table of CI build-status badge links (Go and Java SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners) omitted.
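The alias relationship the PR description spells out (STRUCT==RECORD, FLOAT64==FLOAT, INT64==INTEGER) amounts to a normalization step before the existing BigQuery-to-Avro type table is consulted. A hypothetical sketch of that step; the helper name and dict are illustrative, not the PR's actual code:

```python
# Standard-SQL alias names mapped to the legacy names the Avro conversion
# already handles (per the BigQuery data-types reference cited in the issue).
BIGQUERY_TYPE_ALIASES = {
    'STRUCT': 'RECORD',
    'FLOAT64': 'FLOAT',
    'INT64': 'INTEGER',
}

def canonical_type(field_type):
    """Return the legacy canonical form of a BigQuery type name."""
    upper = field_type.upper()
    return BIGQUERY_TYPE_ALIASES.get(upper, upper)

print(canonical_type('STRUCT'))   # RECORD
print(canonical_type('INT64'))    # INTEGER
print(canonical_type('STRING'))   # STRING (non-aliased types pass through)
```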
[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client
[ https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441468 ] ASF GitHub Bot logged work on BEAM-8693: Author: ASF GitHub Bot Created on: 04/Jun/20 18:30 Start Date: 04/Jun/20 18:30 Worklog Time Spent: 10m Work Description: apilloud merged pull request #11836: URL: https://github.com/apache/beam/pull/11836 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441468) Time Spent: 5h (was: 4h 50m) > Beam Dependency Update Request: > com.google.cloud.datastore:datastore-v1-proto-client > > > Key: BEAM-8693 > URL: https://issues.apache.org/jira/browse/BEAM-8693 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: P2 > Fix For: Not applicable > > Time Spent: 5h > Remaining Estimate: 0h > > - 2019-11-15 19:39:56.526732 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:51.468284 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:11:37.877225 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. 
The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:45.889899 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client
[ https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441465 ] ASF GitHub Bot logged work on BEAM-8693: Author: ASF GitHub Bot Created on: 04/Jun/20 18:21 Start Date: 04/Jun/20 18:21 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #11836: URL: https://github.com/apache/beam/pull/11836#issuecomment-639023595 I get `No new linkage errors`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441465) Time Spent: 4h 50m (was: 4h 40m) > Beam Dependency Update Request: > com.google.cloud.datastore:datastore-v1-proto-client > > > Key: BEAM-8693 > URL: https://issues.apache.org/jira/browse/BEAM-8693 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: P2 > Fix For: Not applicable > > Time Spent: 4h 50m > Remaining Estimate: 0h > > - 2019-11-15 19:39:56.526732 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:51.468284 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. 
> - 2019-12-02 12:11:37.877225 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:45.889899 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441463 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 04/Jun/20 18:15 Start Date: 04/Jun/20 18:15 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11922: URL: https://github.com/apache/beam/pull/11922#issuecomment-639020928 R: @boyuanzz @chamikaramj This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441463) Time Spent: 36h 10m (was: 36h) > Fn API SDF support > -- > > Key: BEAM-2939 > URL: https://issues.apache.org/jira/browse/BEAM-2939 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Henning Rohde >Assignee: Luke Cwik >Priority: P2 > Labels: portability, stale-assigned > Time Spent: 36h 10m > Remaining Estimate: 0h > > The Fn API should support streaming SDF. Detailed design TBD. > Once design is ready, expand subtasks similarly to BEAM-2822. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2939) Fn API SDF support
[ https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441461 ] ASF GitHub Bot logged work on BEAM-2939: Author: ASF GitHub Bot Created on: 04/Jun/20 18:15 Start Date: 04/Jun/20 18:15 Worklog Time Spent: 10m Work Description: lukecwik opened a new pull request #11922: URL: https://github.com/apache/beam/pull/11922 This fixes a bug where we would output within all the windows instead of just the current window. This would not impact any SDF that used only a single window while processing. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): table of CI build-status badge links (Go and Java SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners) omitted.
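The bug class the PR describes — emitting into every window an element belongs to, rather than only the window currently being processed — can be illustrated in a few lines. This is a toy sketch, not Beam's SDF implementation:

```python
# When an element sits in several windows, each window is processed in its
# own invocation, and each output must carry only the window being processed.
def process_element(value, windows):
    outputs = []
    for current_window in windows:
        # The buggy behavior was equivalent to outputs.append((value, windows)),
        # i.e. tagging the output with *all* of the element's windows.
        outputs.append((value, current_window))  # fixed: current window only
    return outputs

print(process_element('x', ['w1', 'w2']))  # [('x', 'w1'), ('x', 'w2')]
```

As the PR notes, an SDF whose elements only ever occupy a single window never observes the difference.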
[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.
[ https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=441457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441457 ] ASF GitHub Bot logged work on BEAM-10036: - Author: ASF GitHub Bot Created on: 04/Jun/20 18:09 Start Date: 04/Jun/20 18:09 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on a change in pull request #11766: URL: https://github.com/apache/beam/pull/11766#discussion_r435426414

## File path: sdks/python/apache_beam/dataframe/partitionings.py
## @@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+
+from typing import Any
+from typing import Iterable
+from typing import TypeVar
+
+import pandas as pd
+
+Frame = TypeVar('Frame', bound=pd.core.generic.NDFrame)
+
+
+class Partitioning(object):
+  """A class representing a (consistent) partitioning of dataframe objects.
+  """
+  def is_subpartitioning_of(self, other):
+    # type: (Partitioning) -> bool
+
+    """Returns whether self is a sub-partition of other.
+
+    Specifically, returns whether something partitioned by self is necissarily
+    also partitioned by other.
+    """
+    raise NotImplementedError
+
+  def partition_fn(self, df):
+    # type: (Frame) -> Iterable[Tuple[Any, Frame]]
+
+    """A callable that actually performs the partitioning of a Frame df.
+
+    This will be invoked via a FlatMap in conjunction with a GroupKey to
+    achieve the desired partitioning.
+    """
+    raise NotImplementedError
+
+
+class Index(Partitioning):
+  """A partitioning by index (either fully or partially).
+
+  If the set of "levels" of the index to consider is not specified, the entire
+  index is used.
+
+  These form a partial order, given by
+
+      Nothing() < Index([i]) < Index([i, j]) < ... < Index() < Singleton()
+
+  The ordering is implemented via the is_subpartitioning_of method, where the
+  examples on the right are subpartitionings of the examples on the left above.
+  """
+
+  _INDEX_PARTITIONS = 10
+
+  def __init__(self, levels=None):
+    self._levels = levels
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self._levels == other._levels
+
+  def __hash__(self):
+    if self._levels:
+      return hash(tuple(sorted(self._levels)))
+    else:
+      return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+    if isinstance(other, Nothing):
+      return True
+    elif isinstance(other, Index):
+      if self._levels is None:
+        return True
+      elif other._levels is None:
+        return False
+      else:
+        return all(level in other._levels for level in self._levels)
+    else:
+      return False
+
+  def partition_fn(self, df):
+    if self._levels is None:
+      levels = list(range(df.index.nlevels))
+    else:
+      levels = self._levels
+    hashes = sum(
+        pd.util.hash_array(df.index.get_level_values(level))
+        for level in levels)
+    for key in range(self._INDEX_PARTITIONS):
+      yield key, df[hashes % self._INDEX_PARTITIONS == key]
+
+
+class Singleton(Partitioning):
+  """A partitioning co-locating all data to a singleton partition.
+  """
+  def __eq__(self, other):
+    return type(self) == type(other)
+
+  def __hash__(self):
+    return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+    return True
+
+  def partition_fn(self, df):
+    yield None, df
+
+
+class Nothing(Partitioning):
+  """A partitioning imposing no constraints on the actual partitioning.
+  """
+  def __eq__(self, other):
+    return type(self) == type(other)
+
+  def __hash__(self):
+    return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+    return not other
+
+  def __bool__(self):
+    return False
+
+  __nonzero__ = __bool__

Review comment: I think that making Nothing falsy and relying on that in logic elsewhere harms readability. What do you think about dropping this and just explicitly checking for Nothing when needed?

## File path: sdks/python/apache_beam/dataframe/transforms.py
## @@ -138,36 +140,35 @@ def evaluate(partition,
[jira] [Resolved] (BEAM-8153) PubSubIntegrationTest failing in post-commit
[ https://issues.apache.org/jira/browse/BEAM-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-8153. - Fix Version/s: Not applicable Resolution: Fixed > PubSubIntegrationTest failing in post-commit > > > Key: BEAM-8153 > URL: https://issues.apache.org/jira/browse/BEAM-8153 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Udi Meiri >Assignee: Matthew Darwin >Priority: P2 > Labels: stale-assigned > Fix For: Not applicable > > > Most likely due to: https://github.com/apache/beam/pull/9232 > {code} > 11:44:31 > == > 11:44:31 ERROR: test_streaming_with_attributes > (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest) > 11:44:31 > -- > 11:44:31 Traceback (most recent call last): > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", > line 199, in test_streaming_with_attributes > 11:44:31 self._test_streaming(with_attributes=True) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py", > line 191, in _test_streaming > 11:44:31 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py", > line 91, in run_pipeline > 11:44:31 result = p.run() > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py", > line 420, in run > 11:44:31 return self.runner.run_pipeline(self, self._options) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 11:44:31 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 11:44:31 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py", > line 43, in assert_that > 11:44:31 _assert_match(actual=arg1, matcher=arg2, reason=arg3) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py", > line 49, in _assert_match > 11:44:31 if not matcher.matches(actual): > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/core/allof.py", > line 16, in matches > 11:44:31 if not matcher.matches(item): > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/base_matcher.py", > line 28, in matches > 11:44:31 match_result = self._matches(item) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py", > line 91, in _matches > 11:44:31 return Counter(self.messages) == Counter(self.expected_msg) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py", > line 566, in __init__ > 11:44:31 self.update(*args, **kwds) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py", > line 653, in update > 11:44:31 _count_elements(self, iterable) > 11:44:31 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub.py", > line 83, in __hash__ > 11:44:31 self.message_id, self.publish_time.seconds, > 11:44:31 AttributeError: 'NoneType' object has no attribute 'seconds' > {code} -- This message was sent by Atlassian Jira 
(v8.3.4#803005)
[jira] [Commented] (BEAM-10193) Update Jenkins VMs with docker-credential-gcloud
[ https://issues.apache.org/jira/browse/BEAM-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126125#comment-17126125 ] Udi Meiri commented on BEAM-10193: -- cc: [~tysonjh] > Update Jenkins VMs with docker-credential-gcloud > > > Key: BEAM-10193 > URL: https://issues.apache.org/jira/browse/BEAM-10193 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Brian Hulette >Priority: P3 > > See BEAM-7405 (test failure currently resolved with an inelegant workaround) > for motivation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client
[ https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441455 ] ASF GitHub Bot logged work on BEAM-8693: Author: ASF GitHub Bot Created on: 04/Jun/20 17:55 Start Date: 04/Jun/20 17:55 Worklog Time Spent: 10m Work Description: suztomo commented on pull request #11836: URL: https://github.com/apache/beam/pull/11836#issuecomment-639010785 Does `sdks/java/build-tools/beam-linkage-check.sh` ([wiki](https://cwiki.apache.org/confluence/display/BEAM/Dependency+Upgrades)), without arguments, complain anything now? If no, I'm good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441455) Time Spent: 4h 40m (was: 4.5h) > Beam Dependency Update Request: > com.google.cloud.datastore:datastore-v1-proto-client > > > Key: BEAM-8693 > URL: https://issues.apache.org/jira/browse/BEAM-8693 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: P2 > Fix For: Not applicable > > Time Spent: 4h 40m > Remaining Estimate: 0h > > - 2019-11-15 19:39:56.526732 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:51.468284 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. > - 2019-12-02 12:11:37.877225 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:45.889899 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10190) Reduce cost of toString of MetricKey and MetricName
[ https://issues.apache.org/jira/browse/BEAM-10190?focusedWorklogId=441452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441452 ] ASF GitHub Bot logged work on BEAM-10190: - Author: ASF GitHub Bot Created on: 04/Jun/20 17:52 Start Date: 04/Jun/20 17:52 Worklog Time Spent: 10m Work Description: Zhangyx39 commented on pull request #11915: URL: https://github.com/apache/beam/pull/11915#issuecomment-639009096 Thank you, Likasz and Xinyu. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441452) Time Spent: 1h 10m (was: 1h) > Reduce cost of toString of MetricKey and MetricName > --- > > Key: BEAM-10190 > URL: https://issues.apache.org/jira/browse/BEAM-10190 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Yixing Zhang >Assignee: Yixing Zhang >Priority: P2 > Fix For: 2.23.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Samza runner heavily uses MetricKey.toString() and MetricName.toString() to > update Samza metrics. We found that the toString methods have high CPU cost. > And according to this article: > [https://redfin.engineering/java-string-concatenation-which-way-is-best-8f590a7d22a8], > we should use "+" operator instead of String.format for string concatenation > for better performance. > We do see a 10% QPS gain in nexmark queries using Samza runner with the > change of using "+" operator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
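The String.format-vs-`+` point in BEAM-10190 is about Java, but the same effect is easy to demonstrate in Python, where template-based formatting likewise pays to parse the format string on every call. A rough, illustrative micro-benchmark (timings vary by machine and interpreter; the metric names are made up):

```python
import timeit

# Hypothetical metric key components, analogous to MetricKey/MetricName fields.
namespace, name = "beam.metrics", "elementCount"

concat = timeit.timeit(lambda: namespace + ":" + name, number=100_000)
formatted = timeit.timeit(lambda: "{}:{}".format(namespace, name), number=100_000)

# Both produce the same string; only the per-call cost differs.
assert namespace + ":" + name == "{}:{}".format(namespace, name)
print("concat: %.4fs  format: %.4fs" % (concat, formatted))
```

On a hot path like a per-element metrics update, that per-call difference is what the reported 10% QPS gain in the Samza runner came from.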
[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client
[ https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441450 ] ASF GitHub Bot logged work on BEAM-8693: Author: ASF GitHub Bot Created on: 04/Jun/20 17:49 Start Date: 04/Jun/20 17:49 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #11836: URL: https://github.com/apache/beam/pull/11836#issuecomment-639007691 R: @suztomo This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441450) Time Spent: 4.5h (was: 4h 20m) > Beam Dependency Update Request: > com.google.cloud.datastore:datastore-v1-proto-client > > > Key: BEAM-8693 > URL: https://issues.apache.org/jira/browse/BEAM-8693 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: P2 > Fix For: Not applicable > > Time Spent: 4.5h > Remaining Estimate: 0h > > - 2019-11-15 19:39:56.526732 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:51.468284 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:11:37.877225 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:45.889899 > - > Please consider upgrading the dependency > com.google.cloud.datastore:datastore-v1-proto-client. > The current version is 1.6.0. The latest version is 1.6.3 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10190) Reduce cost of toString of MetricKey and MetricName
[ https://issues.apache.org/jira/browse/BEAM-10190?focusedWorklogId=441449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441449 ] ASF GitHub Bot logged work on BEAM-10190: - Author: ASF GitHub Bot Created on: 04/Jun/20 17:47 Start Date: 04/Jun/20 17:47 Worklog Time Spent: 10m Work Description: xinyuiscool commented on pull request #11915: URL: https://github.com/apache/beam/pull/11915#issuecomment-639006539 @lukecwik : thanks for merging it! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441449) Time Spent: 1h (was: 50m) > Reduce cost of toString of MetricKey and MetricName > --- > > Key: BEAM-10190 > URL: https://issues.apache.org/jira/browse/BEAM-10190 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Yixing Zhang >Assignee: Yixing Zhang >Priority: P2 > Fix For: 2.23.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Samza runner heavily uses MetricKey.toString() and MetricName.toString() to > update Samza metrics. We found that the toString methods have high CPU cost. > And according to this article: > [https://redfin.engineering/java-string-concatenation-which-way-is-best-8f590a7d22a8], > we should use "+" operator instead of String.format for string concatenation > for better performance. > We do see a 10% QPS gain in nexmark queries using Samza runner with the > change of using "+" operator. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=441428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441428 ] ASF GitHub Bot logged work on BEAM-9679: Author: ASF GitHub Bot Created on: 04/Jun/20 17:21 Start Date: 04/Jun/20 17:21 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11883: URL: https://github.com/apache/beam/pull/11883#issuecomment-638992758 I've incorporated all the helpful comments. I'll wait for Henry's final approval before updating stepik/committing `*-remote.yaml` files. Thank you, both. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441428) Time Spent: 8h 50m (was: 8h 40m) > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: P2 > Time Spent: 8h 50m > Remaining Estimate: 0h > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. 
> > ||Transform||Pull Request||Status|| > |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| > |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| > |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed| > |Combine Simple > Function|[11866|https://github.com/apache/beam/pull/11866]|Closed| > |CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open| > |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed| > |Partition| | | > |Side Input| | | > |Side Output| | | > |Branching| | | > |Composite Transform| | | > |DoFn Additional Parameters| | | -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.
[ https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=441429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441429 ] ASF GitHub Bot logged work on BEAM-10036: - Author: ASF GitHub Bot Created on: 04/Jun/20 17:21 Start Date: 04/Jun/20 17:21 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on a change in pull request #11766: URL: https://github.com/apache/beam/pull/11766#discussion_r435422511

## File path: sdks/python/apache_beam/dataframe/expressions.py

## @@ -85,16 +87,10 @@ def evaluate_at(self, session):
     # type: (Session) -> T
     """Returns the result of self with the bindings given in session."""
     raise NotImplementedError(type(self))

-  def requires_partition_by_index(self):
-    # type: () -> bool
-    """Whether this expression requires its argument(s) to be partitioned
-    by index."""
-    # TODO: It might be necessary to support partitioning by part of the index,
-    # for some args, which would require returning more than a boolean here.
+  def requires_partition_by(self):
+    # type: () -> Partitioning
     raise NotImplementedError(type(self))

-  def preserves_partition_by_index(self):
-    # type: () -> bool
-    """Whether the result of this expression will be partitioned by index
-    whenever all of its inputs are partitioned by index."""
+  def preserves_partition_by(self):
+    # type: () -> Partitioning

Review comment: Ah that makes sense. And I guess the name "preserves" is actually intuitive now that I understand it's setting an upper bound on the output partitioning. I think the complexity is worth it, unless there's a chance those operations will never materialize. Can you just add a docstring indicating that "preserves" sets an upper bound on the output partitioning (or any other language to make sure readers can grok it)? A similar comment about requires would be good too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441429) Time Spent: 1h 20m (was: 1h 10m) > More flexible dataframes partitioning. > -- > > Key: BEAM-10036 > URL: https://issues.apache.org/jira/browse/BEAM-10036 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: P2 > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently we only track a boolean of whether a dataframe is partitioned by > the (full) index. -- This message was sent by Atlassian Jira (v8.3.4#803005)
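The "preserves sets an upper bound on the output partitioning" idea from the review can be sketched abstractly. The ranking below follows the partial order documented in the partitionings diff (Nothing < Index < Singleton), but the helper and its string-based encoding are illustrative only, not Beam's actual implementation:

```python
# Illustrative ranking along the documented chain: Nothing < Index < Singleton.
_RANK = {"Nothing": 0, "Index": 1, "Singleton": 2}

def output_partitioning(input_partitioning, preserves):
    """The output partitioning is bounded above by `preserves`: an expression
    cannot promise more structure than it preserves, nor more than its input
    actually had. So take the weaker (lower-ranked) of the two."""
    if _RANK[input_partitioning] <= _RANK[preserves]:
        return input_partitioning
    return preserves

# A Singleton input through an expression preserving Nothing yields Nothing...
assert output_partitioning("Singleton", "Nothing") == "Nothing"
# ...while preserving Singleton cannot upgrade an Index-partitioned input.
assert output_partitioning("Index", "Singleton") == "Index"
assert output_partitioning("Nothing", "Index") == "Nothing"
```

This is exactly the docstring the reviewer is asking for, in executable form: "preserves" caps what the output can guarantee, it never adds partitioning.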
[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=441427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441427 ] ASF GitHub Bot logged work on BEAM-9869: Author: ASF GitHub Bot Created on: 04/Jun/20 17:20 Start Date: 04/Jun/20 17:20 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-638992107 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441427) Time Spent: 3h 10m (was: 3h) > adding self-contained Kafka service jar for testing > --- > > Key: BEAM-9869 > URL: https://issues.apache.org/jira/browse/BEAM-9869 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Labels: stale-assigned > Time Spent: 3h 10m > Remaining Estimate: 0h > > adding self-contained Kafka service jar for testing -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write
[ https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441418 ] ASF GitHub Bot logged work on BEAM-9742: Author: ASF GitHub Bot Created on: 04/Jun/20 17:04 Start Date: 04/Jun/20 17:04 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #11396: URL: https://github.com/apache/beam/pull/11396#issuecomment-638984333 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441418) Time Spent: 4h 20m (was: 4h 10m) > Add ability to pass FluentBackoff to JdbcIo.Write > - > > Key: BEAM-9742 > URL: https://issues.apache.org/jira/browse/BEAM-9742 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Reporter: Akshay Iyangar >Assignee: Akshay Iyangar >Priority: P3 > Time Spent: 4h 20m > Remaining Estimate: 0h > > Currently, the FluentBackoff is hardcoded with `maxRetries` and > `initialBackoff` . > It would be helpful if the client were able to pass these values. -- This message was sent by Atlassian Jira (v8.3.4#803005)
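The retry behavior that JdbcIO.Write currently hard-codes via FluentBackoff is exponential backoff from a fixed `initialBackoff` for a fixed `maxRetries`. A Python sketch of what making those values configurable means (the parameter names mirror the Java builder, but this is illustrative, not Beam API):

```python
def backoff_delays(max_retries=5, initial_backoff=1.0, multiplier=2.0):
    """Yield the sleep duration (in seconds) before each retry attempt,
    doubling (by `multiplier`) after every failure."""
    delay = initial_backoff
    for _ in range(max_retries):
        yield delay
        delay *= multiplier

# Four retries starting at 250 ms: 0.25s, 0.5s, 1.0s, 2.0s.
print(list(backoff_delays(max_retries=4, initial_backoff=0.25)))
# -> [0.25, 0.5, 1.0, 2.0]
```

Exposing `max_retries` and `initial_backoff` to the caller, as the issue requests, lets clients trade total retry duration against how quickly a persistently failing write surfaces an error.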
[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write
[ https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441419 ] ASF GitHub Bot logged work on BEAM-9742: Author: ASF GitHub Bot Created on: 04/Jun/20 17:04 Start Date: 04/Jun/20 17:04 Worklog Time Spent: 10m Work Description: aromanenko-dev removed a comment on pull request #11396: URL: https://github.com/apache/beam/pull/11396#issuecomment-638984073 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441419) Time Spent: 4.5h (was: 4h 20m) > Add ability to pass FluentBackoff to JdbcIo.Write > - > > Key: BEAM-9742 > URL: https://issues.apache.org/jira/browse/BEAM-9742 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Reporter: Akshay Iyangar >Assignee: Akshay Iyangar >Priority: P3 > Time Spent: 4.5h > Remaining Estimate: 0h > > Currently, the FluentBackoff is hardcoded with `maxRetries` and > `initialBackoff` . > It would be helpful if the client were able to pass these values. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write
[ https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441417 ] ASF GitHub Bot logged work on BEAM-9742: Author: ASF GitHub Bot Created on: 04/Jun/20 17:04 Start Date: 04/Jun/20 17:04 Worklog Time Spent: 10m Work Description: aromanenko-dev removed a comment on pull request #11396: URL: https://github.com/apache/beam/pull/11396#issuecomment-638983908 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441417) Time Spent: 4h 10m (was: 4h) > Add ability to pass FluentBackoff to JdbcIo.Write > - > > Key: BEAM-9742 > URL: https://issues.apache.org/jira/browse/BEAM-9742 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Reporter: Akshay Iyangar >Assignee: Akshay Iyangar >Priority: P3 > Time Spent: 4h 10m > Remaining Estimate: 0h > > Currently, the FluentBackoff is hardcoded with `maxRetries` and > `initialBackoff` . > It would be helpful if the client were able to pass these values. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write
[ https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441416 ] ASF GitHub Bot logged work on BEAM-9742: Author: ASF GitHub Bot Created on: 04/Jun/20 17:04 Start Date: 04/Jun/20 17:04 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #11396: URL: https://github.com/apache/beam/pull/11396#issuecomment-638984073 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 441416) Time Spent: 4h (was: 3h 50m) > Add ability to pass FluentBackoff to JdbcIo.Write > - > > Key: BEAM-9742 > URL: https://issues.apache.org/jira/browse/BEAM-9742 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Reporter: Akshay Iyangar >Assignee: Akshay Iyangar >Priority: P3 > Time Spent: 4h > Remaining Estimate: 0h > > Currently, the FluentBackoff is hardcoded with `maxRetries` and > `initialBackoff` . > It would be helpful if the client were able to pass these values. -- This message was sent by Atlassian Jira (v8.3.4#803005)