[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=441676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441676
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 05/Jun/20 04:44
Start Date: 05/Jun/20 04:44
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on pull request #11883:
URL: https://github.com/apache/beam/pull/11883#issuecomment-639255195


   @lostluck and @henryken I updated the [stepik 
course](https://stepik.org/course/70387) and committed the updated 
`*-remote.yaml` files to this PR. It is ready to merge.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441676)
Time Spent: 9h  (was: 8h 50m)

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> A kata devoted to core Beam transform patterns, modeled after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms],
>  where the takeaway is an individual's ability to master the following in an 
> Apache Beam pipeline using the Go SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
> |Combine Simple 
> Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
> |CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10201) Enhance JsonToRow to add Deadletter Support

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10201?focusedWorklogId=441675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441675
 ]

ASF GitHub Bot logged work on BEAM-10201:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 04:35
Start Date: 05/Jun/20 04:35
Worklog Time Spent: 10m 
  Work Description: rezarokni opened a new pull request #11929:
URL: https://github.com/apache/beam/pull/11929


   Add deadletter support to JsonToRow. 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=441674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441674
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 05/Jun/20 04:27
Start Date: 05/Jun/20 04:27
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #11868:
URL: https://github.com/apache/beam/pull/11868#issuecomment-639251081


   Thank you Andrew! Will address your comments soon!





Issue Time Tracking
---

Worklog Id: (was: 441674)
Time Spent: 8h 50m  (was: 8h 40m)

> BeamSQL windowing as TVF
> 
>
> Key: BEAM-9363
> URL: https://issues.apache.org/jira/browse/BEAM-9363
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
>  This Jira tracks the implementation for 
> https://s.apache.org/streaming-beam-sql
> A TVF is a table-valued function: a SQL feature that produces a table as the 
> function's output.
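The windowing-as-TVF idea can be illustrated with a small, hypothetical sketch in plain Python (not Beam SQL or its API): a table-valued function takes a table of timestamped rows and returns a new table with window bounds appended, which is conceptually how a TUMBLE TVF behaves.

```python
from datetime import datetime, timedelta

def tumble(rows, ts_field, width):
    """Hypothetical sketch of a TUMBLE table-valued function: takes a
    table (a list of dict rows) and returns a new table with
    window_start/window_end columns appended, assigning each row to a
    fixed-size, non-overlapping window."""
    epoch = datetime(1970, 1, 1)
    out = []
    for row in rows:
        # Align the window start to a multiple of the window width.
        offset = (row[ts_field] - epoch) % width
        start = row[ts_field] - offset
        out.append({**row, "window_start": start, "window_end": start + width})
    return out

rows = [
    {"user": "a", "ts": datetime(2020, 6, 5, 0, 0, 30)},
    {"user": "b", "ts": datetime(2020, 6, 5, 0, 1, 10)},
]
windowed = tumble(rows, "ts", timedelta(minutes=1))
```

The key property of the TVF formulation is that windowing is an ordinary relational operation: a table in, a table out, composable with the rest of the query.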





[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=441665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441665
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 05/Jun/20 04:09
Start Date: 05/Jun/20 04:09
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #9899:
URL: https://github.com/apache/beam/pull/9899#issuecomment-639246523


   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   





Issue Time Tracking
---

Worklog Id: (was: 441665)
Time Spent: 5h 10m  (was: 5h)

> Support for enhanced fan-out in KinesisIO.Read
> --
>
> Key: BEAM-8511
> URL: https://issues.apache.org/jira/browse/BEAM-8511
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kinesis
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Add support for reading from an enhanced fan-out consumer using KinesisIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10194) Could not get unknown property 'println'

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10194?focusedWorklogId=441661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441661
 ]

ASF GitHub Bot logged work on BEAM-10194:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 03:32
Start Date: 05/Jun/20 03:32
Worklog Time Spent: 10m 
  Work Description: henryken commented on pull request #11921:
URL: https://github.com/apache/beam/pull/11921#issuecomment-639237442


   @pabloem, can you help to merge?





Issue Time Tracking
---

Worklog Id: (was: 441661)
Time Spent: 40m  (was: 0.5h)

> Could not get unknown property 'println'
> 
>
> Key: BEAM-10194
> URL: https://issues.apache.org/jira/browse/BEAM-10194
> Project: Beam
>  Issue Type: Bug
>  Components: katas
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: P4
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a test fails, in addition to printing the expected error, the console 
> logs an additional (unexpected) error:
> Could not get unknown property 'println' for task 
> ':Core_Transforms-Map-MapElements:test' of type 
> org.gradle.api.tasks.testing.Test.





[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=441656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441656
 ]

ASF GitHub Bot logged work on BEAM-9444:


Author: ASF GitHub Bot
Created on: 05/Jun/20 03:14
Start Date: 05/Jun/20 03:14
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #11586:
URL: https://github.com/apache/beam/pull/11586#issuecomment-639232697


   This idea was a bad idea; a hack on top of another hack. 





Issue Time Tracking
---

Worklog Id: (was: 441656)
Time Spent: 14.5h  (was: 14h 20m)

> Shall we use GCP Libraries BOM to specify Google-related library versions?
> --
>
> Key: BEAM-9444
> URL: https://issues.apache.org/jira/browse/BEAM-9444
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Labels: stale-assigned
> Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot 
> 2020-03-17 at 16.01.16.png
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Shall we use GCP Libraries BOM to specify Google-related library versions?
>   
>  I've been working on Beam's dependency upgrades in the past few months. I 
> think it's time to consider a long-term solution to keep the libraries 
> up-to-date with minimal maintenance effort. To achieve that, I propose that 
> Beam use the GCP Libraries BOM to set the Google-related library versions, 
> rather than trying to make changes in each of ~30 Google libraries.
>   
> h1. Background
> A BOM is a pom.xml that provides dependencyManagement to importing projects.
>   
>  GCP Libraries BOM is a BOM that includes many Google Cloud related libraries 
> + gRPC + protobuf. We (Google Cloud Java Diamond Dependency team) maintain 
> the BOM so that the set of libraries is compatible with each other.
>   
> h1. Implementation
> Notes for obstacles.
> h2. BeamModulePlugin's "force" does not take BOM into account (thus fails)
> {{forcedModules}} via the version resolution strategy interacts badly with the BOM. This causes:
> {noformat}
> A problem occurred evaluating project ':sdks:java:extensions:sql'. 
> Could not resolve all dependencies for configuration 
> ':sdks:java:extensions:sql:fmppTemplates'.
> Invalid format: 'com.google.cloud:google-cloud-core'. Group, name and version 
> cannot be empty. Correct example: 'org.gradle:gradle-core:1.0'{noformat}
> !Screen Shot 2020-03-13 at 13.33.01.png|width=489,height=287! 
>   
> h2. :sdks:java:maven-archetypes:examples needs the version of 
> google-http-client
> The task requires the version for the library:
> {code:java}
> 'google-http-client.version': 
> dependencies.create(project.library.java.google_http_client).getVersion(),
> {code}
> This would generate a NullPointerException. Running gradlew without the 
> subproject:
>   
> {code:java}
> ./gradlew -p sdks/java check -x :sdks:java:maven-archetypes:examples:check
> {code}
> h1. Problem in Gradle-generated pom files
> The generated Maven artifact POM has invalid data due to the BOM change. For 
> example my locally installed 
> {{~/.m2/repository/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.21.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.pom}}
>  had the following problems.
> h2. The GCP Libraries BOM showing up in dependencies section:
> {noformat}
> <dependency>
>   <groupId>com.google.cloud</groupId>
>   <artifactId>libraries-bom</artifactId>
>   <version>4.2.0</version>
>   <scope>compile</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava-jdk5</artifactId>
>       ...
>     </exclusion>
>   </exclusions>
> </dependency>
> {noformat}
> h2. Artifacts that use the BOM in Gradle are missing the version in the 
> dependency.
> {noformat}
> <dependency>
>   <groupId>com.google.api</groupId>
>   <artifactId>gax</artifactId>
>   <version></version>
>   <scope>compile</scope>
>   ...
> </dependency>
> {noformat}
> h1. DependencyManagement section in generated pom.xml
> How can I check whether an entry in dependencies is "platform"?
> !Screen Shot 2020-03-17 at 16.01.16.png|width=504,height=344!
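The "missing version" symptom described above can be detected mechanically. A hedged sketch (stdlib ElementTree, a hypothetical helper, ignoring the Maven POM XML namespace for brevity):

```python
import xml.etree.ElementTree as ET

def deps_missing_version(pom_xml):
    """Sketch: list groupId:artifactId pairs in a pom's dependencies
    whose <version> element is absent or empty, the symptom described
    above for artifacts that rely on a Gradle-imported BOM."""
    root = ET.fromstring(pom_xml)
    missing = []
    for dep in root.iter("dependency"):
        version = dep.findtext("version")
        if not version:  # None (no element) or "" (empty element)
            missing.append(
                f'{dep.findtext("groupId")}:{dep.findtext("artifactId")}')
    return missing

# Minimal fabricated pom fragment mirroring the gax example above.
pom = """<project><dependencies>
  <dependency><groupId>com.google.api</groupId>
    <artifactId>gax</artifactId><version></version></dependency>
  <dependency><groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId><version>1.7.30</version></dependency>
</dependencies></project>"""
missing = deps_missing_version(pom)
```

A check like this could gate publishing, since a BOM-managed dependency must have its version resolved before the pom is consumable by plain Maven.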





[jira] [Work logged] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9444?focusedWorklogId=441657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441657
 ]

ASF GitHub Bot logged work on BEAM-9444:


Author: ASF GitHub Bot
Created on: 05/Jun/20 03:14
Start Date: 05/Jun/20 03:14
Worklog Time Spent: 10m 
  Work Description: suztomo closed pull request #11586:
URL: https://github.com/apache/beam/pull/11586


   





Issue Time Tracking
---

Worklog Id: (was: 441657)
Time Spent: 14h 40m  (was: 14.5h)

> Shall we use GCP Libraries BOM to specify Google-related library versions?
> --
>
> Key: BEAM-9444
> URL: https://issues.apache.org/jira/browse/BEAM-9444
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: P2
>  Labels: stale-assigned
> Attachments: Screen Shot 2020-03-13 at 13.33.01.png, Screen Shot 
> 2020-03-17 at 16.01.16.png
>
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Shall we use GCP Libraries BOM to specify Google-related library versions?
>   
>  I've been working on Beam's dependency upgrades in the past few months. I 
> think it's time to consider a long-term solution to keep the libraries 
> up-to-date with minimal maintenance effort. To achieve that, I propose that 
> Beam use the GCP Libraries BOM to set the Google-related library versions, 
> rather than trying to make changes in each of ~30 Google libraries.
>   
> h1. Background
> A BOM is a pom.xml that provides dependencyManagement to importing projects.
>   
>  GCP Libraries BOM is a BOM that includes many Google Cloud related libraries 
> + gRPC + protobuf. We (Google Cloud Java Diamond Dependency team) maintain 
> the BOM so that the set of libraries is compatible with each other.
>   
> h1. Implementation
> Notes for obstacles.
> h2. BeamModulePlugin's "force" does not take BOM into account (thus fails)
> {{forcedModules}} via the version resolution strategy interacts badly with the BOM. This causes:
> {noformat}
> A problem occurred evaluating project ':sdks:java:extensions:sql'. 
> Could not resolve all dependencies for configuration 
> ':sdks:java:extensions:sql:fmppTemplates'.
> Invalid format: 'com.google.cloud:google-cloud-core'. Group, name and version 
> cannot be empty. Correct example: 'org.gradle:gradle-core:1.0'{noformat}
> !Screen Shot 2020-03-13 at 13.33.01.png|width=489,height=287! 
>   
> h2. :sdks:java:maven-archetypes:examples needs the version of 
> google-http-client
> The task requires the version for the library:
> {code:java}
> 'google-http-client.version': 
> dependencies.create(project.library.java.google_http_client).getVersion(),
> {code}
> This would generate a NullPointerException. Running gradlew without the 
> subproject:
>   
> {code:java}
> ./gradlew -p sdks/java check -x :sdks:java:maven-archetypes:examples:check
> {code}
> h1. Problem in Gradle-generated pom files
> The generated Maven artifact POM has invalid data due to the BOM change. For 
> example my locally installed 
> {{~/.m2/repository/org/apache/beam/beam-sdks-java-io-google-cloud-platform/2.21.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.21.0-SNAPSHOT.pom}}
>  had the following problems.
> h2. The GCP Libraries BOM showing up in dependencies section:
> {noformat}
> <dependency>
>   <groupId>com.google.cloud</groupId>
>   <artifactId>libraries-bom</artifactId>
>   <version>4.2.0</version>
>   <scope>compile</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava-jdk5</artifactId>
>       ...
>     </exclusion>
>   </exclusions>
> </dependency>
> {noformat}
> h2. Artifacts that use the BOM in Gradle are missing the version in the 
> dependency.
> {noformat}
> <dependency>
>   <groupId>com.google.api</groupId>
>   <artifactId>gax</artifactId>
>   <version></version>
>   <scope>compile</scope>
>   ...
> </dependency>
> {noformat}
> h1. DependencyManagement section in generated pom.xml
> How can I check whether an entry in dependencies is "platform"?
> !Screen Shot 2020-03-17 at 16.01.16.png|width=504,height=344!





[jira] [Work logged] (BEAM-10112) Add python sdk state and timer examples to website

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10112?focusedWorklogId=441654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441654
 ]

ASF GitHub Bot logged work on BEAM-10112:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 03:10
Start Date: 05/Jun/20 03:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11882:
URL: https://github.com/apache/beam/pull/11882#issuecomment-639231673


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 441654)
Time Spent: 0.5h  (was: 20m)

> Add python sdk state and timer examples to website
> --
>
> Key: BEAM-10112
> URL: https://issues.apache.org/jira/browse/BEAM-10112
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=441651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441651
 ]

ASF GitHub Bot logged work on BEAM-8451:


Author: ASF GitHub Bot
Created on: 05/Jun/20 03:07
Start Date: 05/Jun/20 03:07
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11706:
URL: https://github.com/apache/beam/pull/11706#issuecomment-639230979


   @rosetn - did you have a chance to review this?





Issue Time Tracking
---

Worklog Id: (was: 441651)
Time Spent: 2.5h  (was: 2h 20m)

> Interactive Beam example failing from stack overflow
> 
>
> Key: BEAM-8451
> URL: https://issues.apache.org/jira/browse/BEAM-8451
> Project: Beam
>  Issue Type: Bug
>  Components: examples-python, runner-py-interactive
>Reporter: Igor Durovic
>Assignee: Chun Yang
>Priority: P2
> Fix For: 2.18.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
>  
> RecursionError: maximum recursion depth exceeded in __instancecheck__
> at 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405]
>  
> This occurred after the execution of the last cell in 
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb]





[jira] [Created] (BEAM-10201) Enhance JsonToRow to add Deadletter Support

2020-06-04 Thread Reza ardeshir rokni (Jira)
Reza ardeshir rokni created BEAM-10201:
--

 Summary: Enhance JsonToRow to add Deadletter Support
 Key: BEAM-10201
 URL: https://issues.apache.org/jira/browse/BEAM-10201
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Reporter: Reza ardeshir rokni


The current JsonToRow transform does not support the dead-letter pattern on 
parse failures.
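The dead-letter pattern being requested can be sketched in plain Python (hypothetical names, not the Beam API): records that parse successfully flow to the main output, while records that fail to parse are routed, together with the error, to a dead-letter output instead of failing the whole job.

```python
import json

def json_to_rows_with_dead_letter(lines):
    """Hypothetical sketch of the dead-letter pattern: route parse
    failures to a side output (with the raw input and the error
    message) instead of raising and failing the pipeline."""
    rows, dead_letter = [], []
    for line in lines:
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as e:
            dead_letter.append({"raw": line, "error": str(e)})
    return rows, dead_letter

rows, dead = json_to_rows_with_dead_letter(['{"id": 1}', 'not json'])
```

In Beam itself the analogous mechanism is a multi-output ParDo, where the dead-letter collection would typically be written to storage for later inspection or replay.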





[jira] [Updated] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects

2020-06-04 Thread Reza ardeshir rokni (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza ardeshir rokni updated BEAM-10199:
---
Priority: P3  (was: P2)

> AutoValue Schema unable to work with auto-value-gson-annotations objects
> 
>
> Key: BEAM-10199
> URL: https://issues.apache.org/jira/browse/BEAM-10199
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.21.0
>Reporter: Reza ardeshir rokni
>Priority: P3
>
> Objects created with the extension 
> com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0
> fail with an NPE in Java pipelines.





[jira] [Updated] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects

2020-06-04 Thread Reza ardeshir rokni (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza ardeshir rokni updated BEAM-10199:
---
Issue Type: Bug  (was: New Feature)

> AutoValue Schema unable to work with auto-value-gson-annotations objects
> 
>
> Key: BEAM-10199
> URL: https://issues.apache.org/jira/browse/BEAM-10199
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.21.0
>Reporter: Reza ardeshir rokni
>Priority: P2
>
> Objects created with the extension 
> com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0
> fail with an NPE in Java pipelines.





[jira] [Updated] (BEAM-10200) Improve memory profiling for users of Portable Beam Python

2020-06-04 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-10200:
---
Labels: starter  (was: )

> Improve memory profiling for users of Portable Beam Python
> --
>
> Key: BEAM-10200
> URL: https://issues.apache.org/jira/browse/BEAM-10200
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Priority: P2
>  Labels: starter
>
> We have a Profiler[1] that is integrated with the SDK worker[1a]; however, it 
> only saves CPU metrics[1b].
> We have a MemoryReporter util[2] which can log heap dumps; however, it is not 
> documented on the Beam website and does not respect the --profile_memory and 
> --profile_location options[3]. The profile_memory flag currently works only 
> for Dataflow Runner users who run non-portable batch pipelines; profiles 
> are saved only if memory usage between samples exceeds 1000G. 
> We should improve the memory profiling experience for Portable Python users 
> and consider adding a guide on how users can investigate OOMing pipelines to 
> the Beam website.
>  
> [1] 
> https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
> [1a] 
> https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
> [1b] 
> https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
> [2] 
> https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
> [3] 
> https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846





[jira] [Created] (BEAM-10200) Improve memory profiling for users of Portable Beam Python

2020-06-04 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-10200:
--

 Summary: Improve memory profiling for users of Portable Beam Python
 Key: BEAM-10200
 URL: https://issues.apache.org/jira/browse/BEAM-10200
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Valentyn Tymofieiev


We have a Profiler[1] that is integrated with the SDK worker[1a]; however, it 
only saves CPU metrics[1b].
We have a MemoryReporter util[2] which can log heap dumps; however, it is not 
documented on the Beam website and does not respect the --profile_memory and 
--profile_location options[3]. The profile_memory flag currently works only for 
Dataflow Runner users who run non-portable batch pipelines; profiles are 
saved only if memory usage between samples exceeds 1000G. 

We should improve the memory profiling experience for Portable Python users and 
consider adding a guide on how users can investigate OOMing pipelines to the 
Beam website.
 
[1] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
[1a] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
[1b] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
[2] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
[3] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846
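As a hedged illustration of the kind of signal a memory reporter can capture between samples (standard-library tracemalloc here, not the Beam MemoryReporter util itself):

```python
import tracemalloc

def snapshot_growth(workload):
    """Sketch of sample-based memory reporting: measure the growth in
    traced allocations across one unit of work, the kind of delta a
    memory reporter would compare against a threshold before logging
    a heap dump."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    result = workload()          # run the unit of work being sampled
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, after - before

# A workload that retains its allocation, so growth is positive.
data, growth = snapshot_growth(lambda: [0] * 100_000)
```

A production reporter would additionally attribute the growth to allocation sites (tracemalloc snapshots support per-traceback statistics), which is what makes OOM investigations tractable.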





[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441624
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 05/Jun/20 00:59
Start Date: 05/Jun/20 00:59
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11927:
URL: https://github.com/apache/beam/pull/11927#issuecomment-639196604


   Run Go PostCommit





Issue Time Tracking
---

Worklog Id: (was: 441624)
Time Spent: 1h 10m  (was: 1h)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better-fleshed-out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys, and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a 
> workaround.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas
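The construction-time check described above can be sketched conceptually (a plain Python stand-in for the Go reflection check; in Go, "unexported" means a field name starting with a lowercase letter, invisible to reflection-based encoders):

```python
def check_schema_encodable(field_names):
    """Conceptual sketch of the construction-time failure described
    above: reject element types that have unexported fields, since a
    reflection-based encoder cannot see them and their data would be
    silently dropped."""
    unexported = [f for f in field_names if f[:1].islower()]
    if unexported:
        raise TypeError(f"unexported fields not encodable: {unexported}")
    return True

check_schema_encodable(["Name", "Count"])    # all exported: accepted
```

The point of failing at pipeline construction rather than warning is that the error surfaces before any data is processed, instead of as mysteriously missing keys at runtime.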





[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=441623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441623
 ]

ASF GitHub Bot logged work on BEAM-10036:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 00:56
Start Date: 05/Jun/20 00:56
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11766:
URL: https://github.com/apache/beam/pull/11766#discussion_r435632933



##
File path: sdks/python/apache_beam/dataframe/expressions.py
##
@@ -85,16 +87,10 @@ def evaluate_at(self, session):  # type: (Session) -> T
 """Returns the result of self with the bindings given in session."""
 raise NotImplementedError(type(self))
 
-  def requires_partition_by_index(self):  # type: () -> bool
-"""Whether this expression requires its argument(s) to be partitioned
-by index."""
-# TODO: It might be necessary to support partitioning by part of the index,
-# for some args, which would require returning more than a boolean here.
+  def requires_partition_by(self):  # type: () -> Partitioning
 raise NotImplementedError(type(self))
 
-  def preserves_partition_by_index(self):  # type: () -> bool
-"""Whether the result of this expression will be partitioned by index
-whenever all of its inputs are partitioned by index."""
+  def preserves_partition_by(self):  # type: () -> Partitioning

Review comment:
   Docstring comments added. 

##
File path: sdks/python/apache_beam/dataframe/partitionings.py
##
@@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import TypeVar
+
+import pandas as pd
+
+Frame = TypeVar('Frame', bound=pd.core.generic.NDFrame)
+
+
+class Partitioning(object):
+  """A class representing a (consistent) partitioning of dataframe objects.
+  """
+  def is_subpartitioning_of(self, other):
+# type: (Partitioning) -> bool
+
+"""Returns whether self is a sub-partition of other.
+
+Specifically, returns whether something partitioned by self is necessarily
+also partitioned by other.
+"""
+raise NotImplementedError
+
+  def partition_fn(self, df):
+# type: (Frame) -> Iterable[Tuple[Any, Frame]]
+
+"""A callable that actually performs the partitioning of a Frame df.
+
+This will be invoked via a FlatMap in conjunction with a GroupByKey to
+achieve the desired partitioning.
+"""
+raise NotImplementedError
+
+
+class Index(Partitioning):
+  """A partitioning by index (either fully or partially).
+
+  If the set of "levels" of the index to consider is not specified, the entire
+  index is used.
+
+  These form a partial order, given by
+
+  Nothing() < Index([i]) < Index([i, j]) < ... < Index() < Singleton()
+
+  The ordering is implemented via the is_subpartitioning_of method, where the
+  examples on the right are subpartitionings of the examples on the left above.
+  """
+
+  _INDEX_PARTITIONS = 10
+
+  def __init__(self, levels=None):
+self._levels = levels
+
+  def __eq__(self, other):
+return type(self) == type(other) and self._levels == other._levels
+
+  def __hash__(self):
+if self._levels:
+  return hash(tuple(sorted(self._levels)))
+else:
+  return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+if isinstance(other, Nothing):
+  return True
+elif isinstance(other, Index):
+  if self._levels is None:
+return True
+  elif other._levels is None:
+return False
+  else:
+return all(level in other._levels for level in self._levels)
+else:
+  return False
+
+  def partition_fn(self, df):
+if self._levels is None:
+  levels = list(range(df.index.nlevels))
+else:
+  levels = self._levels
+hashes = sum(
+pd.util.hash_array(df.index.get_level_values(level))
+for level in levels)
+for key in range(self._INDEX_PARTITIONS):
+  yield key, df[hashes % self._INDEX_PARTITIONS == key]
+
+
+class Singleton(Partitioning):
+  """A partitioning 
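The hash-and-modulo routing in `Index.partition_fn` above can be exercised outside Beam with plain pandas. A minimal sketch follows; the `partition_by_index` name and the 3-way split are illustrative only, not part of the Beam API:

```python
import pandas as pd

NUM_PARTITIONS = 3


def partition_by_index(df, num_partitions=NUM_PARTITIONS):
    """Yield (key, sub_frame) pairs, routing each row by a hash of its index.

    Mirrors the hash-and-modulo scheme of Index.partition_fn in the quoted
    snippet; in Beam this logic runs inside a FlatMap so a grouping step can
    later bring rows with the same key together.
    """
    hashes = pd.util.hash_array(df.index.get_level_values(0).to_numpy())
    for key in range(num_partitions):
        yield key, df[hashes % num_partitions == key]


df = pd.DataFrame({'x': range(10)})
parts = dict(partition_by_index(df))

# Each row lands in exactly one partition, so concatenating the parts
# and sorting by index recovers the original frame.
assert pd.concat(parts.values()).sort_index().equals(df)
assert sum(len(p) for p in parts.values()) == len(df)
```

Note that a key is yielded even for empty partitions, which keeps the downstream grouping deterministic regardless of the data.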

[jira] [Updated] (BEAM-10061) ReadAllFromTextWithFilename

2020-06-04 Thread Jacob Ferriero (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Ferriero updated BEAM-10061:
--
Component/s: (was: io-py-gcp)
 sdk-java-core
 io-py-files
 io-java-files

> ReadAllFromTextWithFilename
> ---
>
> Key: BEAM-10061
> URL: https://issues.apache.org/jira/browse/BEAM-10061
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files, io-py-files, sdk-java-core, sdk-py-core
> Environment: Dataflow with Python
>Reporter: Ryan Canty
>Priority: P2
>
> I am trying to create a job that reads from GCS, executes some code against 
> each line, and creates a PCollection with the line and the file. So basically 
> what I'd like is a combination of textio.ReadTextWithFilename and 
> textio.ReadAllFromText.
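The requested behavior can be approximated today by matching files yourself and tagging each line with its source file. Below is a runnable plain-Python sketch of the per-element logic; the `pair_lines_with_filename` helper is hypothetical, and in an actual Beam pipeline it would sit in a DoFn downstream of `fileio.MatchFiles` + `fileio.ReadMatches` rather than a local glob:

```python
import glob
import os
import tempfile


def pair_lines_with_filename(path):
    """Yield (filename, line) tuples for one file -- the element shape a
    ReadAllFromTextWithFilename would produce. Hypothetical helper, not
    part of the Beam API."""
    with open(path) as f:
        for line in f:
            yield path, line.rstrip('\n')


# Demo: two small files, then "read all" of them with filenames attached.
tmp = tempfile.mkdtemp()
for name, text in [('a.txt', 'one\ntwo\n'), ('b.txt', 'three\n')]:
    with open(os.path.join(tmp, name), 'w') as f:
        f.write(text)

pairs = [
    pair
    for path in sorted(glob.glob(os.path.join(tmp, '*.txt')))
    for pair in pair_lines_with_filename(path)
]
assert [line for _, line in pairs] == ['one', 'two', 'three']
```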



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10061) ReadAllFromTextWithFilename

2020-06-04 Thread Jacob Ferriero (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126295#comment-17126295
 ] 

Jacob Ferriero commented on BEAM-10061:
---

This would also be useful in the Java SDK, as that's what I typically see 
customers using. (Added Java components to this issue.)

> ReadAllFromTextWithFilename
> ---
>
> Key: BEAM-10061
> URL: https://issues.apache.org/jira/browse/BEAM-10061
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files, io-py-files, sdk-java-core, sdk-py-core
> Environment: Dataflow with Python
>Reporter: Ryan Canty
>Priority: P2
>
> I am trying to create a job that reads from GCS, executes some code against 
> each line, and creates a PCollection with the line and the file. So basically 
> what I'd like is a combination of textio.ReadTextWithFilename and 
> textio.ReadAllFromText.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10184) Build python wheels on GitHub Actions for Linux/MacOS

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10184?focusedWorklogId=441616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441616
 ]

ASF GitHub Bot logged work on BEAM-10184:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 00:42
Start Date: 05/Jun/20 00:42
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #11877:
URL: https://github.com/apache/beam/pull/11877#discussion_r435631141



##
File path: .github/workflows/build_wheels.yml
##
@@ -0,0 +1,141 @@
+name: Build python wheels
+
+on:
+  push:
+branches:
+  - master
+  - release-*
+tags:
+  - v*
+
+jobs:
+
+  build_source:
+runs-on: ubuntu-18.04
+steps:
+  - name: Checkout code
+uses: actions/checkout@v2
+  - name: Install python
+uses: actions/setup-python@v2
+with:
+  python-version: 3.7
+  - name: Get build dependencies
+working-directory: ./sdks/python
+run: python3 -m pip install cython && python3 -m pip install -r build-requirements.txt
+  - name: Install wheels
+run: python3 -m pip install wheel
+  - name: Build source
+working-directory: ./sdks/python
+run: python3 setup.py sdist --formats=gztar,zip
+  - name: Unzip source
+working-directory: ./sdks/python
+run: unzip dist/$(ls dist | grep .zip | head -n 1)
+  - name: Rename source directory
+working-directory: ./sdks/python
+run: mv $(ls | grep apache-beam) apache-beam-source
+  - name: Upload source
+uses: actions/upload-artifact@v2
+with:
+  name: source
+  path: sdks/python/apache-beam-source
+  - name: Upload compressed sources
+uses: actions/upload-artifact@v2
+with:
+  name: source_gztar_zip
+  path: sdks/python/dist
+
+  prepare_gcs:
+name: Prepare GCS
+needs: build_source
+runs-on: ubuntu-18.04
+steps:
+  - name: Authenticate on GCP
+uses: GoogleCloudPlatform/github-actions/setup-gcloud@master
+with:
+  service_account_email: ${{ secrets.CCP_SA_EMAIL }}
+  service_account_key: ${{ secrets.CCP_SA_KEY }}
+  - name: Remove existing files on GCS bucket
+run: gsutil rm -r "gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/" || true
+
+  upload_source_to_gcs:
+name: Upload source to GCS bucket
+needs: prepare_gcs
+runs-on: ubuntu-18.04
+steps:
+  - name: Download wheels
+uses: actions/download-artifact@v2
+with:
+  name: source_gztar_zip
+  path: source/
+  - name: Authenticate on GCP
+uses: GoogleCloudPlatform/github-actions/setup-gcloud@master
+with:
+  service_account_email: ${{ secrets.CCP_SA_EMAIL }}
+  service_account_key: ${{ secrets.CCP_SA_KEY }}
+  - name: Copy sources to GCS bucket
+run: gsutil cp -r -a public-read source/* gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/
+  - name: List sources on GCS bucket
+run: |
+  gsutil ls "gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/*.tar.gz"
+  gsutil ls "gs://${{ secrets.CCP_BUCKET }}/${GITHUB_REF##*/}/*.zip"
+
+  build_wheels:
+name: Build wheels on ${{ matrix.os }}
+needs: prepare_gcs
+runs-on: ${{ matrix.os }}
+strategy:
+  matrix:
+os : [ubuntu-18.04, macos-10.15]

Review comment:
   Perfect. (Just to clarify my understanding: we will have a single GitHub 
CI config; is that correct?)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441616)
Time Spent: 3h 20m  (was: 3h 10m)

> Build python wheels on GitHub Actions for Linux/MacOS
> -
>
> Key: BEAM-10184
> URL: https://issues.apache.org/jira/browse/BEAM-10184
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Tobiasz Kedzierski
>Priority: P2
>  Labels: build, python, python-packages, python-wheel
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10184) Build python wheels on GitHub Actions for Linux/MacOS

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10184?focusedWorklogId=441615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441615
 ]

ASF GitHub Bot logged work on BEAM-10184:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 00:40
Start Date: 05/Jun/20 00:40
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #11877:
URL: https://github.com/apache/beam/pull/11877#discussion_r435630823



##
File path: .github/workflows/build_wheels.yml
##
@@ -0,0 +1,141 @@
+name: Build python wheels
+
+on:
+  push:
+branches:
+  - master
+  - release-*
+tags:
+  - v*
+
+jobs:
+
+  build_source:
+runs-on: ubuntu-18.04
+steps:
+  - name: Checkout code
+uses: actions/checkout@v2
+  - name: Install python
+uses: actions/setup-python@v2
+with:
+  python-version: 3.7
+  - name: Get build dependencies
+working-directory: ./sdks/python
+run: python3 -m pip install cython && python3 -m pip install -r build-requirements.txt
+  - name: Install wheels
+run: python3 -m pip install wheel
+  - name: Build source
+working-directory: ./sdks/python
+run: python3 setup.py sdist --formats=gztar,zip
+  - name: Unzip source
+working-directory: ./sdks/python
+run: unzip dist/$(ls dist | grep .zip | head -n 1)
+  - name: Rename source directory
+working-directory: ./sdks/python
+run: mv $(ls | grep apache-beam) apache-beam-source
+  - name: Upload source
+uses: actions/upload-artifact@v2
+with:
+  name: source
+  path: sdks/python/apache-beam-source
+  - name: Upload compressed sources
+uses: actions/upload-artifact@v2
+with:
+  name: source_gztar_zip

Review comment:
   Very nice.
   
   2 follow up questions:
   
   1. Can we add a final stage that runs gsutil ls 
"gs://***/$***GITHUB_REF##*/***/*", so that we can get the whole output of 
the GCS folder all at once?
   
   (btw, do we have a mechanism to clean up these gcs locations?)
   
   2. What is the difference between "Build python wheels / Build wheels on 
..." job executing "Upload wheels" step and "Upload wheels to GCS bucket ..." 
job? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441615)
Time Spent: 3h 10m  (was: 3h)

> Build python wheels on GitHub Actions for Linux/MacOS
> -
>
> Key: BEAM-10184
> URL: https://issues.apache.org/jira/browse/BEAM-10184
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Tobiasz Kedzierski
>Priority: P2
>  Labels: build, python, python-packages, python-wheel
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3788?focusedWorklogId=441613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441613
 ]

ASF GitHub Bot logged work on BEAM-3788:


Author: ASF GitHub Bot
Created on: 05/Jun/20 00:35
Start Date: 05/Jun/20 00:35
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11928:
URL: https://github.com/apache/beam/pull/11928#issuecomment-639190896


   R: @robertwb 
   
   CC: @mxm 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441613)
Time Spent: 20m  (was: 10m)

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: P2
> Fix For: 2.22.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Java KafkaIO will be made available to Python users as a cross-language 
> transform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10184) Build python wheels on GitHub Actions for Linux/MacOS

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10184?focusedWorklogId=441610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441610
 ]

ASF GitHub Bot logged work on BEAM-10184:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 00:34
Start Date: 05/Jun/20 00:34
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11877:
URL: https://github.com/apache/beam/pull/11877#issuecomment-639190586


   > > How can I preview the action on the fork: 
[TobKed#3](https://github.com/TobKed/beam/pull/3) ?
   > 
   > @aaltay 
https://github.com/TobKed/beam/actions?query=branch%3Agithub-actions-build-wheels-linux-macos
   
   Super nice! I have not reviewed the steps; I think the PR is better for 
the initial review. We can do a final quick pass once we settle on this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441610)
Time Spent: 3h  (was: 2h 50m)

> Build python wheels on GitHub Actions for Linux/MacOS
> -
>
> Key: BEAM-10184
> URL: https://issues.apache.org/jira/browse/BEAM-10184
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Tobiasz Kedzierski
>Priority: P2
>  Labels: build, python, python-packages, python-wheel
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3788?focusedWorklogId=441609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441609
 ]

ASF GitHub Bot logged work on BEAM-3788:


Author: ASF GitHub Bot
Created on: 05/Jun/20 00:33
Start Date: 05/Jun/20 00:33
Worklog Time Spent: 10m 
  Work Description: chamikaramj opened a new pull request #11928:
URL: https://github.com/apache/beam/pull/11928


   Also removes it from the in-progress IO connectors list, now that it's 
supported by portable runners as well as Dataflow.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   [Go and Java build-status badge rows omitted; truncated in the archive]

[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441599
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 05/Jun/20 00:10
Start Date: 05/Jun/20 00:10
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11927:
URL: https://github.com/apache/beam/pull/11927#issuecomment-639183557


   R: @tysonjh 
   cc: @youngoli 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441599)
Time Spent: 1h  (was: 50m)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a work 
> around.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441598
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 05/Jun/20 00:09
Start Date: 05/Jun/20 00:09
Worklog Time Spent: 10m 
  Work Description: lostluck opened a new pull request #11927:
URL: https://github.com/apache/beam/pull/11927


   I noticed that there were several stragglers that were not following the 
import name conventions, and this PR completes that work.
   
   pipepb, fnpb, jobpb for the Pipeline protos, Fn Execution protos, and Job 
Management protos respectively, and v1pb for the internal serialization 
representation for types the Go SDK uses.
   
   This PR is a rote find/replace.
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   [Go and Java build-status badge rows omitted; truncated in the archive]

[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441597
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 05/Jun/20 00:07
Start Date: 05/Jun/20 00:07
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#11820:
URL: https://github.com/apache/beam/pull/11820#discussion_r435622797



##
File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.nexmark.queries.zetasql;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.sql.SqlTransform;
+import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.nexmark.model.Bid;
+import org.apache.beam.sdk.nexmark.model.Event;
+import org.apache.beam.sdk.nexmark.model.Event.Type;
+import org.apache.beam.sdk.nexmark.model.sql.SelectEvent;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil;
+import org.apache.beam.sdk.schemas.transforms.Convert;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Filter;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Query 0: Pass events through unchanged.
+ *
+ * This measures the overhead of the Beam ZetaSql implementation and test
+ * harness, like conversion from Java model classes to Beam records.
+ *
+ * {@link Bid} events are used here at the moment, as they are most
+ * numerous with the default configuration.
+ */
+public class ZetaSqlQuery0 extends NexmarkQueryTransform<Bid> {
+
+  public ZetaSqlQuery0() {
+super("ZetaSqlQuery0");
+  }
+
+  @Override
+  public PCollection<Bid> expand(PCollection<Event> allEvents) {
+PCollection<Row> rows =
+allEvents
+.apply(Filter.by(NexmarkQueryUtil.IS_BID))
+.apply(getName() + ".SelectEvent", new SelectEvent(Type.BID));
+
+return rows.apply(getName() + ".Serialize", 
logBytesMetric(rows.getCoder()))
+.setRowSchema(rows.getSchema())
+.apply(
+SqlTransform.query("SELECT * FROM PCOLLECTION")
+.withQueryPlannerClass(ZetaSQLQueryPlanner.class))

Review comment:
   OK, that was a total misadventure. Going back to just totally coupling 
the SQL dialect suites.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441597)
Time Spent: 2h  (was: 1h 50m)

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Most queries will be identical, but best to simply stay decoupled, so this is 
> a copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke updated BEAM-9615:
---
Labels:   (was: stale-assigned)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a more fleshed-out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a work 
> around.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread Robert Burke (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126280#comment-17126280
 ] 

Robert Burke commented on BEAM-9615:


State of the world slowed down progress on this, but I'm now rolling out PRs 
for review.

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a more fleshed-out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a work 
> around.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects

2020-06-04 Thread Reza ardeshir rokni (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126261#comment-17126261
 ] 

Reza ardeshir rokni commented on BEAM-10199:


FYI [~reuvenlax],

> AutoValue Schema unable to work with auto-value-gson-annotations objects
> 
>
> Key: BEAM-10199
> URL: https://issues.apache.org/jira/browse/BEAM-10199
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Affects Versions: 2.21.0
>Reporter: Reza ardeshir rokni
>Priority: P2
>
> Objects created with the extension:
> com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0
> Fail with NPE in Java pipelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects

2020-06-04 Thread Reza ardeshir rokni (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reza ardeshir rokni updated BEAM-10199:
---
Status: Open  (was: Triage Needed)

> AutoValue Schema unable to work with auto-value-gson-annotations objects
> 
>
> Key: BEAM-10199
> URL: https://issues.apache.org/jira/browse/BEAM-10199
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Affects Versions: 2.21.0
>Reporter: Reza ardeshir rokni
>Priority: P2
>
> Objects created with the extension:
> com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0
> Fail with NPE in Java pipelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects

2020-06-04 Thread Reza ardeshir rokni (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126260#comment-17126260
 ] 

Reza ardeshir rokni commented on BEAM-10199:


The GSON autovalue object actually extends a base object; the base object looks 
like the normal AutoValue-generated object...
 
From the ErrorMsg class we get:
 
abstract class $AutoValue_ErrorMsg extends ErrorMsg 
 
And then:

final class AutoValue_ErrorMsg extends $AutoValue_ErrorMsg 
 
So this line of code in the Beam utils won't work:

return baseName.startsWith("AutoValue_") ? clazz.getSuperclass() : clazz;
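
To illustrate the failure mode, here is a minimal, self-contained Java sketch. The `ErrorMsg` hierarchy and the `resolveBase` helper are stand-ins modeled on the quoted line, not Beam's actual code, and it assumes `baseName` is the class's simple name:

```java
// Minimal sketch of why the "AutoValue_" prefix check misses the
// intermediate $AutoValue_ subclass generated by the gson extension.
public class AutoValueNameDemo {

  // Stand-in for the user's @AutoValue class.
  abstract static class ErrorMsg {}

  // auto-value-gson inserts this intermediate generated class.
  abstract static class $AutoValue_ErrorMsg extends ErrorMsg {}

  // The final generated class extends the intermediate one.
  static final class AutoValue_ErrorMsg extends $AutoValue_ErrorMsg {}

  // Mirrors the quoted Beam check, assuming baseName == clazz.getSimpleName().
  static Class<?> resolveBase(Class<?> clazz) {
    String baseName = clazz.getSimpleName();
    return baseName.startsWith("AutoValue_") ? clazz.getSuperclass() : clazz;
  }

  public static void main(String[] args) {
    // One hop lands on $AutoValue_ErrorMsg, not ErrorMsg, because
    // "$AutoValue_ErrorMsg" does not start with "AutoValue_".
    Class<?> resolved = resolveBase(AutoValue_ErrorMsg.class);
    System.out.println(resolved.getSimpleName());              // $AutoValue_ErrorMsg
    System.out.println(resolveBase(resolved).getSimpleName()); // still $AutoValue_ErrorMsg
  }
}
```

So resolution stops at the intermediate `$AutoValue_` class instead of reaching the user's base class, which is consistent with the NPE reported above.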

> AutoValue Schema unable to work with auto-value-gson-annotations objects
> 
>
> Key: BEAM-10199
> URL: https://issues.apache.org/jira/browse/BEAM-10199
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Affects Versions: 2.21.0
>Reporter: Reza ardeshir rokni
>Priority: P2
>
> Objects created with the extension:
> com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0
> Fail with NPE in Java pipelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10199) AutoValue Schema unable to work with auto-value-gson-annotations objects

2020-06-04 Thread Reza ardeshir rokni (Jira)
Reza ardeshir rokni created BEAM-10199:
--

 Summary: AutoValue Schema unable to work with 
auto-value-gson-annotations objects
 Key: BEAM-10199
 URL: https://issues.apache.org/jira/browse/BEAM-10199
 Project: Beam
  Issue Type: New Feature
  Components: sdk-java-core
Affects Versions: 2.21.0
Reporter: Reza ardeshir rokni


Objects created with the extension:

com.ryanharter.auto.value:auto-value-gson-annotations:0.8.0

Fail with NPE in Java pipelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10189) Add ValueState to python sdk

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10189?focusedWorklogId=441570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441570
 ]

ASF GitHub Bot logged work on BEAM-10189:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 22:50
Start Date: 04/Jun/20 22:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11916:
URL: https://github.com/apache/beam/pull/11916#issuecomment-639157754


   Yes, the plan was to consider changing Java too, though that's harder due to 
backwards compatibility issues. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441570)
Time Spent: 50m  (was: 40m)

> Add ValueState to python sdk
> 
>
> Key: BEAM-10189
> URL: https://issues.apache.org/jira/browse/BEAM-10189
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441580
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 23:02
Start Date: 04/Jun/20 23:02
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #11820:
URL: https://github.com/apache/beam/pull/11820#discussion_r435598706



##
File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.nexmark.queries.zetasql;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.sql.SqlTransform;
+import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.nexmark.model.Bid;
+import org.apache.beam.sdk.nexmark.model.Event;
+import org.apache.beam.sdk.nexmark.model.Event.Type;
+import org.apache.beam.sdk.nexmark.model.sql.SelectEvent;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil;
+import org.apache.beam.sdk.schemas.transforms.Convert;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Filter;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Query 0: Pass events through unchanged.
+ *
+ * This measures the overhead of the Beam ZetaSql implementation and test harness like
+ * conversion from Java model classes to Beam records.
+ *
+ * {@link Bid} events are used here at the moment, as they are most numerous with default
+ * configuration.
+ */
+public class ZetaSqlQuery0 extends NexmarkQueryTransform<Bid> {
+
+  public ZetaSqlQuery0() {
+    super("ZetaSqlQuery0");
+  }
+
+  @Override
+  public PCollection<Bid> expand(PCollection<Event> allEvents) {
+    PCollection<Row> rows =
+        allEvents
+            .apply(Filter.by(NexmarkQueryUtil.IS_BID))
+            .apply(getName() + ".SelectEvent", new SelectEvent(Type.BID));
+
+    return rows.apply(getName() + ".Serialize", logBytesMetric(rows.getCoder()))
+        .setRowSchema(rows.getSchema())
+        .apply(
+            SqlTransform.query("SELECT * FROM PCOLLECTION")
+                .withQueryPlannerClass(ZetaSQLQueryPlanner.class))

Review comment:
   That sounds good to me. Then we can overload the methods that need 
customization.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441580)
Time Spent: 1h 50m  (was: 1h 40m)

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Most queries will be identical, but best to simply stay decoupled, so this is 
> a copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441567
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 22:48
Start Date: 04/Jun/20 22:48
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#11820:
URL: https://github.com/apache/beam/pull/11820#discussion_r435594096



##
File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.nexmark.queries.zetasql;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.sql.SqlTransform;
+import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.nexmark.model.Bid;
+import org.apache.beam.sdk.nexmark.model.Event;
+import org.apache.beam.sdk.nexmark.model.Event.Type;
+import org.apache.beam.sdk.nexmark.model.sql.SelectEvent;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil;
+import org.apache.beam.sdk.schemas.transforms.Convert;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Filter;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Query 0: Pass events through unchanged.
+ *
+ * This measures the overhead of the Beam ZetaSql implementation and test harness like
+ * conversion from Java model classes to Beam records.
+ *
+ * {@link Bid} events are used here at the moment, as they are most numerous with default
+ * configuration.
+ */
+public class ZetaSqlQuery0 extends NexmarkQueryTransform<Bid> {
+
+  public ZetaSqlQuery0() {
+    super("ZetaSqlQuery0");
+  }
+
+  @Override
+  public PCollection<Bid> expand(PCollection<Event> allEvents) {
+    PCollection<Row> rows =
+        allEvents
+            .apply(Filter.by(NexmarkQueryUtil.IS_BID))
+            .apply(getName() + ".SelectEvent", new SelectEvent(Type.BID));
+
+    return rows.apply(getName() + ".Serialize", logBytesMetric(rows.getCoder()))
+        .setRowSchema(rows.getSchema())
+        .apply(
+            SqlTransform.query("SELECT * FROM PCOLLECTION")
+                .withQueryPlannerClass(ZetaSQLQueryPlanner.class))

Review comment:
   Yea fair enough. When I started, I did not know how much things would 
differ. It was suggested that the dialects would not really be compatible, but 
actually they are. I stopped short of a real refactor but I'll go ahead with 
something now that it is working.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441567)
Time Spent: 1h 20m  (was: 1h 10m)

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Most queries will be identical, but best to simply stay decoupled, so this is 
> a copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7923) Interactive Beam

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7923?focusedWorklogId=441578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441578
 ]

ASF GitHub Bot logged work on BEAM-7923:


Author: ASF GitHub Bot
Created on: 04/Jun/20 23:01
Start Date: 04/Jun/20 23:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11884:
URL: https://github.com/apache/beam/pull/11884#issuecomment-639161004


   There are a lot of files there that don't seem relevant; I think we should go 
through and figure out what's needed for the actual plugin vs. what's "extras" 
that just got copied from a template. We also need to figure out the 
distribution story. Will this be released with Beam? As another PyPI package 
(and npm package)? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441578)
Time Spent: 23h 40m  (was: 23.5h)

> Interactive Beam
> 
>
> Key: BEAM-7923
> URL: https://issues.apache.org/jira/browse/BEAM-7923
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 23h 40m
>  Remaining Estimate: 0h
>
> This is the top level ticket for all efforts leveraging [interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]
> As the development goes, blocking tickets will be added to this one.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441577
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 22:58
Start Date: 04/Jun/20 22:58
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#11820:
URL: https://github.com/apache/beam/pull/11820#discussion_r435597049



##
File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.nexmark.queries.zetasql;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.sql.SqlTransform;
+import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.nexmark.model.Bid;
+import org.apache.beam.sdk.nexmark.model.Event;
+import org.apache.beam.sdk.nexmark.model.Event.Type;
+import org.apache.beam.sdk.nexmark.model.sql.SelectEvent;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil;
+import org.apache.beam.sdk.schemas.transforms.Convert;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Filter;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Query 0: Pass events through unchanged.
+ *
+ * This measures the overhead of the Beam ZetaSql implementation and test harness like
+ * conversion from Java model classes to Beam records.
+ *
+ * {@link Bid} events are used here at the moment, as they are most numerous with default
+ * configuration.
+ */
+public class ZetaSqlQuery0 extends NexmarkQueryTransform<Bid> {
+
+  public ZetaSqlQuery0() {
+    super("ZetaSqlQuery0");
+  }
+
+  @Override
+  public PCollection<Bid> expand(PCollection<Event> allEvents) {
+    PCollection<Row> rows =
+        allEvents
+            .apply(Filter.by(NexmarkQueryUtil.IS_BID))
+            .apply(getName() + ".SelectEvent", new SelectEvent(Type.BID));
+
+    return rows.apply(getName() + ".Serialize", logBytesMetric(rows.getCoder()))
+        .setRowSchema(rows.getSchema())
+        .apply(
+            SqlTransform.query("SELECT * FROM PCOLLECTION")
+                .withQueryPlannerClass(ZetaSQLQueryPlanner.class))

Review comment:
   In my mental model, there are _n_ Nexmark queries with common setup and 
teardown, and then there are _m_ suites that implement the queries, filling in 
the middle part. I do want to keep them slightly decoupled in case some of them 
need hackery. They turned out not to require tweaking, but I'm not super 
convinced that isn't coincidence.
   
   So the refactor I have in mind is to provide the scaffolding for each query 
but still having a pure java, Calcite, and ZetaSQL suite of classes. The 
Calcite and ZetaSQL bits would share a lot and/or be one liners.
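
As a rough illustration of that scaffolding idea, here is a plain-Java sketch with no Beam dependencies. All class and method names are hypothetical, not Beam's; the point is only that once the per-query setup/teardown is shared, a dialect suite reduces to one overridden hook:

```java
// Sketch: shared query scaffolding with a dialect-specific hook, so the
// Calcite/ZetaSQL suites become near one-liners. Names are illustrative.
public class QuerySuiteSketch {

  // Stands in for the dialect-specific knob (e.g. choosing a query planner
  // via something like SqlTransform.withQueryPlannerClass(...)).
  interface SqlDialect {
    String runQuery(String sql, String input);
  }

  // Shared per-query scaffolding: common setup around the dialect hook.
  abstract static class Query0 {
    abstract SqlDialect dialect();

    String run(String events) {
      String filtered = "bids(" + events + ")"; // common setup (filter/select)
      return dialect().runQuery("SELECT * FROM PCOLLECTION", filtered);
    }
  }

  // A dialect suite then only supplies the hook.
  static final class ZetaSqlQuery0 extends Query0 {
    @Override
    SqlDialect dialect() {
      // Pretend pass-through "plan" tagged with the dialect name.
      return (sql, input) -> "zetasql:" + input;
    }
  }

  public static void main(String[] args) {
    System.out.println(new ZetaSqlQuery0().run("e1,e2")); // zetasql:bids(e1,e2)
  }
}
```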





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441577)
Time Spent: 1h 40m  (was: 1.5h)

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>  

[jira] [Work logged] (BEAM-10093) Add ZetaSQL Nexmark variant

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10093?focusedWorklogId=441576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441576
 ]

ASF GitHub Bot logged work on BEAM-10093:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 22:57
Start Date: 04/Jun/20 22:57
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on a change in pull request 
#11820:
URL: https://github.com/apache/beam/pull/11820#discussion_r435597049



##
File path: 
sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/zetasql/ZetaSqlQuery0.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.nexmark.queries.zetasql;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.sql.SqlTransform;
+import org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner;
+import org.apache.beam.sdk.metrics.Counter;
+import org.apache.beam.sdk.metrics.Metrics;
+import org.apache.beam.sdk.nexmark.model.Bid;
+import org.apache.beam.sdk.nexmark.model.Event;
+import org.apache.beam.sdk.nexmark.model.Event.Type;
+import org.apache.beam.sdk.nexmark.model.sql.SelectEvent;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryTransform;
+import org.apache.beam.sdk.nexmark.queries.NexmarkQueryUtil;
+import org.apache.beam.sdk.schemas.transforms.Convert;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.Filter;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * Query 0: Pass events through unchanged.
+ *
+ * This measures the overhead of the Beam ZetaSql implementation and test harness like
+ * conversion from Java model classes to Beam records.
+ *
+ * {@link Bid} events are used here at the moment, as they are most numerous with default
+ * configuration.
+ */
+public class ZetaSqlQuery0 extends NexmarkQueryTransform<Bid> {
+
+  public ZetaSqlQuery0() {
+    super("ZetaSqlQuery0");
+  }
+
+  @Override
+  public PCollection<Bid> expand(PCollection<Event> allEvents) {
+    PCollection<Row> rows =
+        allEvents
+            .apply(Filter.by(NexmarkQueryUtil.IS_BID))
+            .apply(getName() + ".SelectEvent", new SelectEvent(Type.BID));
+
+    return rows.apply(getName() + ".Serialize", logBytesMetric(rows.getCoder()))
+        .setRowSchema(rows.getSchema())
+        .apply(
+            SqlTransform.query("SELECT * FROM PCOLLECTION")
+                .withQueryPlannerClass(ZetaSQLQueryPlanner.class))

Review comment:
   In my mental model, there are _n_ Nexmark queries with common setup and 
teardown, and then there are _m_ suites that do some processing on them. I do 
want to keep them slightly decoupled in case some of them need hackery. They 
turned out not to require tweaking, but I'm not super convinced that isn't 
coincidence.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441576)
Time Spent: 1.5h  (was: 1h 20m)

> Add ZetaSQL Nexmark variant
> ---
>
> Key: BEAM-10093
> URL: https://issues.apache.org/jira/browse/BEAM-10093
> Project: Beam
>  Issue Type: New Feature
>  Components: testing-nexmark
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: P2
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Most queries will be identical, but best to simply stay decoupled, so this is 
> a copy/paste/modify job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7074) FnApiRunner fails to wire multiple timer specs in single pardo

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7074?focusedWorklogId=441572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441572
 ]

ASF GitHub Bot logged work on BEAM-7074:


Author: ASF GitHub Bot
Created on: 04/Jun/20 22:52
Start Date: 04/Jun/20 22:52
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #11894:
URL: https://github.com/apache/beam/pull/11894#discussion_r435595283



##
File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py
##
@@ -377,11 +377,6 @@ def process_timer(
   assert_that(actual, equal_to(expected))
 
   def test_pardo_timers_clear(self):
-if type(self).__name__ != 'FlinkRunnerTest':

Review comment:
   I think it would be cleaner to add an override in the subclasses than to 
test on `__name__` here. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441572)
Time Spent: 1h 50m  (was: 1h 40m)

> FnApiRunner fails to wire multiple timer specs in single pardo
> --
>
> Key: BEAM-7074
> URL: https://issues.apache.org/jira/browse/BEAM-7074
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.11.0
>Reporter: Thomas Weise
>Priority: P2
>  Labels: stale-P2
> Fix For: 2.21.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Multiple timer specs in a ParDo yield "NotImplementedError: Timers and side 
> inputs."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10198) IOException for distinct bytes SQL query

2020-06-04 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-10198:

Labels: beam-fixit  (was: )

> IOException for distinct bytes SQL query
> 
>
> Key: BEAM-10198
> URL: https://issues.apache.org/jira/browse/BEAM-10198
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>  Labels: beam-fixit
>
> This query breaks Java code import. Need to diagnose the root cause and fix 
> it properly. 
> {code:java}
>  public void test_distinct_bytes() {
> String sql = "SELECT DISTINCT val.BYTES\n"
> + "from (select b\"1\" BYTES union all\n"
> + "  select cast(NULL as bytes) union all\n"
> + "  select b\"-1\" union all\n"
> + "  select b\"1\" union all\n"
> + "  select cast(NULL as bytes)) val";
> ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
> BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
> PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);
> Schema singleField = Schema.builder().addByteArrayField("field1").build();
> PAssert.that(stream)
> .containsInAnyOrder(
> 
> Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());
> 
> pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
>   }
> {code}
> This query throws:
> {code:java}
> Forbidden IOException when reading from InputStream
> java.lang.IllegalArgumentException: Forbidden IOException when reading from 
> InputStream
>   at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118)
>   at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98)
>   at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
>   at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10198) IOException for distinct bytes SQL query

2020-06-04 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-10198:

Description: 
This query breaks the Java code path. Need to diagnose the root cause and fix 
it properly. 


{code:java}
 public void test_distinct_bytes() {
String sql = "SELECT DISTINCT val.BYTES\n"
+ "from (select b\"1\" BYTES union all\n"
+ "  select cast(NULL as bytes) union all\n"
+ "  select b\"-1\" union all\n"
+ "  select b\"1\" union all\n"
+ "  select cast(NULL as bytes)) val";

ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);

Schema singleField = Schema.builder().addByteArrayField("field1").build();
PAssert.that(stream)
.containsInAnyOrder(

Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());

pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
  }
{code}

This query throws:

{code:java}
Forbidden IOException when reading from InputStream
java.lang.IllegalArgumentException: Forbidden IOException when reading from 
InputStream
at 
org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118)
at 
org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98)
at 
org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141)
{code}




  was:

{code:java}
 public void test_distinct_bytes() {
String sql = "SELECT DISTINCT val.BYTES\n"
+ "from (select b\"1\" BYTES union all\n"
+ "  select cast(NULL as bytes) union all\n"
+ "  select b\"-1\" union all\n"
+ "  select b\"1\" union all\n"
+ "  select cast(NULL as bytes)) val";

ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);

Schema singleField = Schema.builder().addByteArrayField("field1").build();
PAssert.that(stream)
.containsInAnyOrder(

Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());

pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
  }
{code}


This query breaks the Java code path. Need to diagnose the root cause and fix 
it properly. 


> IOException for distinct bytes SQL query
> 
>
> Key: BEAM-10198
> URL: https://issues.apache.org/jira/browse/BEAM-10198
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>
> This query breaks the Java code path. Need to diagnose the root cause and 
> fix it properly. 
> {code:java}
>  public void test_distinct_bytes() {
> String sql = "SELECT DISTINCT val.BYTES\n"
> + "from (select b\"1\" BYTES union all\n"
> + "  select cast(NULL as bytes) union all\n"
> + "  select b\"-1\" union all\n"
> + "  select b\"1\" union all\n"
> + "  select cast(NULL as bytes)) val";
> ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
> BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
> PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);
> Schema singleField = Schema.builder().addByteArrayField("field1").build();
> PAssert.that(stream)
> .containsInAnyOrder(
> 
> Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());
> 
> pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
>   }
> {code}
> This query throws:
> {code:java}
> Forbidden IOException when reading from InputStream
> java.lang.IllegalArgumentException: Forbidden IOException when reading from 
> InputStream
>   at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromSafeStream(CoderUtils.java:118)
>   at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:98)
>   at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
>   at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:141)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10198) Program crashes for distinct bytes SQL query

2020-06-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10198:
---
Status: Open  (was: Triage Needed)

> Program crashes for distinct bytes SQL query
> 
>
> Key: BEAM-10198
> URL: https://issues.apache.org/jira/browse/BEAM-10198
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>
> {code:java}
>  public void test_distinct_bytes() {
> String sql = "SELECT DISTINCT val.BYTES\n"
> + "from (select b\"1\" BYTES union all\n"
> + "  select cast(NULL as bytes) union all\n"
> + "  select b\"-1\" union all\n"
> + "  select b\"1\" union all\n"
> + "  select cast(NULL as bytes)) val";
> ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
> BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
> PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);
> Schema singleField = Schema.builder().addByteArrayField("field1").build();
> PAssert.that(stream)
> .containsInAnyOrder(
> 
> Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());
> 
> pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
>   }
> {code}
> This query breaks the Java code path. Need to diagnose the root cause and 
> fix it properly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10198) Program crashes for distinct bytes SQL query

2020-06-04 Thread Rui Wang (Jira)
Rui Wang created BEAM-10198:
---

 Summary: Program crashes for distinct bytes SQL query
 Key: BEAM-10198
 URL: https://issues.apache.org/jira/browse/BEAM-10198
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang



{code:java}
 public void test_distinct_bytes() {
String sql = "SELECT DISTINCT val.BYTES\n"
+ "from (select b\"1\" BYTES union all\n"
+ "  select cast(NULL as bytes) union all\n"
+ "  select b\"-1\" union all\n"
+ "  select b\"1\" union all\n"
+ "  select cast(NULL as bytes)) val";

ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);

Schema singleField = Schema.builder().addByteArrayField("field1").build();
PAssert.that(stream)
.containsInAnyOrder(

Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());

pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
  }
{code}


This query breaks the Java code path. Need to diagnose the root cause and fix 
it properly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10198) IOException for distinct bytes SQL query

2020-06-04 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-10198:

Summary: IOException for distinct bytes SQL query  (was: Program crashes 
for distinct bytes SQL query)

> IOException for distinct bytes SQL query
> 
>
> Key: BEAM-10198
> URL: https://issues.apache.org/jira/browse/BEAM-10198
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: P2
>
> {code:java}
>  public void test_distinct_bytes() {
> String sql = "SELECT DISTINCT val.BYTES\n"
> + "from (select b\"1\" BYTES union all\n"
> + "  select cast(NULL as bytes) union all\n"
> + "  select b\"-1\" union all\n"
> + "  select b\"1\" union all\n"
> + "  select cast(NULL as bytes)) val";
> ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
> BeamRelNode beamRelNode = zetaSQLQueryPlanner.convertToBeamRel(sql);
> PCollection<Row> stream = BeamSqlRelUtils.toPCollection(pipeline, beamRelNode);
> Schema singleField = Schema.builder().addByteArrayField("field1").build();
> PAssert.that(stream)
> .containsInAnyOrder(
> 
> Row.withSchema(singleField).addValues("123".getBytes(UTF_8)).build());
> 
> pipeline.run().waitUntilFinish(Duration.standardMinutes(PIPELINE_EXECUTION_WAITTIME_MINUTES));
>   }
> {code}
> This query breaks the Java code path. Need to diagnose the root cause and 
> fix it properly. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10197) Support type hints for frozenset

2020-06-04 Thread Saavan Nanavati (Jira)
Saavan Nanavati created BEAM-10197:
--

 Summary: Support type hints for frozenset 
 Key: BEAM-10197
 URL: https://issues.apache.org/jira/browse/BEAM-10197
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Saavan Nanavati


Beam's internal typing system currently supports type hints for set but not 
frozenset.

 

This Jira ticket will add type annotation support for both frozenset and 
typing.FrozenSet.
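
For context, a minimal sketch of the annotations in question, using only the standard `typing` module (no Beam dependency) — `typing.FrozenSet` is the parameterized counterpart of the `frozenset` builtin that the ticket proposes to support:

```python
from typing import FrozenSet, Set, get_type_hints


def distinct_tags(tags: Set[str]) -> FrozenSet[str]:
    """Return an immutable snapshot of the given tag set."""
    return frozenset(tags)


# The return annotation resolves to the parameterized typing.FrozenSet form,
# which a typing system must recognize alongside the bare frozenset builtin.
hints = get_type_hints(distinct_tags)
print(hints['return'])  # typing.FrozenSet[str]
```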



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9953) Beam ZetaSQL supports multiple statements in a query

2020-06-04 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126232#comment-17126232
 ] 

Rui Wang commented on BEAM-9953:


Better to check the open-sourced Java Analyzer: 
https://github.com/google/zetasql/blob/master/java/com/google/zetasql/Analyzer.java


I am not seeing an API that returns a ResolvedStatement without actually 
analyzing. 

Also, Kyle added the API discussed above (which we need) at: 
https://github.com/google/zetasql/blob/master/java/com/google/zetasql/Analyzer.java#L214.
 So it's no longer a problem.

> Beam ZetaSQL supports multiple statements in a query
> 
>
> Key: BEAM-9953
> URL: https://issues.apache.org/jira/browse/BEAM-9953
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kyle Weaver
>Priority: P2
>
> One example of a multiple-statement query:
> {code:java}
> CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9953) Beam ZetaSQL supports multiple statements in a query

2020-06-04 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126229#comment-17126229
 ] 

Kenneth Knowles commented on BEAM-9953:
---

Is there a parse-but-don't-analyze API that makes it easy to identify the 
statement type? (Obviously, internal to the library, there is.)

> Beam ZetaSQL supports multiple statements in a query
> 
>
> Key: BEAM-9953
> URL: https://issues.apache.org/jira/browse/BEAM-9953
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kyle Weaver
>Priority: P2
>
> One example of a multiple-statement query:
> {code:java}
> CREATE FUNCTION fun_a (param_1 INT64); SELECT fun_a(10);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-8828:
---

Assignee: Scott Lukas

> BigQueryTableProvider should allow configuration of write disposition
> -
>
> Key: BEAM-8828
> URL: https://issues.apache.org/jira/browse/BEAM-8828
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Scott Lukas
>Priority: P2
>
> It should be possible to set BigQueryIO's 
> [writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
>  in a Beam SQL big query table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441527
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 04/Jun/20 21:20
Start Date: 04/Jun/20 21:20
Worklog Time Spent: 10m 
  Work Description: lostluck opened a new pull request #11926:
URL: https://github.com/apache/beam/pull/11926


   Adds a unit test for the "user side" type encoders/decoders. This will help 
avoid ending up in a broken state as we change the string coder and the 
default custom type coders.
   
   Also adjusts the coder printout a little for improved readability (and 
reduced redundancy).
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441529
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 04/Jun/20 21:20
Start Date: 04/Jun/20 21:20
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11926:
URL: https://github.com/apache/beam/pull/11926#issuecomment-639122665


   R: @tysonjh 
   cc: @youngoli 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441529)
Time Spent: 40m  (was: 0.5h)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys, and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a work 
> around.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441523
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 04/Jun/20 21:13
Start Date: 04/Jun/20 21:13
Worklog Time Spent: 10m 
  Work Description: lostluck commented on pull request #11925:
URL: https://github.com/apache/beam/pull/11925#issuecomment-639119430


   R: @tysonjh 
   cc: @youngoli 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441523)
Time Spent: 20m  (was: 10m)

> [Go SDK] Beam Schemas
> -
>
> Key: BEAM-9615
> URL: https://issues.apache.org/jira/browse/BEAM-9615
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Schema support is required for advanced cross language features in Beam, and 
> has the opportunity to replace the current default JSON encoding of elements.
> Some quick notes, though a better fleshed out doc with details will be 
> forthcoming:
>  * All base coders should be implemented, and listed as coder capabilities. I 
> think only stringutf8 is missing presently.
>  * Should support fairly arbitrary user types, seamlessly. That is, users 
> should be able to rely on it "just working" if their type is compatible.
>  * Should support schema metadata tagging.
> In particular, one breaking shift in the default will be to explicitly fail 
> pipelines if elements have unexported fields, when no other custom coder has 
> been added. This has been a source of errors/dropped data/keys, and a simple 
> warning at construction time won't cut it. However, we could provide a manual 
> "use beam schemas, but ignore unexported fields" registration as a work 
> around.
> Edit: Doc is now at https://s.apache.org/beam-go-schemas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9615) [Go SDK] Beam Schemas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9615?focusedWorklogId=441522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441522
 ]

ASF GitHub Bot logged work on BEAM-9615:


Author: ASF GitHub Bot
Created on: 04/Jun/20 21:13
Start Date: 04/Jun/20 21:13
Worklog Time Spent: 10m 
  Work Description: lostluck opened a new pull request #11925:
URL: https://github.com/apache/beam/pull/11925


   This adds initial utility functions for encoding and decoding utf8 strings 
in the Go SDK.
   
   Doesn't make use of them yet. In practice this is already how strings are 
encoded in the Go SDK, but marked as "custom" coders rather than with the 
built-in URN.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441518
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 04/Jun/20 21:02
Start Date: 04/Jun/20 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11922:
URL: https://github.com/apache/beam/pull/11922#discussion_r435546832



##
File path: 
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##
@@ -1593,6 +1601,391 @@ public void 
testProcessElementForSizedElementAndRestriction() throws Exception {
 assertEquals(stateData, fakeClient.getData());
   }
 
+  @Test
+  public void testProcessElementForWindowedSizedElementAndRestriction() throws 
Exception {
+Pipeline p = Pipeline.create();
+PCollection<String> valuePCollection = p.apply(Create.of("unused"));
+PCollectionView<String> singletonSideInputView = 
valuePCollection.apply(View.asSingleton());
+TestSplittableDoFn doFn = new TestSplittableDoFn(singletonSideInputView);
+
+valuePCollection
+.apply(Window.into(SlidingWindows.of(Duration.standardSeconds(1))))
+.apply(TEST_TRANSFORM_ID, 
ParDo.of(doFn).withSideInputs(singletonSideInputView));
+
+RunnerApi.Pipeline pProto =
+ProtoOverrides.updateTransform(
+PTransformTranslation.PAR_DO_TRANSFORM_URN,
+PipelineTranslation.toProto(p, 
SdkComponents.create(p.getOptions()), true),
+SplittableParDoExpander.createSizedReplacement());
+String expandedTransformId =
+Iterables.find(
+pProto.getComponents().getTransformsMap().entrySet(),
+entry ->
+entry
+.getValue()
+.getSpec()
+.getUrn()
+.equals(
+PTransformTranslation
+
.SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN)
+&& 
entry.getValue().getUniqueName().contains(TEST_TRANSFORM_ID))
+.getKey();
+RunnerApi.PTransform pTransform =
+pProto.getComponents().getTransformsOrThrow(expandedTransformId);
+String inputPCollectionId =
+
pTransform.getInputsOrThrow(ParDoTranslation.getMainInputName(pTransform));
+RunnerApi.PCollection inputPCollection =
+pProto.getComponents().getPcollectionsOrThrow(inputPCollectionId);
+RehydratedComponents rehydratedComponents =
+RehydratedComponents.forComponents(pProto.getComponents());
+Coder<WindowedValue> inputCoder =
+WindowedValue.getFullCoder(
+CoderTranslation.fromProto(
+
pProto.getComponents().getCodersOrThrow(inputPCollection.getCoderId()),
+rehydratedComponents,
+TranslationContext.DEFAULT),
+(Coder)
+CoderTranslation.fromProto(
+pProto
+.getComponents()
+.getCodersOrThrow(
+pProto
+.getComponents()
+.getWindowingStrategiesOrThrow(
+inputPCollection.getWindowingStrategyId())
+.getWindowCoderId()),
+rehydratedComponents,
+TranslationContext.DEFAULT));
+String outputPCollectionId = pTransform.getOutputsOrThrow("output");
+
+ImmutableMap<StateKey, ByteString> stateData =
+ImmutableMap.of(
+
multimapSideInputKey(singletonSideInputView.getTagInternal().getId(), 
ByteString.EMPTY),
+encode("8"));
+
+FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData);
+
+List<WindowedValue<String>> mainOutputValues = new ArrayList<>();
+MetricsContainerStepMap metricsContainerRegistry = new 
MetricsContainerStepMap();
+PCollectionConsumerRegistry consumers =
+new PCollectionConsumerRegistry(
+metricsContainerRegistry, mock(ExecutionStateTracker.class));
+consumers.register(
+outputPCollectionId,
+TEST_TRANSFORM_ID,
+(FnDataReceiver) (FnDataReceiver<WindowedValue<String>>) 
mainOutputValues::add);
+PTransformFunctionRegistry startFunctionRegistry =
+new PTransformFunctionRegistry(
+mock(MetricsContainerStepMap.class), 
mock(ExecutionStateTracker.class), "start");
+PTransformFunctionRegistry finishFunctionRegistry =
+new PTransformFunctionRegistry(
+mock(MetricsContainerStepMap.class), 
mock(ExecutionStateTracker.class), "finish");
+List<ThrowingRunnable> teardownFunctions = new ArrayList<>();
+List<ProgressRequestCallback> progressRequestCallbacks = new ArrayList<>();
+BundleSplitListener.InMemory splitListener = 
BundleSplitListener.InMemory.create();
+
+new 

[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-06-04 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126186#comment-17126186
 ] 

Kenneth Knowles commented on BEAM-9198:
---

Sounds good. By the way, that message was automated. I messed up the automation 
so it looked like it was from me. If it happens again, it will be clear that it 
is a bot. Carry on!

> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: P2
>  Labels: gsoc, gsoc2020, mentor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to send emails for your questions.
> Project Information
> -
> BeamSQL has a long list of aggregation/aggregation-analytics
> functionalities to support.
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytics functions, a good starting point is to
> support rank() first.
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Enable in ZetaSQL dialect.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 
> https://beam.apache.org/documentation/programming-guide/#overview



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9363) BeamSQL windowing as TVF

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9363?focusedWorklogId=441511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441511
 ]

ASF GitHub Bot logged work on BEAM-9363:


Author: ASF GitHub Bot
Created on: 04/Jun/20 20:54
Start Date: 04/Jun/20 20:54
Worklog Time Spent: 10m 
  Work Description: apilloud commented on a change in pull request #11868:
URL: https://github.com/apache/beam/pull/11868#discussion_r435497089



##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/TVFSlidingWindowFn.java
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Objects;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.extensions.sql.impl.utils.TVFStreamingUtils;
+import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
+import org.apache.beam.sdk.transforms.windowing.NonMergingWindowFn;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.transforms.windowing.WindowFn.AssignContext;
+import org.apache.beam.sdk.transforms.windowing.WindowMappingFn;
+import org.apache.beam.sdk.values.Row;
+import org.joda.time.Duration;
+
+/**
+ * TVFSlidingWindowFn assigns windows based on the input row's "window_start"
+ * and "window_end" timestamps.
+ */
+public class TVFSlidingWindowFn extends NonMergingWindowFn<Object, IntervalWindow> {
+  /** Amount of time between generated windows. */
+  private final Duration period;
+
+  /** Size of the generated windows. */
+  private final Duration size;
+
+  public static TVFSlidingWindowFn of(Duration size, Duration period) {
+    return new TVFSlidingWindowFn(size, period);
+  }
+
+  private TVFSlidingWindowFn(Duration size, Duration period) {
+    this.period = period;
+    this.size = size;
+  }
+
+  @Override
+  public Collection<IntervalWindow> assignWindows(AssignContext c) throws Exception {
+    Row curRow = (Row) c.element();
+    // In the sliding-window-as-TVF syntax, each row carries its window's start
+    // and end as metadata, so we can assign a window directly from that metadata.
+    return Arrays.asList(
+        new IntervalWindow(
+            curRow.getDateTime(TVFStreamingUtils.WINDOW_START).toInstant(),
+            curRow.getDateTime(TVFStreamingUtils.WINDOW_END).toInstant()));
+  }
+
+  @Override
+  public boolean isCompatible(WindowFn<?, ?> other) {
+    return equals(other);
+  }
+
+  @Override
+  public Coder<IntervalWindow> windowCoder() {
+    return IntervalWindow.getCoder();
+  }
+
+  @Override
+  public WindowMappingFn<IntervalWindow> getDefaultWindowMappingFn() {
+    throw new UnsupportedOperationException(
+        "TVFSlidingWindow does not support side input windows.");
+  }
+
+  @Override

Review comment:
   nit: From here to the end of the file is boilerplate that `AutoValue`
   does for you. Could you use that instead?
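For context on the size/period pair above: a conventional sliding-window assigner derives every window containing a timestamp, whereas the TVF approach reads the precomputed window_start/window_end metadata from each row. A hedged plain-Java sketch of the conventional arithmetic (helper names are illustrative, not Beam API):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingAssignSketch {
  // Every window [start, start + sizeMs) whose start lies on a grid spaced
  // periodMs apart and which contains timestamp ts. TVF rows instead carry
  // a precomputed window_start/window_end pair for each such window.
  static List<long[]> assign(long ts, long sizeMs, long periodMs) {
    List<long[]> windows = new ArrayList<>();
    long lastStart = ts - (ts % periodMs); // latest grid-aligned start <= ts
    for (long start = lastStart; start > ts - sizeMs; start -= periodMs) {
      windows.add(new long[] {start, start + sizeMs});
    }
    return windows;
  }

  public static void main(String[] args) {
    // size 60s, period 30s: timestamp 75s falls into two windows
    for (long[] w : assign(75_000L, 60_000L, 30_000L)) {
      System.out.println(w[0] + ".." + w[1]);
    }
  }
}
```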

##
File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##
@@ -85,6 +97,85 @@ public TableFunctionScan copy(
   }
 
   private class Transform extends PTransform<PCollectionList<Row>, PCollection<Row>> {
+    private TVFToPTransform tumbleToPTransform =
+        (call, upstream) -> {
+          RexInputRef wmCol = (RexInputRef) call.getOperands().get(1);
+          Schema outputSchema = CalciteUtils.toSchema(getRowType());
+          FixedWindows windowFn = FixedWindows.of(durationParameter(call.getOperands().get(2)));
+          PCollection<Row> streamWithWindowMetadata =
+              upstream
+                  .apply(ParDo.of(new FixedWindowDoFn(windowFn, wmCol.getIndex(), outputSchema)))
+                  .setRowSchema(outputSchema);
+
+          PCollection<Row> windowedStream =
+              assignTimestampsAndWindow(
+                  streamWithWindowMetadata, wmCol.getIndex(), (WindowFn) windowFn);
+
+          return windowedStream;
+        };
+
+    private TVFToPTransform hopToPTransform =
+        (call, upstream) -> {
+          RexInputRef wmCol = (RexInputRef) 

[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=441510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441510
 ]

ASF GitHub Bot logged work on BEAM-10097:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 20:49
Start Date: 04/Jun/20 20:49
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-639108631


   R: @Ardagan @amaliujia 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441510)
Time Spent: 4h 40m  (was: 4.5h)

> Migrate PCollection views to use both iterable and multimap 
> materializations/access patterns
> 
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from KV<K,
> Iterable<V>> to the view that is being requested (singleton, iterable, list,
> map, multimap).
> We should be using the primitive views (iterable, multimap) directly without
> going through the naive mapping.
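The trivial mapping the description refers to, from the KV<K, Iterable<V>>-shaped primitive to, for example, a multimap view, can be pictured with a hedged plain-Java sketch (names invented; Beam's actual KV and materialization types are not used here):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ViewMappingSketch {
  // Naive fold from a KV<K, Iterable<V>>-shaped input into an in-memory
  // multimap view; the issue proposes serving the multimap access pattern
  // from a multimap primitive directly instead of going through this step.
  static Map<String, List<Integer>> asMultimap(List<Map.Entry<String, List<Integer>>> kvs) {
    Map<String, List<Integer>> view = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> kv : kvs) {
      view.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).addAll(kv.getValue());
    }
    return view;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, List<Integer>>> kvs =
        List.of(new SimpleEntry<>("a", List.of(1, 2)), new SimpleEntry<>("b", List.of(3)));
    System.out.println(asMultimap(kvs)); // {a=[1, 2], b=[3]}
  }
}
```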





[jira] [Commented] (BEAM-10192) Duplicate PubSub subscriptions with python direct runner in Jupyter/Colab environment

2020-06-04 Thread Harrison (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126174#comment-17126174
 ] 

Harrison commented on BEAM-10192:
-

This may be related to BEAM-2988 

> Duplicate PubSub subscriptions with python direct runner in Jupyter/Colab 
> environment
> -
>
> Key: BEAM-10192
> URL: https://issues.apache.org/jira/browse/BEAM-10192
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.22.0
> Environment: Ubuntu 18 (Colab notebook), Python SDK
>Reporter: Harrison
>Priority: P2
>
> When running a streaming pipeline on Colab with direct runner, ReadFromPubSub 
> can retain old subscriptions and cause message duplication. For example, 
> manually killing a cell that is running a streaming pubsub pipeline does not 
> delete the pubsub subscription. If the cell is rerun, the ReadFromPubSub 
> component will actually be subscribed twice which results in duplicate 
> messages.
> Manually deleting old subscriptions (e.g. via the GCP dashboard) temporarily 
> fixes the problem.
> This Colab notebook: 
> [https://gist.github.com/hgarrereyn/64ce87cbbcbe9c34ccdd13eafe49e3fb] 
> contains a runnable example of the bug.





[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=441508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441508
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 04/Jun/20 20:27
Start Date: 04/Jun/20 20:27
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-639098591


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 441508)
Time Spent: 3h 20m  (was: 3h 10m)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing





[jira] [Closed] (BEAM-10188) Automate Github release

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-10188.

Fix Version/s: Not applicable
   Resolution: Fixed

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.





[jira] [Assigned] (BEAM-10188) Automate Github release

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-10188:


Assignee: (was: Brian Hulette)

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.





[jira] [Work logged] (BEAM-10188) Automate Github release

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441507
 ]

ASF GitHub Bot logged work on BEAM-10188:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 20:21
Start Date: 04/Jun/20 20:21
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit merged pull request #11918:
URL: https://github.com/apache/beam/pull/11918


   





Issue Time Tracking
---

Worklog Id: (was: 441507)
Time Spent: 40m  (was: 0.5h)

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.





[jira] [Assigned] (BEAM-10188) Automate Github release

2020-06-04 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-10188:


Assignee: Brian Hulette

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Assignee: Brian Hulette
>Priority: P2
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.





[jira] [Created] (BEAM-10196) [beam_PostCommit_Java_ValidatesRunner_Spark] [TimerTests.testOutputTimestampDefault]

2020-06-04 Thread Rui Wang (Jira)
Rui Wang created BEAM-10196:
---

 Summary: [beam_PostCommit_Java_ValidatesRunner_Spark] 
[TimerTests.testOutputTimestampDefault]
 Key: BEAM-10196
 URL: https://issues.apache.org/jira/browse/BEAM-10196
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Rui Wang


 * 
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/7516/testReport/


Initial investigation:
Timer is not fully supported by Spark runner.






[jira] [Commented] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-06-04 Thread John Mora (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126168#comment-17126168
 ] 

John Mora commented on BEAM-9198:
-

Hi [~kenn]

I am still working on this issue. I am at stage 2, "SQL core to implement
physical relational operator", of my GSoC proposal. I have been reporting my
progress to my mentor [~amaliujia]. Additionally, a few days ago I sent a PR
with an experiment in order to receive feedback.

Regards,
John

> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: John Mora
>Priority: P2
>  Labels: gsoc, gsoc2020, mentor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to send emails for your questions.
> Project Information
> -
> BeamSQL has a long list of aggregation/aggregation-analytics
> functionalities to support.
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytics functions, a good starting point is to
> support rank() first.
> This will require touching core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Enable in ZetaSQL dialect.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 
> https://beam.apache.org/documentation/programming-guide/#overview





[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441505
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 04/Jun/20 20:12
Start Date: 04/Jun/20 20:12
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #11922:
URL: https://github.com/apache/beam/pull/11922#discussion_r435521657



##
File path: 
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
##
@@ -1593,6 +1601,391 @@ public void testProcessElementForSizedElementAndRestriction() throws Exception {
 assertEquals(stateData, fakeClient.getData());
   }
 
+  @Test
+  public void testProcessElementForWindowedSizedElementAndRestriction() throws Exception {
+    Pipeline p = Pipeline.create();
+    PCollection<String> valuePCollection = p.apply(Create.of("unused"));
+    PCollectionView<String> singletonSideInputView = valuePCollection.apply(View.asSingleton());
+    TestSplittableDoFn doFn = new TestSplittableDoFn(singletonSideInputView);
+
+    valuePCollection
+        .apply(Window.into(SlidingWindows.of(Duration.standardSeconds(1))))
+        .apply(TEST_TRANSFORM_ID, ParDo.of(doFn).withSideInputs(singletonSideInputView));
+
+    RunnerApi.Pipeline pProto =
+        ProtoOverrides.updateTransform(
+            PTransformTranslation.PAR_DO_TRANSFORM_URN,
+            PipelineTranslation.toProto(p, SdkComponents.create(p.getOptions()), true),
+            SplittableParDoExpander.createSizedReplacement());
+    String expandedTransformId =
+        Iterables.find(
+                pProto.getComponents().getTransformsMap().entrySet(),
+                entry ->
+                    entry
+                            .getValue()
+                            .getSpec()
+                            .getUrn()
+                            .equals(
+                                PTransformTranslation
+                                    .SPLITTABLE_PROCESS_SIZED_ELEMENTS_AND_RESTRICTIONS_URN)
+                        && entry.getValue().getUniqueName().contains(TEST_TRANSFORM_ID))
+            .getKey();
+    RunnerApi.PTransform pTransform =
+        pProto.getComponents().getTransformsOrThrow(expandedTransformId);
+    String inputPCollectionId =
+        pTransform.getInputsOrThrow(ParDoTranslation.getMainInputName(pTransform));
+    RunnerApi.PCollection inputPCollection =
+        pProto.getComponents().getPcollectionsOrThrow(inputPCollectionId);
+    RehydratedComponents rehydratedComponents =
+        RehydratedComponents.forComponents(pProto.getComponents());
+    Coder<WindowedValue> inputCoder =
+        WindowedValue.getFullCoder(
+            CoderTranslation.fromProto(
+                pProto.getComponents().getCodersOrThrow(inputPCollection.getCoderId()),
+                rehydratedComponents,
+                TranslationContext.DEFAULT),
+            (Coder<BoundedWindow>)
+                CoderTranslation.fromProto(
+                    pProto
+                        .getComponents()
+                        .getCodersOrThrow(
+                            pProto
+                                .getComponents()
+                                .getWindowingStrategiesOrThrow(
+                                    inputPCollection.getWindowingStrategyId())
+                                .getWindowCoderId()),
+                    rehydratedComponents,
+                    TranslationContext.DEFAULT));
+    String outputPCollectionId = pTransform.getOutputsOrThrow("output");
+
+    ImmutableMap<StateKey, ByteString> stateData =
+        ImmutableMap.of(
+            multimapSideInputKey(singletonSideInputView.getTagInternal().getId(), ByteString.EMPTY),
+            encode("8"));
+
+    FakeBeamFnStateClient fakeClient = new FakeBeamFnStateClient(stateData);
+
+    List<WindowedValue<?>> mainOutputValues = new ArrayList<>();
+    MetricsContainerStepMap metricsContainerRegistry = new MetricsContainerStepMap();
+    PCollectionConsumerRegistry consumers =
+        new PCollectionConsumerRegistry(
+            metricsContainerRegistry, mock(ExecutionStateTracker.class));
+    consumers.register(
+        outputPCollectionId,
+        TEST_TRANSFORM_ID,
+        (FnDataReceiver) (FnDataReceiver<WindowedValue<?>>) mainOutputValues::add);
+    PTransformFunctionRegistry startFunctionRegistry =
+        new PTransformFunctionRegistry(
+            mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "start");
+    PTransformFunctionRegistry finishFunctionRegistry =
+        new PTransformFunctionRegistry(
+            mock(MetricsContainerStepMap.class), mock(ExecutionStateTracker.class), "finish");
+    List<ThrowingRunnable> teardownFunctions = new ArrayList<>();
+    List<ProgressRequestCallback> progressRequestCallbacks = new ArrayList<>();
+    BundleSplitListener.InMemory splitListener = BundleSplitListener.InMemory.create();
+
+new 

[jira] [Assigned] (BEAM-10195) [beam_PostCommit_Java_ValidatesRunner_Samza] [ org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault]

2020-06-04 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang reassigned BEAM-10195:
---

Assignee: Xinyu Liu

> [beam_PostCommit_Java_ValidatesRunner_Samza] [ 
> org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault]
> --
>
> Key: BEAM-10195
> URL: https://issues.apache.org/jira/browse/BEAM-10195
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Rui Wang
>Assignee: Xinyu Liu
>Priority: P0
>  Labels: currently-failing
>
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/6698/testReport/
> Initial investigation:
> TimerTests is not fully supported by Samza runner.





[jira] [Work logged] (BEAM-9008) Add readAll() method to CassandraIO

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9008?focusedWorklogId=441503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441503
 ]

ASF GitHub Bot logged work on BEAM-9008:


Author: ASF GitHub Bot
Created on: 04/Jun/20 20:11
Start Date: 04/Jun/20 20:11
Worklog Time Spent: 10m 
  Work Description: vmarquez commented on a change in pull request #10546:
URL: https://github.com/apache/beam/pull/10546#discussion_r434870508



##
File path: 
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraIO.java
##
@@ -1170,4 +898,44 @@ private void waitForFuturesToFinish() throws ExecutionException, InterruptedException
   }
 }
   }
+
+  /**
+   * A {@link PTransform} to read data from Apache Cassandra. See {@link CassandraIO} for more
+   * information on usage and configuration.
+   */
+  @AutoValue
+  public abstract static class ReadAll<T>
+      extends PTransform<PCollection<Read<T>>, PCollection<T>> {
+
+    @Nullable
+    abstract Coder<T> coder();
+
+    abstract ReadAll.Builder<T> builder();
+
+    /** Specify the {@link Coder} used to serialize the entity in the {@link PCollection}. */
+    public ReadAll<T> withCoder(Coder<T> coder) {

Review comment:
   I think we do, but am not sure.  You have to call `setCoder` on the
PCollection itself, so we don't have access to the `PCollection<Read>` at a
point when we also have access to a single `Read` (they are only supplied in
the `ReadFn`, which can't call `setCoder` on the PTransform).  Is my thinking
correct?  I could be unaware of a different way to do this.







Issue Time Tracking
---

Worklog Id: (was: 441503)
Time Spent: 13.5h  (was: 13h 20m)

> Add readAll() method to CassandraIO
> ---
>
> Key: BEAM-9008
> URL: https://issues.apache.org/jira/browse/BEAM-9008
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-cassandra
>Affects Versions: 2.16.0
>Reporter: vincent marquez
>Assignee: vincent marquez
>Priority: P3
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> When querying a large Cassandra database, it's often *much* more useful to
> programmatically generate the queries that need to be run rather than reading
> all partitions and attempting some filtering.
> As an example:
> {code:java}
> public class Event { 
>@PartitionKey(0) public UUID accountId;
>@PartitionKey(1)public String yearMonthDay; 
>@ClusteringKey public UUID eventId;  
>//other data...
> }{code}
> If there is ten years worth of data, you may want to only query one year's 
> worth.  Here each token range would represent one 'token' but all events for 
> the day. 
> {code:java}
> Set<UUID> accounts = getRelevantAccounts();
> Set<String> dateRange = generateDateRange("2018-01-01", "2019-01-01");
> PCollection tokens = generateTokens(accounts, dateRange); 
> {code}
>  
>  I propose an additional _readAll()_ PTransform that takes a PCollection of
> token ranges and returns a PCollection of what the query would return.
> *Question: How much code should be in common between both methods?* 
> Currently the read connector already groups all partitions into a List of 
> Token Ranges, so it would be simple to refactor the current read() based 
> method to a 'ParDo' based one and have them both share the same function.  
> Reasons against sharing code between read and readAll
>  * Not having the read based method return a BoundedSource connector would 
> mean losing the ability to know the size of the data returned
>  * Currently the CassandraReader executes all the grouped TokenRange queries 
> *asynchronously* which is (maybe?) fine when all that's happening is 
> splitting up all the partition ranges but terrible for executing potentially 
> millions of queries. 
>  Reasons _for_ sharing code would be a simplified code base and that both of
> the above issues would most likely have a negligible performance impact. 
>  
>  
>  
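The generateDateRange helper in the example above is left undefined in the issue; a minimal sketch (name and signature assumed, plain Java, no Cassandra or Beam dependency) that produces the yearMonthDay partition-key values could look like:

```java
import java.time.LocalDate;
import java.util.LinkedHashSet;
import java.util.Set;

public class DateRangeSketch {
  // Days in [startInclusive, endExclusive) as ISO yyyy-MM-dd strings,
  // matching the yearMonthDay partition key of the Event example above.
  static Set<String> generateDateRange(String startInclusive, String endExclusive) {
    Set<String> days = new LinkedHashSet<>();
    for (LocalDate d = LocalDate.parse(startInclusive);
        d.isBefore(LocalDate.parse(endExclusive));
        d = d.plusDays(1)) {
      days.add(d.toString());
    }
    return days;
  }

  public static void main(String[] args) {
    // One year's worth of partition-key days, as in the issue's example
    System.out.println(generateDateRange("2018-01-01", "2019-01-01").size()); // 365
  }
}
```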





[jira] [Created] (BEAM-10195) [beam_PostCommit_Java_ValidatesRunner_Samza] [ org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault]

2020-06-04 Thread Rui Wang (Jira)
Rui Wang created BEAM-10195:
---

 Summary: [beam_PostCommit_Java_ValidatesRunner_Samza] [ 
org.apache.beam.sdk.transforms.ParDoTest$TimerTests.testOutputTimestampDefault]
 Key: BEAM-10195
 URL: https://issues.apache.org/jira/browse/BEAM-10195
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Rui Wang


https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/6698/testReport/


Initial investigation:
TimerTests is not fully supported by Samza runner.





[jira] [Commented] (BEAM-10087) Flink 1.10 failing [Java 11]

2020-06-04 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126167#comment-17126167
 ] 

Kenneth Knowles commented on BEAM-10087:


Because we have Java 11-incompatible deps, the idea is that a user would use
Java 11 for their pipeline, while the pipeline could still depend on a Beam
built with Java 8.

> Flink 1.10 failing [Java 11]
> 
>
> Key: BEAM-10087
> URL: https://issues.apache.org/jira/browse/BEAM-10087
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Pawel Pasterz
>Priority: P2
>
> Gradle task _*:runners:flink:1.10:generatePipelineOptionsTableJava*_ fails 
> during Java 11 Precommit job 
> Example stack trace
> {code:java}
> > Task :runners:flink:1.10:generatePipelineOptionsTableJava FAILED
> ...
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.UnsupportedClassVersionError: 
> org/apache/beam/runners/flink/website/PipelineOptionsTableGenerator has been 
> compiled by a more recent version of the Java Runtime (class file version 
> 55.0), this version of the Java Runtime only recognizes class file versions 
> up to 52.0
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> :runners:flink:1.10:generatePipelineOptionsTableJava (Thread[Execution worker 
> for ':' Thread 11,5,main]) completed. Took 1.744 secs.
> {code}





[jira] [Work logged] (BEAM-8543) Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8543?focusedWorklogId=441501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441501
 ]

ASF GitHub Bot logged work on BEAM-8543:


Author: ASF GitHub Bot
Created on: 04/Jun/20 20:08
Start Date: 04/Jun/20 20:08
Worklog Time Spent: 10m 
  Work Description: rehmanmuradali commented on pull request #11924:
URL: https://github.com/apache/beam/pull/11924#issuecomment-639088647


   R: @reuvenlax  
   Could you please take a look that I am on right track?





Issue Time Tracking
---

Worklog Id: (was: 441501)
Time Spent: 20m  (was: 10m)

> Dataflow streaming timers are not strictly time ordered when set earlier 
> mid-bundle
> ---
>
> Key: BEAM-8543
> URL: https://issues.apache.org/jira/browse/BEAM-8543
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Rehman Murad Ali
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but timerB setting 
> another timer breaks the ordering.
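The interleaving described above can be modeled with a small Python simulation. This is illustrative only, not Beam runner internals: it assumes all fired timers are extracted in one batch when the watermark advances, and that timers set while firing are only delivered after the batch.

```python
import heapq

def fire_extracted_timers(initial_timers, on_fire):
    """Fire a pre-extracted batch of timers in timestamp order; timers set
    mid-batch (by on_fire) are delivered only after the batch completes."""
    fired_order = []
    pending = []                                # timers set while firing
    for ts, name in sorted(initial_timers):     # timerB, then timerA
        fired_order.append((ts, name))
        for new_timer in on_fire(name):         # firing may set new timers
            heapq.heappush(pending, new_timer)
    while pending:                              # delivered after the batch
        fired_order.append(heapq.heappop(pending))
    return fired_order

TIMESTAMP_MAX = 10**9                           # stand-in for max timestamp
timers = [(5, "timerB"), (TIMESTAMP_MAX, "timerA")]
# timerB re-sets itself for timerB.timestamp + 1 when it fires
order = fire_extracted_timers(
    timers, lambda name: [(6, "timerB")] if name == "timerB" else [])
print(order)  # [(5, 'timerB'), (1000000000, 'timerA'), (6, 'timerB')]
```

The output reproduces the bug report: the second timerB firing lands after timerA even though its timestamp is smaller.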



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8543) Dataflow streaming timers are not strictly time ordered when set earlier mid-bundle

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8543?focusedWorklogId=441499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441499
 ]

ASF GitHub Bot logged work on BEAM-8543:


Author: ASF GitHub Bot
Created on: 04/Jun/20 19:59
Start Date: 04/Jun/20 19:59
Worklog Time Spent: 10m 
  Work Description: rehmanmuradali opened a new pull request #11924:
URL: https://github.com/apache/beam/pull/11924


   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   (Per-runner build status badge table, truncated in the archive.)

[jira] [Updated] (BEAM-9710) Got current time instead of timestamp value

2020-06-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9710:
--
Labels: zetasql-compliance  (was: stale-assigned zetasql-compliance)

> Got current time instead of timestamp value
> ---
>
> Key: BEAM-9710
> URL: https://issues.apache.org/jira/browse/BEAM-9710
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Robin Qiu
>Priority: P4
>  Labels: zetasql-compliance
>
> one failure in shard 13
> {code}
> Expected: ARRAY>[{2014-12-01 00:00:00+00}]
>   Actual: ARRAY>[{2020-04-06 
> 00:20:40.052+00}], 
> {code}
> {code}
> [prepare_database]
> CREATE TABLE Table1 AS
> SELECT timestamp '2014-12-01' as timestamp_val
> --
> ARRAY>[{2014-12-01 00:00:00+00}]
> ==
> [name=timestamp_type_2]
> SELECT timestamp_val
> FROM Table1
> --
> ARRAY>[{2014-12-01 00:00:00+00}]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9709) timezone off by 8 hours

2020-06-04 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9709:
--
Labels: zetasql-compliance  (was: stale-assigned zetasql-compliance)

> timezone off by 8 hours
> ---
>
> Key: BEAM-9709
> URL: https://issues.apache.org/jira/browse/BEAM-9709
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Robin Qiu
>Priority: P4
>  Labels: zetasql-compliance
>
> two failures in shard 13, one failure in shard 19
> {code}
> Expected: ARRAY>[{2014-01-31 00:00:00+00}]
>   Actual: ARRAY>[{2014-01-31 08:00:00+00}], 
> {code}
> {code}
> select timestamp(date '2014-01-31')
> {code}
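The 8-hour shift in the actual result is consistent with the engine interpreting the civil date in a Pacific-time default zone rather than UTC. A minimal Python sketch of that interpretation; the fixed UTC-8 offset is an assumption standing in for America/Los_Angeles standard time in January:

```python
from datetime import datetime, timezone, timedelta

PST = timezone(timedelta(hours=-8))  # assumed default zone, UTC-8 in January

# Midnight of the civil date in the default zone, converted to UTC,
# yields the "Actual" value from the failure above.
local_midnight = datetime(2014, 1, 31, tzinfo=PST)
actual = local_midnight.astimezone(timezone.utc)
print(actual.isoformat())    # 2014-01-31T08:00:00+00:00

# Interpreting the same civil date directly in UTC yields "Expected".
expected = datetime(2014, 1, 31, tzinfo=timezone.utc)
print(expected.isoformat())  # 2014-01-31T00:00:00+00:00
```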



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10188) Automate Github release

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441488
 ]

ASF GitHub Bot logged work on BEAM-10188:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 19:20
Start Date: 04/Jun/20 19:20
Worklog Time Spent: 10m 
  Work Description: jphalip commented on pull request #11918:
URL: https://github.com/apache/beam/pull/11918#issuecomment-639064988


   Ok, I've loaded `script.config`. I've also added a confirmation message 
before submitting the API request.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441488)
Time Spent: 0.5h  (was: 20m)

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10188) Automate Github release

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441486
 ]

ASF GitHub Bot logged work on BEAM-10188:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 19:13
Start Date: 04/Jun/20 19:13
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11918:
URL: https://github.com/apache/beam/pull/11918#issuecomment-639062174


   > All the variables used should already be set in script.config. Does that 
work?
   
   That's probably fine, but if that's the expectation we need to `source 
script.config`.
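The sourcing pattern under discussion can be sketched as follows; the file contents and variable names here are hypothetical, not the actual release script's:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical config file holding the variables the release script expects.
cat > script.config <<'EOF'
RELEASE_TAG="v2.23.0"
GITHUB_REPO="apache/beam"
EOF

# Without this line the variables below would be unset under `set -u`.
source script.config

echo "Would create release ${RELEASE_TAG} on ${GITHUB_REPO}"
```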



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441486)
Time Spent: 20m  (was: 10m)

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441484
 ]

ASF GitHub Bot logged work on BEAM-10175:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 19:07
Start Date: 04/Jun/20 19:07
Worklog Time Spent: 10m 
  Work Description: pabloem merged pull request #11893:
URL: https://github.com/apache/beam/pull/11893


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441484)
Time Spent: 2h 20m  (was: 2h 10m)

> FhirIO execute bundle uses deprecated auth uri param
> 
>
> Key: BEAM-10175
> URL: https://issues.apache.org/jira/browse/BEAM-10175
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.22.0
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P2
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441483
 ]

ASF GitHub Bot logged work on BEAM-10175:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 19:07
Start Date: 04/Jun/20 19:07
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11893:
URL: https://github.com/apache/beam/pull/11893#issuecomment-639059270


   Ah great observation! I'll merge. I am not aware of it being flaky otherwise.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441483)
Time Spent: 2h 10m  (was: 2h)

> FhirIO execute bundle uses deprecated auth uri param
> 
>
> Key: BEAM-10175
> URL: https://issues.apache.org/jira/browse/BEAM-10175
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.22.0
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P2
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10188) Automate Github release

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10188?focusedWorklogId=441482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441482
 ]

ASF GitHub Bot logged work on BEAM-10188:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 19:06
Start Date: 04/Jun/20 19:06
Worklog Time Spent: 10m 
  Work Description: jphalip commented on pull request #11918:
URL: https://github.com/apache/beam/pull/11918#issuecomment-639058954


   @ibzib @TheNeuralBit Thank you both for the feedback. I've moved the code to 
a separate script. All the variables used should already be set in 
`script.config`. Does that work?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441482)
Remaining Estimate: 0h
Time Spent: 10m

> Automate Github release
> ---
>
> Key: BEAM-10188
> URL: https://issues.apache.org/jira/browse/BEAM-10188
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Kyle Weaver
>Priority: P2
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we push the tag to Github and fill in the release notes in 
> separate steps. For feeds consuming these updates, it would be better to do 
> both in the same step using the Github API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441477
 ]

ASF GitHub Bot logged work on BEAM-10175:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 19:00
Start Date: 04/Jun/20 19:00
Worklog Time Spent: 10m 
  Work Description: jaketf edited a comment on pull request #11893:
URL: https://github.com/apache/beam/pull/11893#issuecomment-639055068


   Hmmm, my latest commit was just `spotlessApply`, and this entire PR's change 
(how we pass the auth token for HCAPI requests for FHIR) seems unrelated to the 
GCS object moving logic in FHIR import.
   
   It does seem like a flake. Note that only the R4 variant of a parameterized 
test failed; it passed for the other two FHIR versions (DSTU2, STU3).
   The failure appears to be caused by trying to move a GCS object that no 
longer exists (possibly because another thread cleaned it up), as evidenced by 
the 404 on the source object in the stack trace (this logic is independent of 
FHIR version).
   
   The relevant logic where this error is raised is in `Filesystems::copy` in 
[FhirIO.ImportFn::importBatch](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1046).

   @pabloem  Have we seen other instances of this test being flaky? [This 
dashboard](https://builds.apache.org/job/beam_PostCommit_Java_PR/394/testReport/junit/org.apache.beam.sdk.io.gcp.healthcare/FhirIOWriteIT/testFhirIO_Import_R4_/history/)
 implies this was the only failure but it is not a long history.
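If concurrent cleanup is indeed the cause, one mitigation is to treat a missing source object as already moved. This is a hedged sketch with an in-memory stand-in for GCS, not Beam's actual FhirIO code; the function names are hypothetical and `FileNotFoundError` stands in for the GCS 404:

```python
def move_ignoring_missing(copy_fn, delete_fn, src, dst):
    """Move src to dst, treating a missing source as already handled."""
    try:
        copy_fn(src, dst)
    except FileNotFoundError:   # stand-in for the 404 in the stack trace
        return False            # another worker already moved/cleaned it up
    delete_fn(src)
    return True

# Tiny in-memory stand-in for a GCS bucket.
store = {"gs://tmp/bundle-1.ndjson": b"{}"}

def copy(src, dst):
    if src not in store:
        raise FileNotFoundError(src)
    store[dst] = store[src]

def delete(path):
    store.pop(path, None)

first = move_ignoring_missing(
    copy, delete, "gs://tmp/bundle-1.ndjson", "gs://done/bundle-1.ndjson")
# A second, concurrent move of the same object no longer raises.
second = move_ignoring_missing(
    copy, delete, "gs://tmp/bundle-1.ndjson", "gs://done/bundle-1.ndjson")
print(first, second)  # True False
```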



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441477)
Time Spent: 2h  (was: 1h 50m)

> FhirIO execute bundle uses deprecated auth uri param
> 
>
> Key: BEAM-10175
> URL: https://issues.apache.org/jira/browse/BEAM-10175
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.22.0
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P2
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10175) FhirIO execute bundle uses deprecated auth uri param

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10175?focusedWorklogId=441476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441476
 ]

ASF GitHub Bot logged work on BEAM-10175:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 18:58
Start Date: 04/Jun/20 18:58
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11893:
URL: https://github.com/apache/beam/pull/11893#issuecomment-639055068


   Hmmm, my latest commit was just `spotlessApply`, and this entire PR's change 
(how we pass the auth token for HCAPI requests for FHIR) seems unrelated to the 
GCS object moving logic in FHIR import.
   
   It does seem like a flake. Note that only the R4 variant of a parameterized 
test failed; it passed for the other two FHIR versions (DSTU2, STU3).
   The failure appears to be caused by trying to move a GCS object that no 
longer exists (possibly because another thread cleaned it up), as evidenced by 
the 404 on the source object in the stack trace (this logic is independent of 
FHIR version).
   
   The relevant logic where this error is raised is in `Filesystems::copy` in 
[FhirIO.ImportFn::importBatch](https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.java#L1046).

   @pabloem  Have we seen other instances of this test being flaky?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441476)
Time Spent: 1h 50m  (was: 1h 40m)

> FhirIO execute bundle uses deprecated auth uri param
> 
>
> Key: BEAM-10175
> URL: https://issues.apache.org/jira/browse/BEAM-10175
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.22.0
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: P2
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=441474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441474
 ]

ASF GitHub Bot logged work on BEAM-10176:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 18:47
Start Date: 04/Jun/20 18:47
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #11923:
URL: https://github.com/apache/beam/pull/11923#issuecomment-639047436


   R: @pabloem 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441474)
Remaining Estimate: 2h 40m  (was: 2h 50m)
Time Spent: 20m  (was: 10m)

> Support BigQuery type aliases in WriteToBigQuery with Avro format
> -
>
> Key: BEAM-10176
> URL: https://issues.apache.org/jira/browse/BEAM-10176
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.23.0
>
>   Original Estimate: 3h
>  Time Spent: 20m
>  Remaining Estimate: 2h 40m
>
> Support BigQuery type aliases in WriteToBigQuery with avro temp file format. 
> The following aliases are missing:
> * {{STRUCT}} ({{==RECORD}})
> * {{FLOAT64}} ({{==FLOAT}})
> * {{INT64}} ({{==INTEGER}})
> Reference:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
> https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/TableFieldSchema.html#setType-java.lang.String-
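One way to support such aliases is to normalize them to their canonical names before the conversion lookup. This is an illustrative sketch, not Beam's actual conversion code; the partial Avro mapping shown is an assumption:

```python
# Canonical BigQuery type -> Avro primitive/complex type (partial, assumed).
BQ_TO_AVRO = {
    "RECORD": "record",
    "FLOAT": "double",
    "INTEGER": "long",
}

# Standard-SQL aliases normalized to their legacy canonical names.
ALIASES = {"STRUCT": "RECORD", "FLOAT64": "FLOAT", "INT64": "INTEGER"}

def avro_type(bq_type: str) -> str:
    """Resolve a BigQuery type name, alias or canonical, to an Avro type."""
    canonical = ALIASES.get(bq_type.upper(), bq_type.upper())
    return BQ_TO_AVRO[canonical]

print(avro_type("INT64"))   # long
print(avro_type("STRUCT"))  # record
print(avro_type("FLOAT"))   # double  (canonical names still work)
```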



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10176) Support BigQuery type aliases in WriteToBigQuery with Avro format

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10176?focusedWorklogId=441473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441473
 ]

ASF GitHub Bot logged work on BEAM-10176:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 18:46
Start Date: 04/Jun/20 18:46
Worklog Time Spent: 10m 
  Work Description: chunyang opened a new pull request #11923:
URL: https://github.com/apache/beam/pull/11923


   Support the STRUCT, FLOAT64, and INT64 BigQuery types in BigQuery-to-Avro 
schema conversion.
   
   We already support RECORD, FLOAT, and INTEGER, which are aliases of the 
three types being added.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] ~Update `CHANGES.md` with noteworthy changes.~
- [ ] ~If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).~
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   (Per-runner build status badge table, truncated in the archive.)

[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441468&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441468
 ]

ASF GitHub Bot logged work on BEAM-8693:


Author: ASF GitHub Bot
Created on: 04/Jun/20 18:30
Start Date: 04/Jun/20 18:30
Worklog Time Spent: 10m 
  Work Description: apilloud merged pull request #11836:
URL: https://github.com/apache/beam/pull/11836


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441468)
Time Spent: 5h  (was: 4h 50m)

> Beam Dependency Update Request: 
> com.google.cloud.datastore:datastore-v1-proto-client
> 
>
> Key: BEAM-8693
> URL: https://issues.apache.org/jira/browse/BEAM-8693
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:56.526732 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:51.468284 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:11:37.877225 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:10:45.889899 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441465
 ]

ASF GitHub Bot logged work on BEAM-8693:


Author: ASF GitHub Bot
Created on: 04/Jun/20 18:21
Start Date: 04/Jun/20 18:21
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11836:
URL: https://github.com/apache/beam/pull/11836#issuecomment-639023595


   I get `No new linkage errors`. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441465)
Time Spent: 4h 50m  (was: 4h 40m)

> Beam Dependency Update Request: 
> com.google.cloud.datastore:datastore-v1-proto-client
> 
>
> Key: BEAM-8693
> URL: https://issues.apache.org/jira/browse/BEAM-8693
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:56.526732 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:51.468284 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:11:37.877225 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:10:45.889899 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441463
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 04/Jun/20 18:15
Start Date: 04/Jun/20 18:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11922:
URL: https://github.com/apache/beam/pull/11922#issuecomment-639020928


   R: @boyuanzz @chamikaramj 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441463)
Time Spent: 36h 10m  (was: 36h)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: P2
>  Labels: portability, stale-assigned
>  Time Spent: 36h 10m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=441461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441461
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 04/Jun/20 18:15
Start Date: 04/Jun/20 18:15
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11922:
URL: https://github.com/apache/beam/pull/11922


   
   This fixes a bug where we would output within all the windows instead of 
just the current window.
   This would not impact any SDF that used only a single window while 
processing.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=441457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441457
 ]

ASF GitHub Bot logged work on BEAM-10036:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 18:09
Start Date: 04/Jun/20 18:09
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11766:
URL: https://github.com/apache/beam/pull/11766#discussion_r435426414



##
File path: sdks/python/apache_beam/dataframe/partitionings.py
##
@@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+
+from typing import Any
+from typing import Iterable
+from typing import Tuple
+from typing import TypeVar
+
+import pandas as pd
+
+Frame = TypeVar('Frame', bound=pd.core.generic.NDFrame)
+
+
+class Partitioning(object):
+  """A class representing a (consistent) partitioning of dataframe objects.
+  """
+  def is_subpartitioning_of(self, other):
+# type: (Partitioning) -> bool
+
+"""Returns whether self is a sub-partition of other.
+
+Specifically, returns whether something partitioned by self is necessarily
+also partitioned by other.
+"""
+raise NotImplementedError
+
+  def partition_fn(self, df):
+# type: (Frame) -> Iterable[Tuple[Any, Frame]]
+
+"""A callable that actually performs the partitioning of a Frame df.
+
+This will be invoked via a FlatMap in conjunction with a GroupKey to
+achieve the desired partitioning.
+"""
+raise NotImplementedError
+
+
+class Index(Partitioning):
+  """A partitioning by index (either fully or partially).
+
+  If the set of "levels" of the index to consider is not specified, the entire
+  index is used.
+
+  These form a partial order, given by
+
+  Nothing() < Index([i]) < Index([i, j]) < ... < Index() < Singleton()
+
+  The ordering is implemented via the is_subpartitioning_of method, where the
+  examples on the right are subpartitionings of the examples on the left above.
+  """
+
+  _INDEX_PARTITIONS = 10
+
+  def __init__(self, levels=None):
+self._levels = levels
+
+  def __eq__(self, other):
+return type(self) == type(other) and self._levels == other._levels
+
+  def __hash__(self):
+if self._levels:
+  return hash(tuple(sorted(self._levels)))
+else:
+  return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+if isinstance(other, Nothing):
+  return True
+elif isinstance(other, Index):
+  if self._levels is None:
+return True
+  elif other._levels is None:
+return False
+  else:
+return all(level in other._levels for level in self._levels)
+else:
+  return False
+
+  def partition_fn(self, df):
+if self._levels is None:
+  levels = list(range(df.index.nlevels))
+else:
+  levels = self._levels
+hashes = sum(
+pd.util.hash_array(df.index.get_level_values(level))
+for level in levels)
+for key in range(self._INDEX_PARTITIONS):
+  yield key, df[hashes % self._INDEX_PARTITIONS == key]
+
+
+class Singleton(Partitioning):
+  """A partitioning co-locating all data to a singleton partition.
+  """
+  def __eq__(self, other):
+return type(self) == type(other)
+
+  def __hash__(self):
+return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+return True
+
+  def partition_fn(self, df):
+yield None, df
+
+
+class Nothing(Partitioning):
+  """A partitioning imposing no constraints on the actual partitioning.
+  """
+  def __eq__(self, other):
+return type(self) == type(other)
+
+  def __hash__(self):
+return hash(type(self))
+
+  def is_subpartitioning_of(self, other):
+return not other
+
+  def __bool__(self):
+return False
+
+  __nonzero__ = __bool__

Review comment:
   I think that making Nothing falsy and relying on that in logic elsewhere 
harms readability. What do you think about dropping this and just explicitly 
checking for Nothing when needed?
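The reviewer's alternative can be sketched as follows. Note that `requires_shuffle` is a hypothetical call site invented for illustration, and the classes are simplified stand-ins for those in the PR; this is not code from the Beam codebase.

```python
# Simplified stand-ins for the Partitioning classes in the PR.
class Partitioning(object):
    """A partitioning of dataframe objects (simplified)."""


class Nothing(Partitioning):
    """A partitioning imposing no constraints."""


class Index(Partitioning):
    """A partitioning by index."""


def requires_shuffle(required):
    # Hypothetical call site: an explicit isinstance check, as the review
    # suggests, rather than relying on Nothing being falsy (`if not required:`).
    return not isinstance(required, Nothing)


assert requires_shuffle(Index()) is True
assert requires_shuffle(Nothing()) is False
```

The explicit check trades a little verbosity for making the intent obvious at each call site.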

##
File path: sdks/python/apache_beam/dataframe/transforms.py
##
@@ -138,36 +140,35 @@ def evaluate(partition, 

[jira] [Resolved] (BEAM-8153) PubSubIntegrationTest failing in post-commit

2020-06-04 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-8153.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> PubSubIntegrationTest failing in post-commit
> 
>
> Key: BEAM-8153
> URL: https://issues.apache.org/jira/browse/BEAM-8153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Matthew Darwin
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
>
> Most likely due to: https://github.com/apache/beam/pull/9232
> {code}
> 11:44:31 
> ==
> 11:44:31 ERROR: test_streaming_with_attributes 
> (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
> 11:44:31 
> --
> 11:44:31 Traceback (most recent call last):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 199, in test_streaming_with_attributes
> 11:44:31 self._test_streaming(with_attributes=True)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 191, in _test_streaming
> 11:44:31 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py",
>  line 91, in run_pipeline
> 11:44:31 result = p.run()
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 420, in run
> 11:44:31 return self.runner.run_pipeline(self, self._options)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 11:44:31 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> 11:44:31 _assert_match(actual=arg1, matcher=arg2, reason=arg3)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> 11:44:31 if not matcher.matches(actual):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/core/allof.py",
>  line 16, in matches
> 11:44:31 if not matcher.matches(item):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> 11:44:31 match_result = self._matches(item)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py",
>  line 91, in _matches
> 11:44:31 return Counter(self.messages) == Counter(self.expected_msg)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py",
>  line 566, in __init__
> 11:44:31 self.update(*args, **kwds)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py",
>  line 653, in update
> 11:44:31 _count_elements(self, iterable)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub.py",
>  line 83, in __hash__
> 11:44:31 self.message_id, self.publish_time.seconds,
> 11:44:31 AttributeError: 'NoneType' object has no attribute 'seconds'
> {code}
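The traceback above fails because `__hash__` dereferences `publish_time.seconds` when `publish_time` is `None`. One possible guard is sketched below; the class and field names mirror the traceback, but this is an illustrative sketch only, not the actual fix applied in Beam.

```python
# Illustrative sketch: a PubsubMessage-like class whose __hash__ tolerates
# publish_time being None (e.g. for messages that were never published).
# Not the actual Beam fix; field names are taken from the traceback above.

class PubsubMessage(object):
    def __init__(self, data, message_id=None, publish_time=None):
        self.data = data
        self.message_id = message_id
        self.publish_time = publish_time  # may be None

    def __hash__(self):
        # Guard the dereference instead of assuming publish_time is set.
        seconds = self.publish_time.seconds if self.publish_time else None
        return hash((self.data, self.message_id, seconds))


# Hashing no longer raises AttributeError when publish_time was never set:
m = PubsubMessage(b'payload')
assert isinstance(hash(m), int)
```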



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10193) Update Jenkins VMs with docker-credential-gcloud

2020-06-04 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126125#comment-17126125
 ] 

Udi Meiri commented on BEAM-10193:
--

cc: [~tysonjh]

> Update Jenkins VMs with docker-credential-gcloud
> 
>
> Key: BEAM-10193
> URL: https://issues.apache.org/jira/browse/BEAM-10193
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Brian Hulette
>Priority: P3
>
> See BEAM-7405 (test failure currently resolved with an inelegant workaround) 
> for motivation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441455
 ]

ASF GitHub Bot logged work on BEAM-8693:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:55
Start Date: 04/Jun/20 17:55
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #11836:
URL: https://github.com/apache/beam/pull/11836#issuecomment-639010785


   Does `sdks/java/build-tools/beam-linkage-check.sh` 
([wiki](https://cwiki.apache.org/confluence/display/BEAM/Dependency+Upgrades)), 
without arguments, complain about anything now?
   If not, I'm good.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441455)
Time Spent: 4h 40m  (was: 4.5h)

> Beam Dependency Update Request: 
> com.google.cloud.datastore:datastore-v1-proto-client
> 
>
> Key: BEAM-8693
> URL: https://issues.apache.org/jira/browse/BEAM-8693
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:56.526732 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:51.468284 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:11:37.877225 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:10:45.889899 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10190) Reduce cost of toString of MetricKey and MetricName

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10190?focusedWorklogId=441452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441452
 ]

ASF GitHub Bot logged work on BEAM-10190:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 17:52
Start Date: 04/Jun/20 17:52
Worklog Time Spent: 10m 
  Work Description: Zhangyx39 commented on pull request #11915:
URL: https://github.com/apache/beam/pull/11915#issuecomment-639009096


   Thank you, Lukasz and Xinyu.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441452)
Time Spent: 1h 10m  (was: 1h)

> Reduce cost of toString of MetricKey and MetricName
> ---
>
> Key: BEAM-10190
> URL: https://issues.apache.org/jira/browse/BEAM-10190
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Yixing Zhang
>Assignee: Yixing Zhang
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Samza runner heavily uses MetricKey.toString() and MetricName.toString() to 
> update Samza metrics. We found that the toString methods have a high CPU cost. 
> According to this article: 
> [https://redfin.engineering/java-string-concatenation-which-way-is-best-8f590a7d22a8],
>  we should use the "+" operator instead of String.format for string concatenation 
> for better performance.
> We see a 10% QPS gain in Nexmark queries using the Samza runner with the 
> change to the "+" operator.
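The change itself is in the Java SDK, but the general point — a formatting API re-parses a template on every call, while plain concatenation does not — can be illustrated in Python. The metric names below are made up for the example.

```python
import timeit

# Hypothetical metric-name parts, standing in for MetricName's namespace/name.
namespace, name = 'beam', 'elementCount'


def with_format():
    # Template-based formatting: the '{}:{}' pattern is processed per call.
    return '{}:{}'.format(namespace, name)


def with_concat():
    # Plain concatenation: no template parsing.
    return namespace + ':' + name


# Both produce the same string.
assert with_format() == with_concat() == 'beam:elementCount'

# Rough timing comparison (absolute numbers vary by machine).
t_format = timeit.timeit(with_format, number=100_000)
t_concat = timeit.timeit(with_concat, number=100_000)
print('format: %.4fs  concat: %.4fs' % (t_format, t_concat))
```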



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8693) Beam Dependency Update Request: com.google.cloud.datastore:datastore-v1-proto-client

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8693?focusedWorklogId=441450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441450
 ]

ASF GitHub Bot logged work on BEAM-8693:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:49
Start Date: 04/Jun/20 17:49
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #11836:
URL: https://github.com/apache/beam/pull/11836#issuecomment-639007691


   R: @suztomo 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441450)
Time Spent: 4.5h  (was: 4h 20m)

> Beam Dependency Update Request: 
> com.google.cloud.datastore:datastore-v1-proto-client
> 
>
> Key: BEAM-8693
> URL: https://issues.apache.org/jira/browse/BEAM-8693
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: P2
> Fix For: Not applicable
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:56.526732 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:51.468284 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:11:37.877225 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:10:45.889899 
> -
> Please consider upgrading the dependency 
> com.google.cloud.datastore:datastore-v1-proto-client. 
> The current version is 1.6.0. The latest version is 1.6.3 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10190) Reduce cost of toString of MetricKey and MetricName

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10190?focusedWorklogId=441449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441449
 ]

ASF GitHub Bot logged work on BEAM-10190:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 17:47
Start Date: 04/Jun/20 17:47
Worklog Time Spent: 10m 
  Work Description: xinyuiscool commented on pull request #11915:
URL: https://github.com/apache/beam/pull/11915#issuecomment-639006539


   @lukecwik : thanks for merging it!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441449)
Time Spent: 1h  (was: 50m)

> Reduce cost of toString of MetricKey and MetricName
> ---
>
> Key: BEAM-10190
> URL: https://issues.apache.org/jira/browse/BEAM-10190
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Yixing Zhang
>Assignee: Yixing Zhang
>Priority: P2
> Fix For: 2.23.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Samza runner heavily uses MetricKey.toString() and MetricName.toString() to 
> update Samza metrics. We found that the toString methods have a high CPU cost. 
> According to this article: 
> [https://redfin.engineering/java-string-concatenation-which-way-is-best-8f590a7d22a8],
>  we should use the "+" operator instead of String.format for string concatenation 
> for better performance.
> We see a 10% QPS gain in Nexmark queries using the Samza runner with the 
> change to the "+" operator.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=441428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441428
 ]

ASF GitHub Bot logged work on BEAM-9679:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:21
Start Date: 04/Jun/20 17:21
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on pull request #11883:
URL: https://github.com/apache/beam/pull/11883#issuecomment-638992758


   I've incorporated all the helpful comments.  I'll wait for Henry's final 
approval before updating stepik/committing `*-remote.yaml` files.  Thank you, 
both.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441428)
Time Spent: 8h 50m  (was: 8h 40m)

> Core Transforms | Go SDK Code Katas
> ---
>
> Key: BEAM-9679
> URL: https://issues.apache.org/jira/browse/BEAM-9679
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: P2
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> A kata devoted to core beam transforms patterns after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms]
>  where the takeaway is an individual's ability to master the following 
> transforms in an Apache Beam pipeline using the Go SDK.
>  
> ||Transform||Pull Request||Status||
> |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed|
> |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed|
> |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Closed|
> |Combine Simple 
> Function|[11866|https://github.com/apache/beam/pull/11866]|Closed|
> |CombineFn|[11883|https://github.com/apache/beam/pull/11883]|Open|
> |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed|
> |Partition| | |
> |Side Input| | |
> |Side Output| | |
> |Branching| | |
> |Composite Transform| | |
> |DoFn Additional Parameters| | |



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10036) More flexible dataframes partitioning.

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10036?focusedWorklogId=441429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441429
 ]

ASF GitHub Bot logged work on BEAM-10036:
-

Author: ASF GitHub Bot
Created on: 04/Jun/20 17:21
Start Date: 04/Jun/20 17:21
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on a change in pull request 
#11766:
URL: https://github.com/apache/beam/pull/11766#discussion_r435422511



##
File path: sdks/python/apache_beam/dataframe/expressions.py
##
@@ -85,16 +87,10 @@ def evaluate_at(self, session):  # type: (Session) -> T
 """Returns the result of self with the bindings given in session."""
 raise NotImplementedError(type(self))
 
-  def requires_partition_by_index(self):  # type: () -> bool
-"""Whether this expression requires its argument(s) to be partitioned
-by index."""
-# TODO: It might be necessary to support partitioning by part of the index,
-# for some args, which would require returning more than a boolean here.
+  def requires_partition_by(self):  # type: () -> Partitioning
 raise NotImplementedError(type(self))
 
-  def preserves_partition_by_index(self):  # type: () -> bool
-"""Whether the result of this expression will be partitioned by index
-whenever all of its inputs are partitioned by index."""
+  def preserves_partition_by(self):  # type: () -> Partitioning

Review comment:
   Ah that makes sense. And I guess the name "preserves" is actually 
intuitive now that I understand it's setting an upper bound on the output 
partitioning.
   
   I think the complexity is worth it, unless there's a chance those operations 
will never materialize. Can you just add a docstring indicating that 
"preserves" sets an upper bound on the output partitioning (or any other 
language to make sure readers can grok it)? A similar comment about requires 
would be good too.
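Docstrings along the lines the reviewer requests might look like the sketch below. The method names and signatures follow the diff, but the wording of the docstrings is illustrative, not the text that was actually merged.

```python
# Illustrative sketch of the requested docstrings; the class is a simplified
# stand-in for the Expression base class in expressions.py.

class Expression(object):
    def requires_partition_by(self):
        # type: () -> Partitioning
        """Returns the least-constrained Partitioning this expression
        requires of its argument(s) in order to be evaluated."""
        raise NotImplementedError(type(self))

    def preserves_partition_by(self):
        # type: () -> Partitioning
        """Returns an upper bound on the partitioning of the result: the
        output is partitioned by this Partitioning whenever all of the
        inputs are partitioned by it."""
        raise NotImplementedError(type(self))
```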





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441429)
Time Spent: 1h 20m  (was: 1h 10m)

> More flexible dataframes partitioning.
> --
>
> Key: BEAM-10036
> URL: https://issues.apache.org/jira/browse/BEAM-10036
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: P2
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently we only track a boolean of whether a dataframe is partitioned by 
> the (full) index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=441427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441427
 ]

ASF GitHub Bot logged work on BEAM-9869:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:20
Start Date: 04/Jun/20 17:20
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-638992107







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 441427)
Time Spent: 3h 10m  (was: 3h)

> adding self-contained Kafka service jar for testing
> ---
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: P2
>  Labels: stale-assigned
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441418
 ]

ASF GitHub Bot logged work on BEAM-9742:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:04
Start Date: 04/Jun/20 17:04
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11396:
URL: https://github.com/apache/beam/pull/11396#issuecomment-638984333


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 441418)
Time Spent: 4h 20m  (was: 4h 10m)

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently, the `FluentBackoff` is constructed with hardcoded `maxRetries` 
> and `initialBackoff` values.
> It would be helpful if the client were able to pass these values.
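To illustrate the request, here is a minimal, hypothetical Python sketch of the backoff idea with caller-supplied limits. The names `backoff_delays` and `call_with_retries` are invented for this example; Beam's actual `FluentBackoff` is a Java utility with its own builder API.

```python
import time


def backoff_delays(max_retries=5, initial_backoff=1.0, multiplier=1.5):
    """Yield one delay (in seconds) per retry, growing geometrically.

    The point of the proposed change: max_retries and initial_backoff are
    caller-supplied parameters rather than constants baked into the sink.
    """
    delay = initial_backoff
    for _ in range(max_retries):
        yield delay
        delay *= multiplier


def call_with_retries(fn, max_retries=5, initial_backoff=1.0):
    """Run fn(), sleeping between attempts; the final failure propagates."""
    for delay in backoff_delays(max_retries, initial_backoff):
        try:
            return fn()
        except Exception:
            time.sleep(delay)
    return fn()
```

A write transform built this way could expose the retry limits through its builder rather than fixing them at construction time.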





[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441419
 ]

ASF GitHub Bot logged work on BEAM-9742:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:04
Start Date: 04/Jun/20 17:04
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev removed a comment on pull request #11396:
URL: https://github.com/apache/beam/pull/11396#issuecomment-638984073


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 441419)
Time Spent: 4.5h  (was: 4h 20m)

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently, the `FluentBackoff` is constructed with hardcoded `maxRetries` 
> and `initialBackoff` values.
> It would be helpful if the client were able to pass these values.





[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441417
 ]

ASF GitHub Bot logged work on BEAM-9742:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:04
Start Date: 04/Jun/20 17:04
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev removed a comment on pull request #11396:
URL: https://github.com/apache/beam/pull/11396#issuecomment-638983908


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 441417)
Time Spent: 4h 10m  (was: 4h)

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently, the `FluentBackoff` is constructed with hardcoded `maxRetries` 
> and `initialBackoff` values.
> It would be helpful if the client were able to pass these values.





[jira] [Work logged] (BEAM-9742) Add ability to pass FluentBackoff to JdbcIo.Write

2020-06-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9742?focusedWorklogId=441416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441416
 ]

ASF GitHub Bot logged work on BEAM-9742:


Author: ASF GitHub Bot
Created on: 04/Jun/20 17:04
Start Date: 04/Jun/20 17:04
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11396:
URL: https://github.com/apache/beam/pull/11396#issuecomment-638984073


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 441416)
Time Spent: 4h  (was: 3h 50m)

> Add ability to pass FluentBackoff to JdbcIo.Write
> -
>
> Key: BEAM-9742
> URL: https://issues.apache.org/jira/browse/BEAM-9742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Akshay Iyangar
>Assignee: Akshay Iyangar
>Priority: P3
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently, the `FluentBackoff` is constructed with hardcoded `maxRetries` 
> and `initialBackoff` values.
> It would be helpful if the client were able to pass these values.




