[jira] [Commented] (BEAM-7019) Reify transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887640#comment-16887640 ] Shehzaad Nakhoda commented on BEAM-7019: [~reuvenlax][~altay] BEAM-7388 was filed and has been resolved already. > Reify transform for Python SDK > -- > > Key: BEAM-7019 > URL: https://issues.apache.org/jira/browse/BEAM-7019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Rose Nguyen >Assignee: Shehzaad Nakhoda >Priority: Minor > Fix For: 2.14.0 > > > PTransforms for converting between explicit and implicit form of various Beam > values. > It should offer the same API as its Java counterpart: > [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reify.java] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7019) Reify transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shehzaad Nakhoda resolved BEAM-7019. Resolution: Duplicate Fix Version/s: 2.14.0
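The BEAM-7019 description ("converting between explicit and implicit form of various Beam values") may be easier to follow with a small illustration. The following is a plain-Python sketch of the idea only, not the Beam API: `Element`, `reify_timestamps`, and `extract_timestamps` are hypothetical names invented here, and a real pipeline would use the SDK's own transform rather than these helpers.

```python
from collections import namedtuple

# Hypothetical stand-in for a pipeline element: a payload plus metadata
# (here only a timestamp) that user code normally does not see.
Element = namedtuple("Element", ["value", "timestamp"])


def reify_timestamps(elements):
    """Implicit -> explicit: copy each element's timestamp into its value."""
    for element in elements:
        yield Element((element.value, element.timestamp), element.timestamp)


def extract_timestamps(elements):
    """Explicit -> implicit: take the timestamp back out of a (value, ts) pair."""
    for element in elements:
        value, timestamp = element.value
        yield Element(value, timestamp)


events = [Element("a", 10.0), Element("b", 20.0)]
reified = list(reify_timestamps(events))
restored = list(extract_timestamps(reified))
```

After reification the timestamp is part of the value, so downstream code can read it like any other field; the second helper undoes that step.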
[jira] [Work logged] (BEAM-7767) Regexp matching breaks on Windows for fileio test
[ https://issues.apache.org/jira/browse/BEAM-7767?focusedWorklogId=278715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278715 ] ASF GitHub Bot logged work on BEAM-7767: Author: ASF GitHub Bot Created on: 18/Jul/19 05:17 Start Date: 18/Jul/19 05:17 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9097: [BEAM-7767] Improving regexp matching for fileio test URL: https://github.com/apache/beam/pull/9097#issuecomment-512668419 Run Python_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278715) Time Spent: 20m (was: 10m) > Regexp matching breaks on Windows for fileio test > - > > Key: BEAM-7767 > URL: https://issues.apache.org/jira/browse/BEAM-7767 > Project: Beam > Issue Type: Improvement > Components: io-python-files >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7767) Regexp matching breaks on Windows for fileio test
[ https://issues.apache.org/jira/browse/BEAM-7767?focusedWorklogId=278712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278712 ] ASF GitHub Bot logged work on BEAM-7767: Author: ASF GitHub Bot Created on: 18/Jul/19 05:10 Start Date: 18/Jul/19 05:10 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9097: [BEAM-7767] Improving regexp matching for fileio test URL: https://github.com/apache/beam/pull/9097 This matching runs into problems when receiving a `c:/...` filepath, so I'm just matching on the file name, and wildcarding the directory. r: @chamikaramj
[jira] [Created] (BEAM-7767) Regexp matching breaks on Windows for fileio test
Pablo Estrada created BEAM-7767: --- Summary: Regexp matching breaks on Windows for fileio test Key: BEAM-7767 URL: https://issues.apache.org/jira/browse/BEAM-7767 Project: Beam Issue Type: Improvement Components: io-python-files Reporter: Pablo Estrada Assignee: Pablo Estrada -- This message was sent by Atlassian JIRA (v7.6.14#76016)
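The pull request for BEAM-7767 describes its fix as matching on the file name and wildcarding the directory. A minimal sketch of that technique follows, assuming nothing about the actual test code: `matches_file_name` is a hypothetical helper written for this illustration, and the paths are made up.

```python
import re


def matches_file_name(path, file_name):
    """Return True if `path` ends in `file_name`, whatever directory precedes it."""
    # `.*` wildcards the directory, including a drive prefix such as `c:`;
    # `[/\\]` accepts either path separator; `re.escape` keeps the dots and
    # other special characters in the file name literal.
    pattern = r"(?:.*[/\\])?" + re.escape(file_name) + r"$"
    return re.match(pattern, path) is not None


posix_ok = matches_file_name("/tmp/beam-test/out.txt", "out.txt")
windows_ok = matches_file_name("c:/Users/beam/tmp/out.txt", "out.txt")
backslash_ok = matches_file_name("c:\\Users\\beam\\tmp\\out.txt", "out.txt")
wrong_name = matches_file_name("c:/Users/beam/tmp/notout.txt", "out.txt")
```

Because the directory is never interpolated into the pattern, the drive letter and separator style of the temporary directory cannot affect the match.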
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278700 ] ASF GitHub Bot logged work on BEAM-6972: Author: ASF GitHub Bot Created on: 18/Jul/19 04:52 Start Date: 18/Jul/19 04:52 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO URL: https://github.com/apache/beam/pull/9064 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278700) Time Spent: 1h 40m (was: 1.5h) > LTS backport: CassandraIO is broken because of use of bad relocation of guava > - > > Key: BEAM-6972 > URL: https://issues.apache.org/jira/browse/BEAM-6972 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0 >Reporter: Arun sethia >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.7.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job that reads data from BigQuery and > stores/writes it to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client - 1.25.0 > > I am getting the following error when inserting/saving data to Cassandra.
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278698 ] ASF GitHub Bot logged work on BEAM-6972: Author: ASF GitHub Bot Created on: 18/Jul/19 04:52 Start Date: 18/Jul/19 04:52 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO URL: https://github.com/apache/beam/pull/9064#issuecomment-512664025 I've had a large number of builds on Jenkins and locally fail due to maven central download issues. Here is a scan of `:javaPreCommit` https://gradle.com/s/p4mabdazm6yjq Issue Time Tracking --- Worklog Id: (was: 278698) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278682 ] ASF GitHub Bot logged work on BEAM-6972: Author: ASF GitHub Bot Created on: 18/Jul/19 03:18 Start Date: 18/Jul/19 03:18 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO URL: https://github.com/apache/beam/pull/9064#issuecomment-512648841 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 278682) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-7714) Allow retries of PostCommit test suites per Python version
[ https://issues.apache.org/jira/browse/BEAM-7714?focusedWorklogId=278680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278680 ] ASF GitHub Bot logged work on BEAM-7714: Author: ASF GitHub Bot Created on: 18/Jul/19 03:10 Start Date: 18/Jul/19 03:10 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [BEAM-7714] [BEAM-7257] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512647646 R: @udim This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278680) Time Spent: 0.5h (was: 20m) > Allow retries of PostCommit test suites per Python version > -- > > Key: BEAM-7714 > URL: https://issues.apache.org/jira/browse/BEAM-7714 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Mark Liu >Priority: Blocker > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the Python PostCommit test executes 4 suites, running the same set of tests > under Python 2.7 and 3.5-3.7. When test execution fails due to a flake, > contributors have to rerun the whole suite. Being able to re-run the > test suite for only a particular version of Python would make it easier to > get a green run. > Some considerations: > - increasing the number of Jenkins jobs will increase the number of slots > required by postcommit; this will slow down the queue unless we increase the > number of slots. We can investigate utilization of Jenkins workers to see if a > slot increase is advisable. > - we could introduce phrase-only suites such as "Run Python 3.7 PostCommits", which > would be separate Jenkins jobs (1 suite, 1 slot) in addition to the current jobs. 
> Phrase-only suites will not be triggered automatically on the PR but will be triggered > manually when users want to re-run tests for a particular version. This may cause > confusion on a PR though, since the PR author will have to explain to reviewers > that the Python 3 PostCommit suite failed, but only the 3.6 portion failed, and that they > re-ran only the Py3.6 portion in this separate Jenkins job and it passed, so the PR > is safe to merge. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7714) Allow retries of PostCommit test suites per Python version
[ https://issues.apache.org/jira/browse/BEAM-7714?focusedWorklogId=278679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278679 ] ASF GitHub Bot logged work on BEAM-7714: Author: ASF GitHub Bot Created on: 18/Jul/19 03:08 Start Date: 18/Jul/19 03:08 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7714] [BEAM-7257] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512647182 Run Go PreCommit Issue Time Tracking --- Worklog Id: (was: 278679) Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-7714) Allow retries of PostCommit test suites per Python version
[ https://issues.apache.org/jira/browse/BEAM-7714?focusedWorklogId=278678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278678 ] ASF GitHub Bot logged work on BEAM-7714: Author: ASF GitHub Bot Created on: 18/Jul/19 03:08 Start Date: 18/Jul/19 03:08 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7714] [BEAM-7257] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512647130 Run Python_PVR_Flink PreCommit Issue Time Tracking --- Worklog Id: (was: 278678) Time Spent: 10m Remaining Estimate: 0h
[jira] [Commented] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887565#comment-16887565 ] Kenneth Knowles commented on BEAM-7766: --- Is UNKNOWN a value that could be returned by the service? Or is it only a client side indication that it does not understand it? These two cases should be kept separate. If I recall from the Java SDK, these two are actually both possible and different. > Dataflow runner should default to PiplelineState.UNKNOWN when job state > received via v1beta3 cannot be recognized. > -- > > Key: BEAM-7766 > URL: https://issues.apache.org/jira/browse/BEAM-7766 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.1.0 >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Minor > Fix For: 2.7.1, 2.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887563#comment-16887563 ] Kenneth Knowles commented on BEAM-7766: --- Thank you for filing this. Please remember to not close this until it is cherry-picked to 2.7.1. Or else you can clone it and close this one when it reaches master and close the clone when it is merged to 2.7.1.
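The behaviour requested in BEAM-7766, together with the concern raised in the comments about keeping a service-reported UNKNOWN apart from a state the client does not understand, can be sketched roughly as below. This is an illustration under stated assumptions rather than the runner's actual code: the `PipelineState` enum here is a local stand-in, `to_pipeline_state` is a hypothetical helper, and the job-state strings in the table are illustrative examples of what a service might return.

```python
from enum import Enum


class PipelineState(Enum):
    """Local stand-in for the SDK's pipeline states (assumption for this sketch)."""
    UNKNOWN = "UNKNOWN"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Illustrative mapping from service job-state strings to client-side states.
_API_TO_STATE = {
    "JOB_STATE_UNKNOWN": PipelineState.UNKNOWN,
    "JOB_STATE_RUNNING": PipelineState.RUNNING,
    "JOB_STATE_DONE": PipelineState.DONE,
    "JOB_STATE_FAILED": PipelineState.FAILED,
    "JOB_STATE_CANCELLED": PipelineState.CANCELLED,
}


def to_pipeline_state(api_state):
    """Return (state, recognized) for a job-state string from the service.

    An unrecognized string falls back to UNKNOWN instead of raising, and the
    boolean records whether UNKNOWN came from the service or from the fallback.
    """
    if api_state in _API_TO_STATE:
        return _API_TO_STATE[api_state], True
    return PipelineState.UNKNOWN, False


from_service = to_pipeline_state("JOB_STATE_UNKNOWN")
from_fallback = to_pipeline_state("JOB_STATE_ADDED_LATER")
```

Returning the flag alongside the state is one possible way to keep the two cases distinguishable; logging the raw string at the fallback site would be another.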
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278676 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:58 Start Date: 18/Jul/19 02:58 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512645464 Run Python 3.6 Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278676) Time Spent: 3.5h (was: 3h 20m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278677 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:58 Start Date: 18/Jul/19 02:58 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512645505 Run PYthon 3.7 Postcommit Issue Time Tracking --- Worklog Id: (was: 278677) Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278674 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:58 Start Date: 18/Jul/19 02:58 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512645359 Run Python 2 PostCommit Issue Time Tracking --- Worklog Id: (was: 278674) Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278675 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:58 Start Date: 18/Jul/19 02:58 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512645414 Run Python 3.5 Postcommit Issue Time Tracking --- Worklog Id: (was: 278675) Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=278673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278673 ] ASF GitHub Bot logged work on BEAM-7600: Author: ASF GitHub Bot Created on: 18/Jul/19 02:57 Start Date: 18/Jul/19 02:57 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9095: [BEAM-7600] borrow SDK harness management code into Spark runner URL: https://github.com/apache/beam/pull/9095 Now the Spark runner can reuse SDK harnesses, and multiple SDK harness can be used. The latter will hopefully enable multicore processing on TFX, for example.
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278671 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:47 Start Date: 18/Jul/19 02:47 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512643387 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278671) Time Spent: 3h (was: 2h 50m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 3h > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=278669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278669 ] ASF GitHub Bot logged work on BEAM-6611: Author: ASF GitHub Bot Created on: 18/Jul/19 02:36 Start Date: 18/Jul/19 02:36 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#discussion_r304671282
## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ##
@@ -550,6 +562,25 @@ def verify(self):
           'loaded into BigQuery. Please provide a GCS bucket, or '
           'pass method="STREAMING_INSERTS" to WriteToBigQuery.'
           % self._custom_gcs_temp_location.get())
+    if self.is_streaming_pipeline and not self.triggering_frequency:
+      raise ValueError('triggering_frequency must be specified to use file'
+                       'loads in streaming')
+    elif not self.is_streaming_pipeline and self.triggering_frequency:
+      raise ValueError('triggering_frequency can only be used with file'
+                       'loads in streaming')
+
+  def _window_fn(self):
+    if self.is_streaming_pipeline:
+      return beam.WindowInto(beam.window.GlobalWindows(),
+                             trigger=trigger.Repeatedly(
+                                 trigger.AfterAny(
+                                     trigger.AfterProcessingTime(
+                                         self.triggering_frequency),
+                                     trigger.AfterCount(
+                                         _FILE_TRIGGERING_RECORD_COUNT))),
Review comment: If we trigger after a certain number of records OR the triggering frequency, we may end up triggering more times than the quota supports, right? Supposing that the records are coming into the pipeline very quickly. Can you share your reasoning around this?
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278669) Time Spent: 4h 40m (was: 4.5h) > A Python Sink for BigQuery with File Loads in Streaming > --- > > Key: BEAM-6611 > URL: https://issues.apache.org/jira/browse/BEAM-6611 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Tanay Tummalapalli >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 4h 40m > Remaining Estimate: 0h > > The Java SDK supports a bunch of methods for writing data into BigQuery, > while the Python SDK supports the following: > - Streaming inserts for streaming pipelines [As seen in [bigquery.py and > BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]] > - File loads for batch pipelines [As implemented in [PR > 7655|https://github.com/apache/beam/pull/7655]] > Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming > The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads > application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776]. > File loads have the advantage of being much cheaper than streaming inserts > (although they also are slower for the records to show up in the table). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
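The concern raised in the review comment above (firing on a record count OR a processing-time interval) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not use Beam, the function name `count_firings` and all numbers are hypothetical, and it models the repeated "count or timer" rule in the simplest possible way, with the timer starting at the first record of each pane.

```python
def count_firings(arrival_times, frequency, record_count):
    """Count panes fired by a repeated 'after N records OR after T time units' rule.

    arrival_times: sorted record arrival times (hypothetical data, any time unit).
    frequency: timer length in the same unit (stands in for triggering_frequency).
    record_count: records per pane before a count-based firing (hypothetical).
    """
    firings = 0
    pane_start = None
    buffered = 0
    for t in arrival_times:
        # The timer expired before this record arrived: fire the pending pane.
        if buffered and t - pane_start >= frequency:
            firings += 1
            buffered = 0
        if buffered == 0:
            pane_start = t  # first record of a new pane starts the timer
        buffered += 1
        # The count threshold was reached: fire immediately.
        if buffered >= record_count:
            firings += 1
            buffered = 0
    if buffered:
        firings += 1  # flush whatever is left at end of input
    return firings


# 10,000 records arriving one per millisecond, timer of 5,000 ms.
fast_input = range(10000)
with_count = count_firings(fast_input, frequency=5000, record_count=500)
timer_only = count_firings(fast_input, frequency=5000, record_count=10**9)
```

In this model the count branch dominates when records arrive quickly, so the number of firings grows with throughput rather than with elapsed time, which is the situation the reviewer is asking about.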
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=278668=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278668 ] ASF GitHub Bot logged work on BEAM-6611: Author: ASF GitHub Bot Created on: 18/Jul/19 02:36 Start Date: 18/Jul/19 02:36 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#discussion_r304670918
## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ##
@@ -550,6 +562,25 @@ def verify(self):
           'loaded into BigQuery. Please provide a GCS bucket, or '
           'pass method="STREAMING_INSERTS" to WriteToBigQuery.'
           % self._custom_gcs_temp_location.get())
+    if self.is_streaming_pipeline and not self.triggering_frequency:
+      raise ValueError('triggering_frequency must be specified to use file'
+                       'loads in streaming')
+    elif not self.is_streaming_pipeline and self.triggering_frequency:
+      raise ValueError('triggering_frequency can only be used with file'
+                       'loads in streaming')
+
+  def _window_fn(self):
+    if self.is_streaming_pipeline:
+      return beam.WindowInto(beam.window.GlobalWindows(),
+                             trigger=trigger.Repeatedly(
+                                 trigger.AfterAny(
+                                     trigger.AfterProcessingTime(
+                                         self.triggering_frequency),
+                                     trigger.AfterCount(
+                                         _FILE_TRIGGERING_RECORD_COUNT))),
+                             accumulation_mode=trigger.AccumulationMode\
+                                 .DISCARDING)
+    return beam.WindowInto(beam.window.GlobalWindows())
Review comment: Nit: Maybe add `else: \ return ...globalwindow...`? I find it (a tiny bit) easier to read.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278668) > A Python Sink for BigQuery with File Loads in Streaming > --- > > Key: BEAM-6611 > URL: https://issues.apache.org/jira/browse/BEAM-6611 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Tanay Tummalapalli >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 4.5h > Remaining Estimate: 0h > > The Java SDK supports a bunch of methods for writing data into BigQuery, > while the Python SDK supports the following: > - Streaming inserts for streaming pipelines [As seen in [bigquery.py and > BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]] > - File loads for batch pipelines [As implemented in [PR > 7655|https://github.com/apache/beam/pull/7655]] > Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming > The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads > application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776]. > File loads have the advantage of being much cheaper than streaming inserts > (although they also are slower for the records to show up in the table). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=278667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278667 ] ASF GitHub Bot logged work on BEAM-6611: Author: ASF GitHub Bot Created on: 18/Jul/19 02:36 Start Date: 18/Jul/19 02:36 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#discussion_r304713800
## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ##
@@ -622,8 +653,9 @@ def expand(self, pcoll):
                 test_client=self.test_client,
                 temporary_tables=self.temp_tables,
                 additional_bq_parameters=self.additional_bq_parameters),
-            load_job_name_pcv, *self.schema_side_inputs).with_outputs(
-                TriggerLoadJobs.TEMP_TABLES, main='main')
+            load_job_name_pcv, self.is_streaming_pipeline,
Review comment: I wonder if we should pass `is_streaming_pipeline` at pipeline construction (i.e. in the constructor) rather than as a side input. It would allow us to show it as, e.g., display data.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278667) Time Spent: 4.5h (was: 4h 20m) > A Python Sink for BigQuery with File Loads in Streaming > --- > > Key: BEAM-6611 > URL: https://issues.apache.org/jira/browse/BEAM-6611 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Tanay Tummalapalli >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 4.5h > Remaining Estimate: 0h > > The Java SDK supports a bunch of methods for writing data into BigQuery, > while the Python SDK supports the following: > - Streaming inserts for streaming pipelines [As seen in [bigquery.py and > BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]] > - File loads for batch pipelines [As implemented in [PR > 7655|https://github.com/apache/beam/pull/7655]] > Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming > The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads > application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776]. > File loads have the advantage of being much cheaper than streaming inserts > (although they also are slower for the records to show up in the table). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
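The suggestion in the review comment above (fix `is_streaming_pipeline` in the constructor instead of passing it at processing time) might look roughly like the following. This is a plain-Python sketch, not the actual Beam code: `TriggerLoadJobsSketch` is a hypothetical stand-in class that does not inherit from any Beam type, and its `display_data` method only imitates the idea of exposing construction-time values as static metadata.

```python
class TriggerLoadJobsSketch(object):
    """Hypothetical stand-in for a DoFn; not the class from the pull request."""

    def __init__(self, is_streaming_pipeline=False):
        # The mode is decided once, when the pipeline is constructed.
        self.is_streaming_pipeline = is_streaming_pipeline

    def display_data(self):
        # Construction-time values are known up front, so they can be
        # reported as static metadata about the step.
        return {'is_streaming_pipeline': self.is_streaming_pipeline}

    def process(self, element):
        # No extra per-element argument is needed to learn the mode.
        mode = 'streaming' if self.is_streaming_pipeline else 'batch'
        yield (mode, element)


fn = TriggerLoadJobsSketch(is_streaming_pipeline=True)
outputs = list(fn.process('some_destination'))
```

The trade-off sketched here is that a constructor argument must be known when the pipeline is built, whereas a value passed at processing time can be computed later.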
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278657 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:19 Start Date: 18/Jul/19 02:19 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512637641 run python 3.5 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278657) Time Spent: 2h 40m (was: 2.5h) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278658 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:19 Start Date: 18/Jul/19 02:19 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512637710 run python 3.6 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278658) Time Spent: 2h 50m (was: 2h 40m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278656 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 02:15 Start Date: 18/Jul/19 02:15 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512636972 run python 2 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278656) Time Spent: 2.5h (was: 2h 20m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work started] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-7246 started by Shehzaad Nakhoda. -- > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-6855) Side inputs are not supported when using the state API
[ https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887543#comment-16887543 ] Kenneth Knowles commented on BEAM-6855: --- What I mean is code something like this:
{code}
DoFnRunner statefulRunner = new StatefulDoFnRunner(...)
PushbackSideInputDoFnRunner dofnRunner = SimplePushbackSideInputDoFnRunner.create(statefulRunner, ...)
{code}
> Side inputs are not supported when using the state API > -- > > Key: BEAM-6855 > URL: https://issues.apache.org/jira/browse/BEAM-6855 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API
[ https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shehzaad Nakhoda reassigned BEAM-6855: -- Assignee: (was: Shehzaad Nakhoda) > Side inputs are not supported when using the state API > -- > > Key: BEAM-6855 > URL: https://issues.apache.org/jira/browse/BEAM-6855 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work started] (BEAM-6855) Side inputs are not supported when using the state API
[ https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-6855 started by Shehzaad Nakhoda. -- > Side inputs are not supported when using the state API > -- > > Key: BEAM-6855 > URL: https://issues.apache.org/jira/browse/BEAM-6855 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API
[ https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shehzaad Nakhoda reassigned BEAM-6855: -- Assignee: Shehzaad Nakhoda > Side inputs are not supported when using the state API > -- > > Key: BEAM-6855 > URL: https://issues.apache.org/jira/browse/BEAM-6855 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses
[ https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=278647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278647 ] ASF GitHub Bot logged work on BEAM-7284: Author: ASF GitHub Bot Created on: 18/Jul/19 01:44 Start Date: 18/Jul/19 01:44 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9050: [BEAM-7284] enabled to pickle python3 dataclasses URL: https://github.com/apache/beam/pull/9050#issuecomment-512631458 Thanks a lot, @lazylynx! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278647) Time Spent: 40m (was: 0.5h) > Support Py3 Dataclasses > > > Key: BEAM-7284 > URL: https://issues.apache.org/jira/browse/BEAM-7284 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > > It looks like dill does not support Dataclasses yet, > https://github.com/uqfoundation/dill/issues/312, which very likely means that > Beam does not support them either. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses
[ https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=278648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278648 ] ASF GitHub Bot logged work on BEAM-7284: Author: ASF GitHub Bot Created on: 18/Jul/19 01:44 Start Date: 18/Jul/19 01:44 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9050: [BEAM-7284] enabled to pickle python3 dataclasses URL: https://github.com/apache/beam/pull/9050#issuecomment-512631569 @robertwb Could you please help merge this? Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278648) Time Spent: 50m (was: 40m) > Support Py3 Dataclasses > > > Key: BEAM-7284 > URL: https://issues.apache.org/jira/browse/BEAM-7284 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 50m > Remaining Estimate: 0h > > It looks like dill does not support Dataclasses yet, > https://github.com/uqfoundation/dill/issues/312, which very likely means that > Beam does not support them either. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887536#comment-16887536 ] Ahmet Altay commented on BEAM-7246: --- [~raheelkhan] just checking, are you still blocked on this? > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-6675) The JdbcIO sink should accept schemas
[ https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887535#comment-16887535 ] Shehzaad Nakhoda commented on BEAM-6675: [~reuvenlax] Can this be marked as resolved? Thanks. > The JdbcIO sink should accept schemas > - > > Key: BEAM-6675 > URL: https://issues.apache.org/jira/browse/BEAM-6675 > Project: Beam > Issue Type: Sub-task > Components: io-java-jdbc >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > > If the input has a schema, there should be a default mapping to a > PreparedStatement for writing based on that schema. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
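The idea in the issue description above (derive a default prepared-statement mapping from the input's schema) can be sketched outside of Beam. The following is an illustration only, using Python's standard `sqlite3` module rather than JdbcIO: the helper names `insert_statement` and `write_rows`, the `users` table, and the sample rows are all hypothetical, and table and column names are interpolated directly, which is only acceptable for trusted identifiers.

```python
import sqlite3


def insert_statement(table, field_names):
    """Build a parameterized INSERT from an ordered list of schema field names."""
    columns = ', '.join(field_names)
    placeholders = ', '.join('?' for _ in field_names)
    return 'INSERT INTO {} ({}) VALUES ({})'.format(table, columns, placeholders)


def write_rows(conn, table, field_names, rows):
    """Write dict-like rows, binding each value in schema field order."""
    sql = insert_statement(table, field_names)
    conn.executemany(sql, [tuple(row[name] for name in field_names) for row in rows])


# Hypothetical schema and data, written to an in-memory database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER, name TEXT)')
write_rows(conn, 'users', ['id', 'name'],
           [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}])
stored = conn.execute('SELECT id, name FROM users ORDER BY id').fetchall()
```

The point of the sketch is that once field names and their order are known from a schema, both the statement text and the parameter binding can be generated without a hand-written setter.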
[jira] [Work logged] (BEAM-7728) Support ParquetTable in SQL
[ https://issues.apache.org/jira/browse/BEAM-7728?focusedWorklogId=278645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278645 ] ASF GitHub Bot logged work on BEAM-7728: Author: ASF GitHub Bot Created on: 18/Jul/19 01:33 Start Date: 18/Jul/19 01:33 Worklog Time Spent: 10m Work Description: vectorijk commented on pull request #9054: [BEAM-7728] [SQL] Support ParquetTable URL: https://github.com/apache/beam/pull/9054#discussion_r304703525
## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java ##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.parquet;
+
+import java.io.Serializable;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.beam.sdk.extensions.sql.impl.schema.BaseBeamTable;
+import org.apache.beam.sdk.io.parquet.ParquetIO;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.AvroUtils;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+
+/** {@link ParquetTable} is a {@link org.apache.beam.sdk.extensions.sql.BeamSqlTable}. */
+public class ParquetTable extends BaseBeamTable implements Serializable {
+  private final String filePattern;
+
+  public ParquetTable(Schema beamSchema, String filePattern) {
+    super(beamSchema);
+    this.filePattern = filePattern;
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+    PTransform<PCollection<GenericRecord>, PCollection<Row>> readConverter =
+        GenericRecordReadConverter.builder().beamSchema(schema).build();
+
+    return begin
+        .apply("ParquetIORead", ParquetIO.read(AvroUtils.toAvroSchema(schema)).from(filePattern))
+        .apply("GenericRecordToRow", readConverter);
+  }
+
+  @Override
+  public PDone buildIOWriter(PCollection<Row> input) {
+    throw new UnsupportedOperationException("Writing to a Parquet file is not supported");
Review comment: okay, let me try this.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278645) Time Spent: 1h 20m (was: 1h 10m) > Support ParquetTable in SQL > --- > > Key: BEAM-7728 > URL: https://issues.apache.org/jira/browse/BEAM-7728 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7728) Support ParquetTable in SQL
[ https://issues.apache.org/jira/browse/BEAM-7728?focusedWorklogId=278644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278644 ] ASF GitHub Bot logged work on BEAM-7728: Author: ASF GitHub Bot Created on: 18/Jul/19 01:32 Start Date: 18/Jul/19 01:32 Worklog Time Spent: 10m Work Description: vectorijk commented on pull request #9054: [BEAM-7728] [SQL] Support ParquetTable URL: https://github.com/apache/beam/pull/9054#discussion_r304703383 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/GenericRecordToRowTest.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.parquet; + +import java.io.Serializable; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericRecord; +import org.apache.beam.sdk.coders.AvroCoder; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.junit.Rule; +import org.junit.Test; + +/** Unit tests for {@link GenericRecordReadConverter}. */ +public class GenericRecordToRowTest implements Serializable { + @Rule public transient TestPipeline pipeline = TestPipeline.create(); + + org.apache.beam.sdk.schemas.Schema payloadSchema = Review comment: i see This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278644) Time Spent: 1h 10m (was: 1h) > Support ParquetTable in SQL > --- > > Key: BEAM-7728 > URL: https://issues.apache.org/jira/browse/BEAM-7728 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278643 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 01:27 Start Date: 18/Jul/19 01:27 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512628523 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278643) Time Spent: 2h 20m (was: 2h 10m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278640 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 01:22 Start Date: 18/Jul/19 01:22 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512627664 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278640) Time Spent: 2h 10m (was: 2h) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278632 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 18/Jul/19 00:51 Start Date: 18/Jul/19 00:51 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512621830 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278632) Time Spent: 2h (was: 1h 50m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 2h > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable
[ https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278631 ] ASF GitHub Bot logged work on BEAM-7545: Author: ASF GitHub Bot Created on: 18/Jul/19 00:48 Start Date: 18/Jul/19 00:48 Worklog Time Spent: 10m Work Description: akedin commented on pull request #9040: [BEAM-7545] Reordering Beam Joins URL: https://github.com/apache/beam/pull/9040 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278631) Time Spent: 10h 10m (was: 10h) > Row Count Estimation for CSV TextTable > -- > > Key: BEAM-7545 > URL: https://issues.apache.org/jira/browse/BEAM-7545 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: Not applicable > > Time Spent: 10h 10m > Remaining Estimate: 0h > > Implementing Row Count Estimation for CSV Tables by reading the first few > lines of the file and estimating the number of records based on the length of > these lines and the total length of the file. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
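The estimation approach described for BEAM-7545 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Java implementation from the pull request: the function name `estimate_row_count` and the `sample_lines` parameter are hypothetical, and the sketch assumes the sampled lines are representative of the rest of the file.

```python
import os
import tempfile


def estimate_row_count(path, sample_lines=10):
    """Estimate the number of records in a text file from its first lines.

    Hypothetical helper: reads up to `sample_lines` lines, computes their
    average length in bytes (newline included), and divides the total file
    size by that average.
    """
    total_bytes = os.path.getsize(path)
    sampled_bytes = 0
    sampled_lines = 0
    with open(path, "rb") as f:
        for line in f:
            sampled_bytes += len(line)
            sampled_lines += 1
            if sampled_lines >= sample_lines:
                break
    if sampled_lines == 0:
        return 0  # empty file: nothing to estimate from
    average_line_bytes = sampled_bytes / sampled_lines
    return round(total_bytes / average_line_bytes)


if __name__ == "__main__":
    # Demo file with 1000 fixed-width rows, so the sample is representative.
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
        for i in range(1000):
            tmp.write(b"%04d,name,value\n" % i)
    print(estimate_row_count(tmp.name))
    os.remove(tmp.name)
```

The estimate is only as good as the sample: files whose early rows are unusually short or long (for example, a header followed by wide records) will skew the result.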
[jira] [Updated] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-7766: Status: Open (was: Triage Needed) > Dataflow runner should default to PipelineState.UNKNOWN when job state > received via v1beta3 cannot be recognized. > -- > > Key: BEAM-7766 > URL: https://issues.apache.org/jira/browse/BEAM-7766 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.1.0 >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Minor > Fix For: 2.7.1, 2.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?focusedWorklogId=278629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278629 ] ASF GitHub Bot logged work on BEAM-7766: Author: ASF GitHub Bot Created on: 18/Jul/19 00:34 Start Date: 18/Jul/19 00:34 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9094: [BEAM-7766] Default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized. URL: https://github.com/apache/beam/pull/9094#issuecomment-512618753 @kennknowles I'd like to fix this on 2.7.1 and can prepare a cherry-pick once this is merged. I set 2.7.1 as fix version on BEAM-7766. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278629) Time Spent: 0.5h (was: 20m) > Dataflow runner should default to PipelineState.UNKNOWN when job state > received via v1beta3 cannot be recognized. > -- > > Key: BEAM-7766 > URL: https://issues.apache.org/jira/browse/BEAM-7766 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.1.0 >Reporter: Valentyn Tymofieiev >Priority: Minor > Fix For: 2.7.1, 2.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-7766: - Assignee: Valentyn Tymofieiev > Dataflow runner should default to PipelineState.UNKNOWN when job state > received via v1beta3 cannot be recognized. > -- > > Key: BEAM-7766 > URL: https://issues.apache.org/jira/browse/BEAM-7766 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.1.0 >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Minor > Fix For: 2.7.1, 2.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?focusedWorklogId=278628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278628 ] ASF GitHub Bot logged work on BEAM-7766: Author: ASF GitHub Bot Created on: 18/Jul/19 00:31 Start Date: 18/Jul/19 00:31 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9094: [BEAM-7766] Default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized. URL: https://github.com/apache/beam/pull/9094#issuecomment-512618315 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278628) Time Spent: 20m (was: 10m) > Dataflow runner should default to PipelineState.UNKNOWN when job state > received via v1beta3 cannot be recognized. > -- > > Key: BEAM-7766 > URL: https://issues.apache.org/jira/browse/BEAM-7766 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.1.0 >Reporter: Valentyn Tymofieiev >Priority: Minor > Fix For: 2.7.1, 2.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
[ https://issues.apache.org/jira/browse/BEAM-7766?focusedWorklogId=278627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278627 ] ASF GitHub Bot logged work on BEAM-7766: Author: ASF GitHub Bot Created on: 18/Jul/19 00:30 Start Date: 18/Jul/19 00:30 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9094: [BEAM-7766] Default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized. URL: https://github.com/apache/beam/pull/9094
[jira] [Created] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.
Valentyn Tymofieiev created BEAM-7766: - Summary: Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized. Key: BEAM-7766 URL: https://issues.apache.org/jira/browse/BEAM-7766 Project: Beam Issue Type: Bug Components: runner-dataflow Affects Versions: 2.1.0 Reporter: Valentyn Tymofieiev Fix For: 2.7.1, 2.15.0 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
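The behaviour requested in BEAM-7766 amounts to a lookup with a fallback. The sketch below illustrates the idea in plain Python; it does not use the Beam or Dataflow client libraries. The class `PipelineState` here is a stand-in, and the names `API_STATE_TO_PIPELINE_STATE` and `to_pipeline_state`, along with the small subset of state strings, are assumptions made for this example rather than the code in pull request #9094.

```python
class PipelineState:
    """Stand-in for a runner's pipeline state constants (illustrative only)."""
    UNKNOWN = "UNKNOWN"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Hypothetical mapping from service-side job state names to SDK states.
# Only a small illustrative subset is listed.
API_STATE_TO_PIPELINE_STATE = {
    "JOB_STATE_RUNNING": PipelineState.RUNNING,
    "JOB_STATE_DONE": PipelineState.DONE,
    "JOB_STATE_FAILED": PipelineState.FAILED,
    "JOB_STATE_CANCELLED": PipelineState.CANCELLED,
}


def to_pipeline_state(api_state):
    """Translate an API job state, falling back to UNKNOWN.

    A state string that the client does not recognise (for example one
    added to the service after the SDK was released) maps to UNKNOWN
    instead of raising an error.
    """
    return API_STATE_TO_PIPELINE_STATE.get(api_state, PipelineState.UNKNOWN)


if __name__ == "__main__":
    print(to_pipeline_state("JOB_STATE_RUNNING"))
    print(to_pipeline_state("JOB_STATE_ADDED_LATER"))
```

Using `dict.get` with a default keeps older SDK releases working when the service starts reporting a state they have never seen, which is the situation the issue title describes.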
[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable
[ https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278625 ] ASF GitHub Bot logged work on BEAM-7545: Author: ASF GitHub Bot Created on: 18/Jul/19 00:26 Start Date: 18/Jul/19 00:26 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9040: [BEAM-7545] Reordering Beam Joins URL: https://github.com/apache/beam/pull/9040#discussion_r304693458 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java ## @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ +package org.apache.beam.sdk.extensions.sql.impl.rule; + +import java.math.BigInteger; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider; +import org.apache.beam.sdk.values.Row; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.core.Join; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test ensures that we are reordering joins and get a plan similar to Join(large,Join(small, Review comment: Agree with Anton that changes and designs are documented by tests. As we are at an early stage of having optimizations, what we have now is definitely not perfect and we can refine them as time goes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278625) Time Spent: 9h 50m (was: 9h 40m) > Row Count Estimation for CSV TextTable > -- > > Key: BEAM-7545 > URL: https://issues.apache.org/jira/browse/BEAM-7545 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: Not applicable > > Time Spent: 9h 50m > Remaining Estimate: 0h > > Implementing Row Count Estimation for CSV Tables by reading the first few > lines of the file and estimating the number of records based on the length of > these lines and the total length of the file. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable
[ https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278626 ] ASF GitHub Bot logged work on BEAM-7545: Author: ASF GitHub Bot Created on: 18/Jul/19 00:26 Start Date: 18/Jul/19 00:26 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9040: [BEAM-7545] Reordering Beam Joins URL: https://github.com/apache/beam/pull/9040#issuecomment-512617407 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278626) Time Spent: 10h (was: 9h 50m) > Row Count Estimation for CSV TextTable > -- > > Key: BEAM-7545 > URL: https://issues.apache.org/jira/browse/BEAM-7545 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: Not applicable > > Time Spent: 10h > Remaining Estimate: 0h > > Implementing Row Count Estimation for CSV Tables by reading the first few > lines of the file and estimating the number of records based on the length of > these lines and the total length of the file. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes
[ https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=278622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278622 ] ASF GitHub Bot logged work on BEAM-6877: Author: ASF GitHub Bot Created on: 18/Jul/19 00:15 Start Date: 18/Jul/19 00:15 Worklog Time Spent: 10m Work Description: udim commented on issue #8893: [BEAM-6877] trivial_inference: make remaining tests pass URL: https://github.com/apache/beam/pull/8893#issuecomment-512615395 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278622) Time Spent: 8h 50m (was: 8h 40m) > TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode > changes > > > Key: BEAM-6877 > URL: https://issues.apache.org/jira/browse/BEAM-6877 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Udi Meiri >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > Type inference doesn't work on Python 3.6 due to [bytecode to wordcode > changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes]. > Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are: > *transforms.ptransform_test*: > - test_combine_properly_pipeline_type_checks_using_decorator > - test_mean_globally_pipeline_checking_satisfied > - test_mean_globally_runtime_checking_satisfied > - test_count_globally_pipeline_type_checking_satisfied > - test_count_globally_runtime_type_checking_satisfied > - test_pardo_type_inference > - test_pipeline_inference > - test_inferred_bad_kv_type > *typehints.trivial_inference_test*: > - all tests in TrivialInferenceTest > *io.gcp.pubsub_test.TestReadFromPubSubOverride*: > * test_expand_with_other_options > * test_expand_with_subscription > * test_expand_with_topic -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes
[ https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=278621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278621 ] ASF GitHub Bot logged work on BEAM-6877: Author: ASF GitHub Bot Created on: 18/Jul/19 00:15 Start Date: 18/Jul/19 00:15 Worklog Time Spent: 10m Work Description: udim commented on issue #8893: [BEAM-6877] trivial_inference: make remaining tests pass URL: https://github.com/apache/beam/pull/8893#issuecomment-512615347 run python precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278621) Time Spent: 8h 40m (was: 8.5h) > TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode > changes > > > Key: BEAM-6877 > URL: https://issues.apache.org/jira/browse/BEAM-6877 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Udi Meiri >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > Type inference doesn't work on Python 3.6 due to [bytecode to wordcode > changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes]. > Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are: > *transforms.ptransform_test*: > - test_combine_properly_pipeline_type_checks_using_decorator > - test_mean_globally_pipeline_checking_satisfied > - test_mean_globally_runtime_checking_satisfied > - test_count_globally_pipeline_type_checking_satisfied > - test_count_globally_runtime_type_checking_satisfied > - test_pardo_type_inference > - test_pipeline_inference > - test_inferred_bad_kv_type > *typehints.trivial_inference_test*: > - all tests in TrivialInferenceTest > *io.gcp.pubsub_test.TestReadFromPubSubOverride*: > * test_expand_with_other_options > * test_expand_with_subscription > * test_expand_with_topic -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes
[ https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=278620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278620 ] ASF GitHub Bot logged work on BEAM-6877: Author: ASF GitHub Bot Created on: 18/Jul/19 00:13 Start Date: 18/Jul/19 00:13 Worklog Time Spent: 10m Work Description: udim commented on issue #8893: [BEAM-6877] trivial_inference: make remaining tests pass URL: https://github.com/apache/beam/pull/8893#issuecomment-512615014 R: @robertwb (in case you haven't seen all the emails) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278620) Time Spent: 8.5h (was: 8h 20m) > TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode > changes > > > Key: BEAM-6877 > URL: https://issues.apache.org/jira/browse/BEAM-6877 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Udi Meiri >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > Type inference doesn't work on Python 3.6 due to [bytecode to wordcode > changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes]. > Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are: > *transforms.ptransform_test*: > - test_combine_properly_pipeline_type_checks_using_decorator > - test_mean_globally_pipeline_checking_satisfied > - test_mean_globally_runtime_checking_satisfied > - test_count_globally_pipeline_type_checking_satisfied > - test_count_globally_runtime_type_checking_satisfied > - test_pardo_type_inference > - test_pipeline_inference > - test_inferred_bad_kv_type > *typehints.trivial_inference_test*: > - all tests in TrivialInferenceTest > *io.gcp.pubsub_test.TestReadFromPubSubOverride*: > * test_expand_with_other_options > * test_expand_with_subscription > * test_expand_with_topic -- This message was sent by Atlassian JIRA (v7.6.14#76016)
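The wordcode change referenced in BEAM-6877 can be observed with the standard `dis` module. The sketch below assumes CPython 3.6 or later, where instructions are laid out in 2-byte units, so every instruction offset is even. Before 3.6 an instruction took 1 byte without an argument and 3 bytes with one, so code that walks bytecode under that older layout will misread it. The function `add_one` is just a sample to disassemble, and `instruction_offsets` is a helper named for this example.

```python
import dis


def add_one(x):
    return x + 1


def instruction_offsets(func):
    """Return the byte offset of each instruction in func's bytecode."""
    return [instr.offset for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    offsets = instruction_offsets(add_one)
    print(offsets)
    # On CPython 3.6+ instructions are 2-byte units, so offsets are even.
    print(all(offset % 2 == 0 for offset in offsets))
    # Human-readable listing of the same instructions.
    dis.dis(add_one)
```

The exact opcodes printed differ between CPython versions, which is one reason inference code that pattern-matches on bytecode tends to need per-version handling.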
[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278618 ] ASF GitHub Bot logged work on BEAM-7726: Author: ASF GitHub Bot Created on: 18/Jul/19 00:13 Start Date: 18/Jul/19 00:13 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9080: [BEAM-7726] Implement State Backed Iterables in Go SDK URL: https://github.com/apache/beam/pull/9080#discussion_r304686804 ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -262,3 +282,60 @@ func (n *DataSource) Split(splits []int64, frac float32) (int64, error) { // return an error. return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, c) } + +type concatReStream struct { + first, next ReStream +} + +func (c *concatReStream) Open() (Stream, error) { + firstStream, err := c.first.Open() + if err != nil { + return nil, err + } + return &concatStream{first: firstStream, nextStream: c.next}, nil +} + +type concatStream struct { + first Stream + nextStream ReStream +} + +// Close nils the stream. +func (s *concatStream) Close() error { + if s.first == nil { + return nil + } + defer func() { + s.first = nil + s.nextStream = nil + }() + return s.first.Close() +} + +func (s *concatStream) Read() (*FullValue, error) { + if s.first == nil { // When the stream is closed. + return nil, io.EOF + } + fv, err := s.first.Read() + if err == nil { + return fv, nil + } + if err == io.EOF { + if err := s.first.Close(); err != nil { + s.nextStream = nil + return nil, err + } + if s.nextStream == nil { + s.first = nil + return nil, io.EOF + } + s.first, err = s.nextStream.Open() Review comment: Just checking my understanding here: nextStream here is opening an elementStream reading from the state-backed iterable (ScopedStateReader in statemgr.go), so that new stream will automatically get continuations from the state channel, right?
It took me a while to trace how this was working, so I wanna confirm that I understood it correctly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278618) Time Spent: 1h 20m (was: 1h 10m) > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
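The hand-off being asked about in this review comment can be illustrated with a small Python analogue of the Go `concatStream` from the quoted diff. This is only a sketch of the control flow, not the SDK code: the names `ConcatStream` and `open_next`, and the use of `None` to signal end of stream (in place of Go's `io.EOF`), are choices made for this example, and elements are assumed never to be `None`. The point it shows is that the follow-up stream is opened lazily, only once the first stream is exhausted.

```python
class ConcatStream:
    """Reads from `first` until exhausted, then lazily opens the next stream.

    `first` is an iterator; `open_next` is a zero-argument callable that
    returns an iterator, or None if there is no follow-up stream.
    """

    def __init__(self, first, open_next=None):
        self._current = first
        self._open_next = open_next

    def read(self):
        """Return the next element, or None once all streams are exhausted."""
        while self._current is not None:
            try:
                return next(self._current)
            except StopIteration:
                if self._open_next is None:
                    # Nothing left to open: the stream is finished.
                    self._current = None
                else:
                    # Open the follow-up stream only now, and only once.
                    opener, self._open_next = self._open_next, None
                    self._current = opener()
        return None


if __name__ == "__main__":
    opened = []

    def open_rest():
        opened.append(True)
        return iter([3, 4])

    stream = ConcatStream(iter([1, 2]), open_rest)
    print(stream.read(), stream.read(), "opened:", bool(opened))
    print(stream.read(), stream.read(), stream.read(), "opened:", bool(opened))
```

In the Go code the second stream comes from a `ReStream`, so in principle it can be reopened; the sketch deliberately drops the opener after first use to mirror the way the diff replaces `s.first` with the newly opened stream.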
[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278619 ] ASF GitHub Bot logged work on BEAM-7726: Author: ASF GitHub Bot Created on: 18/Jul/19 00:13 Start Date: 18/Jul/19 00:13 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #9080: [BEAM-7726] Implement State Backed Iterables in Go SDK URL: https://github.com/apache/beam/pull/9080#discussion_r304680191 ## File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go ## @@ -72,117 +79,129 @@ func (n *DataSource) Process(ctx context.Context) error { c := coder.SkipW(n.Coder) wc := MakeWindowDecoder(n.Coder.Window) + var cp ElementDecoder// Decoder for the primary element or the key in CoGBKs. + var cvs []ElementDecoder // Decoders for each value stream in CoGBKs. + switch { case coder.IsCoGBK(c): - ck := MakeElementDecoder(c.Components[0]) - cv := MakeElementDecoder(c.Components[1]) + cp = MakeElementDecoder(c.Components[0]) - for { - if n.IncrementCountAndCheckSplit(ctx) { + // TODO(BEAM-490): Support multiple value streams (coder components) with + // with CoGBK. + cvs = []ElementDecoder{MakeElementDecoder(c.Components[1])} + default: + cp = MakeElementDecoder(c) + } + + for { + if n.IncrementCountAndCheckSplit(ctx) { + return nil + } + ws, t, err := DecodeWindowedValueHeader(wc, r) + if err != nil { + if err == io.EOF { return nil } - ws, t, err := DecodeWindowedValueHeader(wc, r) - if err != nil { - if err == io.EOF { - return nil - } - return errors.Wrap(err, "source failed") - } + return errors.Wrap(err, "source failed") + } - // Decode key + // Decode key or parallel element.
+ pe, err := cp.Decode(r) + if err != nil { + return errors.Wrap(err, "source decode failed") + } + pe.Timestamp = t + pe.Windows = ws - key, err := ck.Decode(r) + var valReStreams []ReStream + for _, cv := range cvs { + values, err := n.makeReStream(ctx, pe, cv, r) if err != nil { - return errors.Wrap(err, "source decode failed") + return err } - key.Timestamp = t - key.Windows = ws + valReStreams = append(valReStreams, values) + } - // TODO(herohde) 4/30/2017: the State API will be handle re-iterations - // and only "small" value streams would be inline. Presumably, that - // would entail buffering the whole stream. We do that for now. + if err := n.Out.ProcessElement(ctx, pe, valReStreams...); err != nil { + return err + } + } +} - var buf []FullValue +func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv ElementDecoder, r io.ReadCloser) (ReStream, error) { + size, err := coder.DecodeInt32(r) + if err != nil { + return nil, errors.Wrap(err, "stream size decoding failed") + } - size, err := coder.DecodeInt32(r) + switch { + case size >= 0: + // Single chunk streams are fully read in and buffered in memory. + var buf []FullValue + buf, err = readStreamToBuffer(cv, r, int64(size), buf) + if err != nil { + return nil, err + } + return {Buf: buf}, nil + case size == -1: // Shouldn't this be 0? + // Multi-chunked stream. + var buf []FullValue + for { + chunk, err := coder.DecodeVarInt(r) if err != nil { - return errors.Wrap(err, "stream size decoding failed") + return nil, errors.Wrap(err, "stream chunk size decoding failed") } - - if size > -1 { - // Single chunk stream. - -
[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests
[ https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=278599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278599 ] ASF GitHub Bot logged work on BEAM-7484: Author: ASF GitHub Bot Created on: 17/Jul/19 23:48 Start Date: 17/Jul/19 23:48 Worklog Time Spent: 10m Work Description: udim commented on pull request #8766: [BEAM-7484] Metrics collection in BigQuery perf tests URL: https://github.com/apache/beam/pull/8766 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278599) Time Spent: 5h (was: 4h 50m) > Throughput collection in BigQuery performance tests > --- > > Key: BEAM-7484 > URL: https://issues.apache.org/jira/browse/BEAM-7484 > Project: Beam > Issue Type: New Feature > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > The goal is to collect bytes/time and messages/time metrics in BQ read and > write tests in Python SDK. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests
[ https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=278598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278598 ] ASF GitHub Bot logged work on BEAM-7484: Author: ASF GitHub Bot Created on: 17/Jul/19 23:47 Start Date: 17/Jul/19 23:47 Worklog Time Spent: 10m Work Description: udim commented on issue #8766: [BEAM-7484] Metrics collection in BigQuery perf tests URL: https://github.com/apache/beam/pull/8766#issuecomment-512609860 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278598) Time Spent: 4h 50m (was: 4h 40m) > Throughput collection in BigQuery performance tests > --- > > Key: BEAM-7484 > URL: https://issues.apache.org/jira/browse/BEAM-7484 > Project: Beam > Issue Type: New Feature > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > The goal is to collect bytes/time and messages/time metrics in BQ read and > write tests in Python SDK. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278586 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 17/Jul/19 23:29 Start Date: 17/Jul/19 23:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512605907 run python 2 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278586) Time Spent: 1h 20m (was: 1h 10m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278587 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 17/Jul/19 23:29 Start Date: 17/Jul/19 23:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512605970 run python 3.5 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278587) Time Spent: 1.5h (was: 1h 20m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278590 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 17/Jul/19 23:29 Start Date: 17/Jul/19 23:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512606052 run python 3.7 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278590) Time Spent: 1h 50m (was: 1h 40m) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278589=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278589 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 17/Jul/19 23:29 Start Date: 17/Jul/19 23:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512606008 run python 3.6 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278589) Time Spent: 1h 40m (was: 1.5h) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278585 ] ASF GitHub Bot logged work on BEAM-4948: Author: ASF GitHub Bot Created on: 17/Jul/19 23:27 Start Date: 17/Jul/19 23:27 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8899: [BEAM-4948, BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for all our vendored artifacts containing guava URL: https://github.com/apache/beam/pull/8899#issuecomment-512605643 The issue is that Guava migrated to the checkerframework `@Nullable` instead of the javax version which made spotbugs perform its nullness checks. For example, the Guava Function class has the parameter marked as `@Nullable` which means that the function must correctly handle null inputs which some of our previous Function implementations were not. So I could either update them to handle null inputs or mark them as `@Nonnull`. The issue with the latter is that we are now narrowing the definition from a Function that took nullable input to one that didn't which required a different FB suppression. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278585) Time Spent: 4h 10m (was: 4h) > Beam Dependency Update Request: com.google.guava > > > Key: BEAM-4948 > URL: https://issues.apache.org/jira/browse/BEAM-4948 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > 2018-07-25 20:28:03.628639 > Please review and upgrade the com.google.guava to the latest version > None > > cc: -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7679) examples.complete.game ITs might use the same BQ dataset
[ https://issues.apache.org/jira/browse/BEAM-7679?focusedWorklogId=278575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278575 ] ASF GitHub Bot logged work on BEAM-7679: Author: ASF GitHub Bot Created on: 17/Jul/19 23:16 Start Date: 17/Jul/19 23:16 Worklog Time Spent: 10m Work Description: udim commented on issue #8991: [BEAM-7679] Add randomness to ITs' BQ dataset name URL: https://github.com/apache/beam/pull/8991#issuecomment-512602924 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278575) Time Spent: 2h (was: 1h 50m) > examples.complete.game ITs might use the same BQ dataset > > > Key: BEAM-7679 > URL: https://issues.apache.org/jira/browse/BEAM-7679 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Code is: > {code:java} > unique_dataset_name = dataset_base_name + str(int(time.time())) > {code} > [https://github.com/apache/beam/blob/932e802279a2daa0ff7797a8fc81e952a4e4f252/sdks/python/apache_beam/io/gcp/tests/utils.py#L59] > > Example log: > [https://builds.apache.org/job/beam_PostCommit_Python3_Verify_PR/476/consoleFull] > I suspect this issue because of this error: > {code:java} > google.api_core.exceptions.NotFound: 404 Not found: Table > apache-beam-testing:leader_board_it_dataset1562016299.leader_board_teams was > not found in location US{code} > and the fact that a lot of such tests started at the same second. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
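The collision described in the issue above can be reproduced without BigQuery. The following is a minimal sketch, not the change made in PR #8991: `timestamp_only_name` mirrors the quoted `dataset_base_name + str(int(time.time()))` pattern, while `randomized_name` and its `uuid` suffix are hypothetical names and choices introduced here only for illustration.

```python
import time
import uuid


def timestamp_only_name(base_name, now=None):
    """Mirrors the pattern quoted in the issue: base name plus whole seconds."""
    now = time.time() if now is None else now
    return base_name + str(int(now))


def randomized_name(base_name, now=None):
    """Hypothetical variant: a random suffix separates same-second callers."""
    now = time.time() if now is None else now
    # uuid4().hex has 32 hex characters; 8 of them keep the name short.
    return '%s%d_%s' % (base_name, int(now), uuid.uuid4().hex[:8])


# Two tests that start within the same wall-clock second.
same_second = 1562016299.0
a = timestamp_only_name('leader_board_it_dataset', same_second)
b = timestamp_only_name('leader_board_it_dataset', same_second + 0.4)
print(a == b)  # True: both tests would share one dataset

c = randomized_name('leader_board_it_dataset', same_second)
d = randomized_name('leader_board_it_dataset', same_second + 0.4)
print(c, d)  # the suffixes differ with overwhelming probability
```

Truncating the timestamp to whole seconds is what makes concurrent test starts indistinguishable; any per-process source of entropy would serve the same purpose as the `uuid` suffix assumed here.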
[jira] [Created] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run
Udi Meiri created BEAM-7765: --- Summary: Add test for snippet accessing_valueprovider_info_after_run Key: BEAM-7765 URL: https://issues.apache.org/jira/browse/BEAM-7765 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri This snippet needs a unit test. It has bugs. For example: - apache_beam.utils.value_provider doesn't exist - beam.combiners.Sum doesn't exist - unused import of: WriteToText cc: [~pabloem] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278574 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 17/Jul/19 23:11 Start Date: 17/Jul/19 23:11 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093#issuecomment-512601641 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278574) Time Spent: 1h 10m (was: 1h) > Add withProducerConfigUpdates to KafkaIO > > > Key: BEAM-7257 > URL: https://issues.apache.org/jira/browse/BEAM-7257 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > adding withProducerConfigUpdates and deprecating updateProducerProperties -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278572 ] ASF GitHub Bot logged work on BEAM-7257: Author: ASF GitHub Bot Created on: 17/Jul/19 23:05 Start Date: 17/Jul/19 23:05 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9093: [WIP] [BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs. URL: https://github.com/apache/beam/pull/9093 Changes Jenkins jobs for Python 3.5, 3.6, 3.7 test suites.
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278570 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 17/Jul/19 23:00 Start Date: 17/Jul/19 23:00 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#discussion_r304605958
## File path: sdks/python/apache_beam/typehints/decorators.py ##
@@ -193,7 +200,7 @@ def __repr__(self):
         self.input_types, self.output_types)
-class WithTypeHints(object):
+class WithTypeHints(Generic[InT, OutT]):
Review comment: True. I think the somewhat unsatisfactory answer is that for the time being you need both: one for runtime checking and the other for static checking, until such a time as they can become the same. I think trying to do that all in one PR is going to be too much. Off the top of my head, the order in which this should probably be done is:
1. Support runtime type hints using the `typing` module instead of `typehints`: https://issues.apache.org/jira/browse/BEAM-7713
2. Add static type hints to the Beam code and begin enforcing them using mypy: this PR (https://issues.apache.org/jira/browse/BEAM-7746) and possibly https://issues.apache.org/jira/browse/BEAM-7712
3. Support static validation of user pipelines (mypy plugin, etc.)
4. Support runtime validation based on `typing` annotations
There are a lot of "ifs" surrounding step 4. We may need to get to Python 3 only first to avoid the pitfalls of type comments. We may find that step 3 makes it less important. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278570) Time Spent: 3h 40m (was: 3.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
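The point in the review comment above, that one mechanism is needed for static checking and another for runtime checking, can be sketched with the standard `typing` module alone. `TypedFn` below is a hypothetical stand-in invented for this illustration; it is not Beam's `WithTypeHints` and makes no claim about how the PR implements it. The `Generic[InT, OutT]` parameters and type comments are visible only to a static checker such as mypy, while the `isinstance` checks run on every call.

```python
from typing import Callable, Generic, TypeVar

InT = TypeVar('InT')
OutT = TypeVar('OutT')


class TypedFn(Generic[InT, OutT]):
    """Hypothetical wrapper carrying both static and runtime type information."""

    def __init__(self, fn, input_type, output_type):
        # type: (Callable[[InT], OutT], type, type) -> None
        self._fn = fn
        # Runtime hints are stored as plain classes so isinstance() works.
        self._input_type = input_type
        self._output_type = output_type

    def __call__(self, value):
        # type: (InT) -> OutT
        # Runtime check: executed on each call, independent of any static tool.
        if not isinstance(value, self._input_type):
            raise TypeError('expected input %s, got %s' % (
                self._input_type.__name__, type(value).__name__))
        result = self._fn(value)
        if not isinstance(result, self._output_type):
            raise TypeError('expected output %s, got %s' % (
                self._output_type.__name__, type(result).__name__))
        return result


# The type comment is what a static checker reads; the two class arguments
# are what the runtime check reads. Today they must be stated separately.
to_length = TypedFn(len, str, int)  # type: TypedFn[str, int]
print(to_length('beam'))  # 4
```

Stating the types twice is the "unsatisfactory answer" the comment refers to; step 4 of the proposed order would let the runtime check be derived from the annotations instead.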
[jira] [Work logged] (BEAM-1580) Typo in bigquery_tornadoes example
[ https://issues.apache.org/jira/browse/BEAM-1580?focusedWorklogId=278565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278565 ] ASF GitHub Bot logged work on BEAM-1580: Author: ASF GitHub Bot Created on: 17/Jul/19 22:55 Start Date: 17/Jul/19 22:55 Worklog Time Spent: 10m Work Description: coveralls commented on issue #2390: [BEAM-1580] Fixed typos in the Python SDK examples. ( tornatoes -> tornadoes ) URL: https://github.com/apache/beam/pull/2390#issuecomment-290641745 [![Coverage Status](https://coveralls.io/builds/24635781/badge)](https://coveralls.io/builds/24635781) Coverage increased (+28.0%) to 98.318% when pulling **1b59e33d9aa7cd4c2505b7bbfb581e22fe1bf96d on sungjunyoung:master** into **935ecd4e032e18e428ee33cbf5484c5fce726b4f on apache:master**. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278565) Time Spent: 10m Remaining Estimate: 0h > Typo in bigquery_tornadoes example > -- > > Key: BEAM-1580 > URL: https://issues.apache.org/jira/browse/BEAM-1580 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Trivial > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > There are spelling errors in the example code (e.g. "tornatoes") -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278555 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 17/Jul/19 22:27 Start Date: 17/Jul/19 22:27 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#issuecomment-512591918 I'm really curious to know what the general consensus is on this PR, implementation details aside. Do you all like the idea of adding type annotations? If I can get some subset of the current package passing, would you be willing to merge something like this in? What do you see as the major blockers? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278555) Time Spent: 3.5h (was: 3h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278550 ] ASF GitHub Bot logged work on BEAM-6972: Author: ASF GitHub Bot Created on: 17/Jul/19 22:20 Start Date: 17/Jul/19 22:20 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO URL: https://github.com/apache/beam/pull/9064#issuecomment-512590061 Getting very slow downloads from Maven Central locally, too, which could be part of the issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278550) Time Spent: 1h 10m (was: 1h) > LTS backport: CassandraIO is broken because of use of bad relocation of guava > - > > Key: BEAM-6972 > URL: https://issues.apache.org/jira/browse/BEAM-6972 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0 >Reporter: Arun sethia >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.7.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job to read data from BigQuery and > Store/Write to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting the following error when inserting/saving data to Cassandra. 
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278525 ] ASF GitHub Bot logged work on BEAM-6972: Author: ASF GitHub Bot Created on: 17/Jul/19 22:11 Start Date: 17/Jul/19 22:11 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO URL: https://github.com/apache/beam/pull/9064#issuecomment-512587773 I've seen many builds failing due to dependency download issues. I will run this and publish a scan. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278525) Time Spent: 1h (was: 50m) > LTS backport: CassandraIO is broken because of use of bad relocation of guava > - > > Key: BEAM-6972 > URL: https://issues.apache.org/jira/browse/BEAM-6972 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra >Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0 >Reporter: Arun sethia >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.7.1 > > Time Spent: 1h > Remaining Estimate: 0h > > While using Apache Beam to run a Dataflow job to read data from BigQuery and > Store/Write to Cassandra with the following libraries: > # beam-sdks-java-io-cassandra - 2.6.0 > # beam-sdks-java-io-jdbc - 2.6.0 > # beam-sdks-java-io-google-cloud-platform - 2.6.0 > # beam-sdks-java-core - 2.6.0 > # google-cloud-dataflow-java-sdk-all - 2.5.0 > # google-api-client -1.25.0 > > I am getting the following error when inserting/saving data to Cassandra. 
> {code:java} > [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.NoSuchMethodError: > com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture; > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332) > at > org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197) > at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=278517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278517 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 17/Jul/19 22:05 Start Date: 17/Jul/19 22:05 Worklog Time Spent: 10m Work Description: eddie-scio commented on issue #8457: [BEAM-3342] Create a Cloud Bigtable IO connector for Python URL: https://github.com/apache/beam/pull/8457#issuecomment-512586110 Is there an ETA for landing this? Thanks for all the work! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278517) Time Spent: 28h (was: 27h 50m) > Create a Cloud Bigtable IO connector for Python > --- > > Key: BEAM-3342 > URL: https://issues.apache.org/jira/browse/BEAM-3342 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Solomon Duskis >Assignee: Solomon Duskis >Priority: Major > Time Spent: 28h > Remaining Estimate: 0h > > I would like to create a Cloud Bigtable python connector. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278512 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 17/Jul/19 21:56 Start Date: 17/Jul/19 21:56 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#discussion_r304659145 ## File path: sdks/python/apache_beam/transforms/ptransform.py ## @@ -465,56 +484,70 @@ def get_windowing(self, inputs): return inputs[0].windowing def __rrshift__(self, label): +# type: (str) -> _NamedPTransform[InT, OutT] return _NamedPTransform(self, label) def __or__(self, right): +# type: (PTransform[InT, OutT], PTransform[OutT, T]) -> _ChainedPTransform[OutT, T] """Used to compose PTransforms, e.g., ptransform1 | ptransform2.""" if isinstance(right, PTransform): return _ChainedPTransform(self, right) return NotImplemented - def __ror__(self, left, label=None): -"""Used to apply this PTransform to non-PValues, e.g., a tuple.""" -pvalueish, pvalues = self._extract_input_pvalues(left) -pipelines = [v.pipeline for v in pvalues if isinstance(v, pvalue.PValue)] -if pvalues and not pipelines: - deferred = False + if not typing.TYPE_CHECKING: Review comment: ah, sorry, I missed the context here. yeah, that would be a nice feature. would be worth bringing up at the mypy github repo. note that this change is to accommodate analyzing user pipelines via the mypy plugin, which I'm now thinking would be best to separate into another PR. With some more hacking, I might ultimately be able to avoid this bit of ugliness. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278512) Time Spent: 3h 20m (was: 3h 10m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
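The `if not typing.TYPE_CHECKING:` guard discussed in the review comment above can be illustrated with a small, self-contained sketch. This is not the code from PR #9056: the `Transform` class and its string-joining behaviour are hypothetical stand-ins, chosen only to show that a method defined under the guard is invisible to a static analyzer such as mypy (which treats `TYPE_CHECKING` as true) while still being defined at runtime.

```python
import typing


class Transform(object):
  """Hypothetical stand-in for a PTransform-like class (not Beam's)."""

  def __init__(self, name):
    self.name = name

  def __or__(self, right):
    # type: (Transform) -> Transform
    # Visible to both the type checker and the runtime.
    if isinstance(right, Transform):
      return Transform('%s|%s' % (self.name, right.name))
    return NotImplemented

  if not typing.TYPE_CHECKING:
    # typing.TYPE_CHECKING is False at runtime, so this method exists when
    # the program runs. Static analyzers assume it is True and skip the
    # block, so they never see this untyped overload.
    def __ror__(self, left):
      return Transform('%s|%s' % (left, self.name))


# str has no __or__, so Python falls back to Transform.__ror__ here.
result = 'input' | Transform('t')
chained = Transform('a') | Transform('b')
```

The trade-off is the one the reviewers point out: the guard keeps the analyzer's view of the class clean, at the cost of code whose static and runtime shapes differ.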
[jira] [Work logged] (BEAM-7764) Dataflow run fails when service account is not set.
[ https://issues.apache.org/jira/browse/BEAM-7764?focusedWorklogId=278509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278509 ] ASF GitHub Bot logged work on BEAM-7764: Author: ASF GitHub Bot Created on: 17/Jul/19 21:50 Start Date: 17/Jul/19 21:50 Worklog Time Spent: 10m Work Description: potatogopher commented on pull request #9092: [BEAM-7764] Add the ability to set the service account email for dataflow jobs URL: https://github.com/apache/beam/pull/9092 The dataflow runner is not setting the service account for the job that is being set up. This causes failures when trying to deploy. ``` Workflow failed. Causes: There was a problem refreshing your credentials. Please check: 1. Dataflow API is enabled for your project. 2. There is a robot service account for your project: service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com should have access to your project. If this account does not appear in the permissions tab for your project, contact Dataflow support. ``` Adding a flag to set the service account will fix this issue.
[jira] [Created] (BEAM-7764) Dataflow run fails when service account is not set.
Nick Rucci created BEAM-7764: Summary: Dataflow run fails when service account is not set. Key: BEAM-7764 URL: https://issues.apache.org/jira/browse/BEAM-7764 Project: Beam Issue Type: Bug Components: runner-dataflow, sdk-go Reporter: Nick Rucci The dataflow runner is not setting the service account for the job that is being set up. This causes failures when trying to deploy. ``` Workflow failed. Causes: There was a problem refreshing your credentials. Please check: 1. Dataflow API is enabled for your project. 2. There is a robot service account for your project: service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com should have access to your project. If this account does not appear in the permissions tab for your project, contact Dataflow support. ``` Adding a flag to set the service account will fix this issue. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=278499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278499 ] ASF GitHub Bot logged work on BEAM-7079: Author: ASF GitHub Bot Created on: 17/Jul/19 21:40 Start Date: 17/Jul/19 21:40 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8939: [BEAM-7079] Add Chicago Taxi Example running on Dataflow URL: https://github.com/apache/beam/pull/8939#issuecomment-512579507 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278499) Time Spent: 22h (was: 21h 50m) > Run Chicago Taxi Example on Dataflow > > > Key: BEAM-7079 > URL: https://issues.apache.org/jira/browse/BEAM-7079 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Time Spent: 22h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278490 ] ASF GitHub Bot logged work on BEAM-7726: Author: ASF GitHub Bot Created on: 17/Jul/19 21:22 Start Date: 17/Jul/19 21:22 Worklog Time Spent: 10m Work Description: lostluck commented on issue #9080: [BEAM-7726] Implement State Backed Iterables in Go SDK URL: https://github.com/apache/beam/pull/9080#issuecomment-512573799 Run Go PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278490) Time Spent: 1h 10m (was: 1h) > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278489 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 17/Jul/19 21:19 Start Date: 17/Jul/19 21:19 Worklog Time Spent: 10m Work Description: udim commented on pull request #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#discussion_r304646263 ## File path: sdks/python/apache_beam/transforms/ptransform.py ## @@ -465,56 +484,70 @@ def get_windowing(self, inputs): return inputs[0].windowing def __rrshift__(self, label): +# type: (str) -> _NamedPTransform[InT, OutT] return _NamedPTransform(self, label) def __or__(self, right): +# type: (PTransform[InT, OutT], PTransform[OutT, T]) -> _ChainedPTransform[OutT, T] """Used to compose PTransforms, e.g., ptransform1 | ptransform2.""" if isinstance(right, PTransform): return _ChainedPTransform(self, right) return NotImplemented - def __ror__(self, left, label=None): -"""Used to apply this PTransform to non-PValues, e.g., a tuple.""" -pvalueish, pvalues = self._extract_input_pvalues(left) -pipelines = [v.pipeline for v in pvalues if isinstance(v, pvalue.PValue)] -if pvalues and not pipelines: - deferred = False + if not typing.TYPE_CHECKING: Review comment: I don't think you can decorate an import, but this is a method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278489) Time Spent: 3h 10m (was: 3h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278485 ] ASF GitHub Bot logged work on BEAM-4948: Author: ASF GitHub Bot Created on: 17/Jul/19 21:17 Start Date: 17/Jul/19 21:17 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8899: [BEAM-4948, BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for all our vendored artifacts containing guava URL: https://github.com/apache/beam/pull/8899 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278485) Time Spent: 4h (was: 3h 50m) > Beam Dependency Update Request: com.google.guava > > > Key: BEAM-4948 > URL: https://issues.apache.org/jira/browse/BEAM-4948 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > 2018-07-25 20:28:03.628639 > Please review and upgrade the com.google.guava to the latest version > None > > cc: -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7680) synthetic_pipeline_test.py flaky
[ https://issues.apache.org/jira/browse/BEAM-7680?focusedWorklogId=278477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278477 ] ASF GitHub Bot logged work on BEAM-7680: Author: ASF GitHub Bot Created on: 17/Jul/19 21:10 Start Date: 17/Jul/19 21:10 Worklog Time Spent: 10m Work Description: udim commented on pull request #8993: [BEAM-7680] Skip flaky tests URL: https://github.com/apache/beam/pull/8993 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278477) Time Spent: 2h 20m (was: 2h 10m) > synthetic_pipeline_test.py flaky > > > Key: BEAM-7680 > URL: https://issues.apache.org/jira/browse/BEAM-7680 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > {code:java} > 11:51:43 FAIL: testSyntheticSDFStep > (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest) > 11:51:43 > -- > 11:51:43 Traceback (most recent call last): > 11:51:43 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py", > line 82, in testSyntheticSDFStep > 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed) > 11:51:43 AssertionError: False is not true : 3.659700632095337{code} > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull] > > Two flaky TODOs: > [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7680) synthetic_pipeline_test.py flaky
[ https://issues.apache.org/jira/browse/BEAM-7680?focusedWorklogId=278478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278478 ] ASF GitHub Bot logged work on BEAM-7680: Author: ASF GitHub Bot Created on: 17/Jul/19 21:10 Start Date: 17/Jul/19 21:10 Worklog Time Spent: 10m Work Description: udim commented on issue #8993: [BEAM-7680] Skip flaky tests URL: https://github.com/apache/beam/pull/8993#issuecomment-512569941 Sure @kkucharc, this was supposed to be a quick fix until the flakiness is fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278478) Time Spent: 2.5h (was: 2h 20m) > synthetic_pipeline_test.py flaky > > > Key: BEAM-7680 > URL: https://issues.apache.org/jira/browse/BEAM-7680 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Kasia Kucharczyk >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > {code:java} > 11:51:43 FAIL: testSyntheticSDFStep > (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest) > 11:51:43 > -- > 11:51:43 Traceback (most recent call last): > 11:51:43 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py", > line 82, in testSyntheticSDFStep > 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed) > 11:51:43 AssertionError: False is not true : 3.659700632095337{code} > [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull] > > Two flaky TODOs: > [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82] -- This message was sent by Atlassian JIRA (v7.6.14#76016)
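The failing assertion quoted above (`0.5 <= elapsed <= 3`) depends on wall-clock time, which is why a loaded Jenkins worker can push it out of range. One common way to make such a check deterministic is to inject the clock. The sketch below is only an illustration of that idea, not the change made in PR #8993 (which skips the tests); `FakeClock` and `run_step` are hypothetical names and do not come from `synthetic_pipeline_test.py`.

```python
class FakeClock(object):
  """Hypothetical clock whose time only advances when sleep() is called."""

  def __init__(self):
    self.now = 0.0

  def time(self):
    return self.now

  def sleep(self, seconds):
    # Advance virtual time instead of blocking the thread.
    self.now += seconds


def run_step(per_element_delay, num_elements, clock):
  """Simulates a step that delays per element; returns the elapsed time."""
  start = clock.time()
  for _ in range(num_elements):
    clock.sleep(per_element_delay)
  return clock.time() - start


clock = FakeClock()
elapsed = run_step(0.25, 4, clock)
```

With a virtual clock the elapsed time is determined by the inputs alone, so a bound like the one in the traceback no longer varies with machine load.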
[jira] [Resolved] (BEAM-7499) ReifyTest.test_window fails in DirectRunner due to 'assign_context.window should not be None.'
[ https://issues.apache.org/jira/browse/BEAM-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada resolved BEAM-7499. - Resolution: Fixed Fix Version/s: 2.15.0 > ReifyTest.test_window fails in DirectRunner due to 'assign_context.window > should not be None.' > -- > > Key: BEAM-7499 > URL: https://issues.apache.org/jira/browse/BEAM-7499 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, test-failures >Reporter: Luke Cwik >Assignee: Pablo Estrada >Priority: Minor > Fix For: 2.15.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > > [PR 8717|https://github.com/apache/beam/pull/8717] added > ReifyWindow.test_window which fails on the DirectRunner. > {code:java} > ERROR:root:Exception at bundle > , > due to an exception. > Traceback (most recent call last): > File "apache_beam/runners/direct/executor.py", line 343, in call > finish_state) > File "apache_beam/runners/direct/executor.py", line 380, in attempt_call > evaluator.process_element(value) > File "apache_beam/runners/direct/transform_evaluator.py", line 636, in > process_element > self.runner.process(element) > File "apache_beam/runners/common.py", line 780, in > apache_beam.runners.common.DoFnRunner.process > def process(self, windowed_value): > File "apache_beam/runners/common.py", line 784, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 851, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 453, in > apache_beam.runners.common.SimpleInvoker.invoke_process > output_processor.process_outputs( > File "apache_beam/runners/common.py", line 915, in > apache_beam.runners.common._OutputProcessor.process_outputs > 
self.window_fn.assign(assign_context)) > File "apache_beam/transforms/util.py", line 557, in assign > 'assign_context.window should not be None. ' > ValueError: assign_context.window should not be None. This might be due to a > DoFn returning a TimestampedValue. [while running 'add_timestamps2'] > Traceback (most recent call last): > File "apache_beam/transforms/util_test.py", line 501, in test_window > assert_that(reified_pc, equal_to(expected), reify_windows=True) > File "apache_beam/pipeline.py", line 426, in __exit__ > self.run().wait_until_finish() > File "apache_beam/testing/test_pipeline.py", line 109, in run > state = result.wait_until_finish() > File "apache_beam/runners/direct/direct_runner.py", line 430, in > wait_until_finish > self._executor.await_completion() > File "apache_beam/runners/direct/executor.py", line 400, in await_completion > self._executor.await_completion() > File "apache_beam/runners/direct/executor.py", line 446, in await_completion > raise_(t, v, tb) > File "apache_beam/runners/direct/executor.py", line 343, in call > finish_state) > File "apache_beam/runners/direct/executor.py", line 380, in attempt_call > evaluator.process_element(value) > File "apache_beam/runners/direct/transform_evaluator.py", line 636, in > process_element > self.runner.process(element) > File "apache_beam/runners/common.py", line 780, in > apache_beam.runners.common.DoFnRunner.process > def process(self, windowed_value): > File "apache_beam/runners/common.py", line 784, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 851, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > 
windowed_value, self.process_method(windowed_value.value)) > File "apache_beam/transforms/core.py", line 1292, in > wrapper = lambda x: [fn(x)] > File "apache_beam/testing/util.py", line 129, in _equal > 'Failed assert: %r == %r' % (sorted_expected, sorted_actual)) > BeamAssertException: Failed assert: [TestWindowedValue(value=('a', 100, > GlobalWindow), timestamp=100, windows=[GlobalWindow]), > TestWindowedValue(value=('b', 200, GlobalWindow), timestamp=200, > windows=[GlobalWindow]), TestWindowedValue(value=('c', 300, GlobalWindow), > timestamp=300, windows=[GlobalWindow])] == [TestWindowedValue(value=(('a', > 100.0, (GlobalWindow,), PaneInfo(first: True, last: True,
[jira] [Updated] (BEAM-7763) Python DirectRunner _PubSubReadEvaluator creates new client per bundle
[ https://issues.apache.org/jira/browse/BEAM-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-7763: Status: Open (was: Triage Needed) > Python DirectRunner _PubSubReadEvaluator creates new client per bundle > -- > > Key: BEAM-7763 > URL: https://issues.apache.org/jira/browse/BEAM-7763 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Priority: Major > Labels: easy > > Lots of credential fetches. > Similar to https://issues.apache.org/jira/browse/BEAM-2264 > but in this case the DirectRunner implementation seems to be creating a new > client for each bundle: > https://github.com/apache/beam/blob/d5d7a7b7d0408d8435031e7bfce1abe2227115f5/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L474 > From: > https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens -- This message was sent by Atlassian JIRA (v7.6.14#76016)
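The per-bundle client creation described above can be avoided by caching the client so that later bundles reuse it. The following is a minimal sketch under stated assumptions: `FakeSubscriberClient`, `get_client` and `process_bundle` are hypothetical names used for illustration and are not taken from `transform_evaluator.py` or the Pub/Sub client library.

```python
class FakeSubscriberClient(object):
  """Hypothetical client; counts how many times it is constructed."""
  instances_created = 0

  def __init__(self):
    # In a real client this is where credentials would be fetched.
    FakeSubscriberClient.instances_created += 1

  def pull(self, max_messages):
    return ['msg'] * max_messages


# Process-wide cache keyed by an arbitrary string (e.g. a subscription name).
_CLIENT_CACHE = {}


def get_client(key='default'):
  """Returns the cached client for key, creating it on first use only."""
  if key not in _CLIENT_CACHE:
    _CLIENT_CACHE[key] = FakeSubscriberClient()
  return _CLIENT_CACHE[key]


def process_bundle(num_messages):
  # Each bundle looks up the shared client instead of building a new one.
  return get_client().pull(num_messages)


for _ in range(3):
  process_bundle(2)
```

A real implementation would also need to consider thread safety and client shutdown, which this sketch leaves out.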
[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278476 ] ASF GitHub Bot logged work on BEAM-4948: Author: ASF GitHub Bot Created on: 17/Jul/19 21:07 Start Date: 17/Jul/19 21:07 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8899: [BEAM-4948, BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for all our vendored artifacts containing guava URL: https://github.com/apache/beam/pull/8899#issuecomment-512569114 Please self merge. I have two minor comments: 1. We used to suppress spotbugs warnings via a filters exclusion file; it is probably worth keeping that for consistency, but we can do that in a subsequent PR. 2. I really did not understand why it now asks us to add a `@Nonnull` annotation. That's a bit of a bummer if we need to do this explicitly, and in particular I did not get why it does not complain in other parts (luckily, maybe). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278476) Time Spent: 3h 50m (was: 3h 40m) > Beam Dependency Update Request: com.google.guava > > > Key: BEAM-4948 > URL: https://issues.apache.org/jira/browse/BEAM-4948 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > 2018-07-25 20:28:03.628639 > Please review and upgrade the com.google.guava to the latest version > None > > cc: -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava
[ https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278474 ] ASF GitHub Bot logged work on BEAM-4948: Author: ASF GitHub Bot Created on: 17/Jul/19 21:06 Start Date: 17/Jul/19 21:06 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8899: [BEAM-4948, BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for all our vendored artifacts containing guava URL: https://github.com/apache/beam/pull/8899#issuecomment-512568693 Thanks Luke! LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278474) Time Spent: 3h 40m (was: 3.5h) > Beam Dependency Update Request: com.google.guava > > > Key: BEAM-4948 > URL: https://issues.apache.org/jira/browse/BEAM-4948 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > 2018-07-25 20:28:03.628639 > Please review and upgrade the com.google.guava to the latest version > None > > cc: -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7262) LTS backport: normalize httplib2.Http initialization and usage
[ https://issues.apache.org/jira/browse/BEAM-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles resolved BEAM-7262. --- Resolution: Fixed > LTS backport: normalize httplib2.Http initialization and usage > -- > > Key: BEAM-7262 > URL: https://issues.apache.org/jira/browse/BEAM-7262 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.7.1 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Ideally solve both issues below in one PR, but issue 1 has priority as it can > halt a pipeline. > Issue 1: > Datastore client (and other httplib2-based clients for GCS, Dataflow, > BigQuery, etc.) doesn't set a socket timeout. > This can cause _flush_batch() in datastoreio.py to block forever waiting for > a response. > This issue is very similar to https://issues.apache.org/jira/browse/BEAM-5915 > and the solution should be similar. > Issue 2: > Standardize use of proxy environment settings, as in gcsio: > https://github.com/apache/beam/blob/8d3389df78aa2e0a0de06b7c5743ca3530dec4ac/sdks/python/apache_beam/io/gcp/gcsio.py#L136 > Issue for proxy settings: https://issues.apache.org/jira/browse/BEAM-3184 -- This message was sent by Atlassian JIRA (v7.6.14#76016)
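Issue 1 above comes down to a blocking read on a socket with no timeout. The sketch below shows the general effect of setting one, using only the standard library rather than httplib2 (`httplib2.Http` accepts a `timeout` argument that serves the same purpose). It is an illustration under stated assumptions, not the BEAM-7262 fix: `read_with_timeout` is a hypothetical helper, and a local socket pair stands in for a server that never answers.

```python
import socket


def read_with_timeout(sock, timeout_seconds):
  """Returns received bytes, or None if nothing arrives within the timeout."""
  sock.settimeout(timeout_seconds)
  try:
    return sock.recv(1024)
  except socket.timeout:
    # Without a timeout this recv() would block until the peer replied.
    return None


# A connected pair of local sockets; the "server" end starts out silent.
client_end, silent_server_end = socket.socketpair()

# Nothing has been sent yet, so the read gives up after 0.1 seconds.
first_response = read_with_timeout(client_end, 0.1)

# Once the peer does reply, the same call returns the data.
silent_server_end.sendall(b'ok')
second_response = read_with_timeout(client_end, 0.1)

client_end.close()
silent_server_end.close()
```

Callers still have to decide what to do when the timeout fires (retry, fail the bundle, and so on); the timeout only guarantees that the call returns.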
[jira] [Work logged] (BEAM-7722) Simplify running of Beam Python on Flink
[ https://issues.apache.org/jira/browse/BEAM-7722?focusedWorklogId=278472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278472 ] ASF GitHub Bot logged work on BEAM-7722: Author: ASF GitHub Bot Created on: 17/Jul/19 21:03 Start Date: 17/Jul/19 21:03 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9043: [BEAM-7722] Add a Python FlinkRunner that fetches and uses released artifacts. URL: https://github.com/apache/beam/pull/9043#issuecomment-512567781 This looks like a great step toward making the portable Flink runner more usable. Is it premature to update the documentation along with this PR? https://beam.apache.org/documentation/runners/flink/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278472) Time Spent: 4h 20m (was: 4h 10m) > Simplify running of Beam Python on Flink > > > Key: BEAM-7722 > URL: https://issues.apache.org/jira/browse/BEAM-7722 > Project: Beam > Issue Type: Test > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Currently this requires building and running several processes. We should be > able to automate most of this away. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yueyang Qiu updated BEAM-7013: -- Fix Version/s: 2.15.0 > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.15.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-7641) collect statistics about python ITs
[ https://issues.apache.org/jira/browse/BEAM-7641?focusedWorklogId=278469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278469 ] ASF GitHub Bot logged work on BEAM-7641: Author: ASF GitHub Bot Created on: 17/Jul/19 20:52 Start Date: 17/Jul/19 20:52 Worklog Time Spent: 10m Work Description: udim commented on pull request #8952: [BEAM-7641] Collect xunit statistics for Py ITs URL: https://github.com/apache/beam/pull/8952 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278469) Time Spent: 5h (was: 4h 50m) > collect statistics about python ITs > --- > > Key: BEAM-7641 > URL: https://issues.apache.org/jira/browse/BEAM-7641 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Currently ITs don't generate xunit (nosetests.xml) files. > Having this data will make it easier to see which tests failed in a > pre/postcommit run, and to tell if a particular test is flaky. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
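As a rough illustration of why xunit output helps, the sketch below lists failed or errored tests from an xunit-style XML string. It assumes the common layout of a testsuite element containing testcase elements with optional failure or error children; the function name failed_tests is hypothetical and not part of Beam's tooling.

```python
import xml.etree.ElementTree as ET


def failed_tests(xunit_xml):
    """Return 'classname.name' for every testcase with a failure or error child."""
    root = ET.fromstring(xunit_xml)
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append("%s.%s" % (case.get("classname"), case.get("name")))
    return failed
```

Running this over the files from several pre/postcommit runs and counting how often each name appears is one simple way to spot flaky tests.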
[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables
[ https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=278468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278468 ] ASF GitHub Bot logged work on BEAM-5191: Author: ASF GitHub Bot Created on: 17/Jul/19 20:50 Start Date: 17/Jul/19 20:50 Worklog Time Spent: 10m Work Description: jklukas commented on issue #8945: [BEAM-5191] Support for BigQuery clustering URL: https://github.com/apache/beam/pull/8945#issuecomment-512563458 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278468) Time Spent: 13h 40m (was: 13.5h) > Add support for writing to BigQuery clustered tables > > > Key: BEAM-5191 > URL: https://issues.apache.org/jira/browse/BEAM-5191 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.6.0 >Reporter: Robert Sahlin >Assignee: Wout Scheepers >Priority: Minor > Labels: features, newbie > Time Spent: 13h 40m > Remaining Estimate: 0h > > Google recently added support for clustered tables in BigQuery. It would be > useful to set clustering columns the same way as for partitioning. It should > support multiple fields (4) for clustering. > For example: > [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]> > .withClustering(new Clustering().setField("productId").setType("STRING")) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables
[ https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=278467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278467 ] ASF GitHub Bot logged work on BEAM-5191: Author: ASF GitHub Bot Created on: 17/Jul/19 20:50 Start Date: 17/Jul/19 20:50 Worklog Time Spent: 10m Work Description: jklukas commented on issue #8945: [BEAM-5191] Support for BigQuery clustering URL: https://github.com/apache/beam/pull/8945#issuecomment-512563416 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 278467) Time Spent: 13.5h (was: 13h 20m)
[jira] [Work logged] (BEAM-7641) collect statistics about python ITs
[ https://issues.apache.org/jira/browse/BEAM-7641?focusedWorklogId=278462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278462 ] ASF GitHub Bot logged work on BEAM-7641: Author: ASF GitHub Bot Created on: 17/Jul/19 20:38 Start Date: 17/Jul/19 20:38 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8952: [BEAM-7641] Collect xunit statistics for Py ITs URL: https://github.com/apache/beam/pull/8952#issuecomment-512559185 LGTM Issue Time Tracking --- Worklog Id: (was: 278462) Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=278460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278460 ] ASF GitHub Bot logged work on BEAM-7079: Author: ASF GitHub Bot Created on: 17/Jul/19 20:30 Start Date: 17/Jul/19 20:30 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8939: [BEAM-7079] Add Chicago Taxi Example running on Dataflow URL: https://github.com/apache/beam/pull/8939#issuecomment-512556486 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278460) Time Spent: 21h 50m (was: 21h 40m) > Run Chicago Taxi Example on Dataflow > > > Key: BEAM-7079 > URL: https://issues.apache.org/jira/browse/BEAM-7079 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Michal Walenia >Assignee: Michal Walenia >Priority: Minor > Time Spent: 21h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (BEAM-2264) Re-use credential instead of generating a new one one each GCS call
[ https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887387#comment-16887387 ]

Udi Meiri edited comment on BEAM-2264 at 7/17/19 8:16 PM:
--
And also affects pubsub (under directrunner): https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298

Created https://issues.apache.org/jira/browse/BEAM-7763

was (Author: udim):
And also affects pubsub (under directrunner): https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298

> Re-use credential instead of generating a new one one each GCS call
> ---
>
> Key: BEAM-2264
> URL: https://issues.apache.org/jira/browse/BEAM-2264
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Luke Cwik
> Assignee: Udi Meiri
> Priority: Minor
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> We should cache the credential used within a Pipeline and re-use it instead
> of generating a new one on each GCS call. When executing (against 2.0.0 RC2):
> {code}
> python -m apache_beam.examples.wordcount --input "gs://dataflow-samples/shakespeare/*" --output local_counts
> {code}
> Note that we seemingly generate a new access token each time instead of when
> a refresh is required.
> {code}
> super(GcsIO, cls).__new__(cls, storage_client))
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.286200046539 seconds
> INFO:root:Running pipeline with DirectRunner.
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 43 files. Estimation took 0.205624818802 seconds
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> ... many more times ...
> {code}
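The caching behaviour requested in this issue can be sketched as follows. This is a minimal illustration, not Beam's or oauth2client's implementation: CachedCredential, fetch_token, and refresh_margin_secs are hypothetical names, and fetch_token is assumed to return a (token, lifetime in seconds) pair.

```python
import time


class CachedCredential:
    """Caches an access token and refreshes it only when it is close to expiry."""

    def __init__(self, fetch_token, clock=time.time, refresh_margin_secs=60):
        self._fetch_token = fetch_token  # hypothetical: returns (token, lifetime_secs)
        self._clock = clock
        self._margin = refresh_margin_secs
        self._token = None
        self._expires_at = 0.0
        self.refresh_count = 0  # exposed so callers can observe how often we refresh

    def get_token(self):
        now = self._clock()
        # Refresh only on first use or when the token is within the safety margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
            self.refresh_count += 1
        return self._token
```

Sharing one such object across all GCS calls in a process would turn the repeated "Refreshing access_token" log lines into a single refresh per token lifetime. The sketch is not thread-safe; a lock around the refresh would be needed if several threads share the object.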
[jira] [Created] (BEAM-7763) Python DirectRunner _PubSubReadEvaluator creates new client per bundle
Udi Meiri created BEAM-7763:
---

Summary: Python DirectRunner _PubSubReadEvaluator creates new client per bundle
Key: BEAM-7763
URL: https://issues.apache.org/jira/browse/BEAM-7763
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: Udi Meiri

Lots of credential fetches. Similar to https://issues.apache.org/jira/browse/BEAM-2264 but in this case the DirectRunner implementation seems to be creating a new client for each bundle:
https://github.com/apache/beam/blob/d5d7a7b7d0408d8435031e7bfce1abe2227115f5/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L474

From: https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens
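One possible shape for a fix is to memoize the client so that every bundle reading the same subscription shares it. The sketch below is an assumption-laden illustration, not the DirectRunner code: FakeSubscriberClient stands in for a real Pub/Sub client, and get_client and process_bundle are hypothetical names.

```python
import functools


class FakeSubscriberClient:
    """Hypothetical stand-in for a Pub/Sub client; counts how many are built."""

    instances_created = 0

    def __init__(self, subscription):
        type(self).instances_created += 1
        self.subscription = subscription

    def pull(self, max_messages):
        return []  # a real client would contact the service (and fetch credentials)


@functools.lru_cache(maxsize=None)
def get_client(subscription):
    """Build one client per distinct subscription and reuse it afterwards."""
    return FakeSubscriberClient(subscription)


def process_bundle(subscription):
    # Each bundle asks for the cached client instead of constructing its own.
    client = get_client(subscription)
    return client.pull(max_messages=10)
```

Because client construction is where credentials are fetched, reusing the client also removes the repeated credential fetches described above.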
[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable
[ https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278451 ] ASF GitHub Bot logged work on BEAM-7545: Author: ASF GitHub Bot Created on: 17/Jul/19 20:05 Start Date: 17/Jul/19 20:05 Worklog Time Spent: 10m Work Description: akedin commented on issue #9040: [BEAM-7545] Reordering Beam Joins URL: https://github.com/apache/beam/pull/9040#issuecomment-512548035 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278451) Time Spent: 9h 40m (was: 9.5h) > Row Count Estimation for CSV TextTable > -- > > Key: BEAM-7545 > URL: https://issues.apache.org/jira/browse/BEAM-7545 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Alireza Samadianzakaria >Assignee: Alireza Samadianzakaria >Priority: Major > Fix For: Not applicable > > Time Spent: 9h 40m > Remaining Estimate: 0h > > Implementing Row Count Estimation for CSV Tables by reading the first few > lines of the file and estimating the number of records based on the length of > these lines and the total length of the file. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
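The estimation idea in the issue description (sample the first few lines, then scale by total file length) can be sketched as below. This is a simplified illustration rather than the Beam SQL implementation, which is written in Java; estimate_row_count and sample_lines are hypothetical names, and the input is assumed to be a binary stream whose total size in bytes is known.

```python
def estimate_row_count(stream, total_bytes, sample_lines=10):
    """Estimate rows as total_bytes divided by the average sampled line length."""
    sampled_bytes = 0
    sampled = 0
    for line in stream:  # iterating a binary file yields lines, newline included
        sampled_bytes += len(line)
        sampled += 1
        if sampled >= sample_lines:
            break
    if sampled == 0:
        return 0  # empty input
    average_line_bytes = sampled_bytes / sampled
    return int(round(total_bytes / average_line_bytes))
```

The estimate is exact when all lines have the same length and drifts when the sampled lines are unrepresentative, for example when a long header row dominates a small sample.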
[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable
[ https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278450 ] ASF GitHub Bot logged work on BEAM-7545: Author: ASF GitHub Bot Created on: 17/Jul/19 20:03 Start Date: 17/Jul/19 20:03 Worklog Time Spent: 10m Work Description: riazela commented on pull request #9040: [BEAM-7545] Reordering Beam Joins URL: https://github.com/apache/beam/pull/9040#discussion_r304617399 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java ## @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.impl.rule; + +import java.math.BigInteger; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.Map; +import java.util.function.Function; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.planner.BeamRuleSets; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap; +import org.apache.calcite.DataContext; +import org.apache.calcite.adapter.enumerable.EnumerableConvention; +import org.apache.calcite.adapter.enumerable.EnumerableRules; +import org.apache.calcite.linq4j.Enumerable; +import org.apache.calcite.linq4j.Linq4j; +import org.apache.calcite.plan.ConventionTraitDef; +import org.apache.calcite.plan.RelOptRule; +import org.apache.calcite.plan.RelTraitSet; +import org.apache.calcite.rel.RelCollationTraitDef; +import org.apache.calcite.rel.RelCollations; +import org.apache.calcite.rel.RelFieldCollation; +import org.apache.calcite.rel.RelNode; +import org.apache.calcite.rel.RelRoot; +import org.apache.calcite.rel.core.Join; +import org.apache.calcite.rel.core.TableScan; +import org.apache.calcite.rel.rules.JoinCommuteRule; +import org.apache.calcite.rel.rules.SortProjectTransposeRule; +import org.apache.calcite.rel.type.RelDataType; +import org.apache.calcite.rel.type.RelDataTypeFactory; +import org.apache.calcite.schema.ScannableTable; +import org.apache.calcite.schema.SchemaPlus; +import org.apache.calcite.schema.Statistic; +import org.apache.calcite.schema.Statistics; +import org.apache.calcite.schema.Table; 
+import org.apache.calcite.schema.impl.AbstractSchema; +import org.apache.calcite.schema.impl.AbstractTable; +import org.apache.calcite.sql.SqlNode; +import org.apache.calcite.sql.parser.SqlParser; +import org.apache.calcite.tools.FrameworkConfig; +import org.apache.calcite.tools.Frameworks; +import org.apache.calcite.tools.Planner; +import org.apache.calcite.tools.Programs; +import org.apache.calcite.tools.RuleSet; +import org.apache.calcite.tools.RuleSets; +import org.apache.calcite.util.ImmutableBitSet; +import org.junit.Assert; +import org.junit.Test; + +/** + * This test ensures that we are reordering joins and get a plan similar to Join(large,Join(small, + * medium)) instead of Join(small, Join(medium,large). + */ +public class JoinReorderingTest { + private final PipelineOptions defaultPipelineOptions = PipelineOptionsFactory.create(); + + @Test + public void testTableSizes() { +TestTableProvider tableProvider = new TestTableProvider(); +createThreeTables(tableProvider); + +Assert.assertEquals( +BigInteger.ONE, +tableProvider +.buildBeamSqlTable(tableProvider.getTable("small_table")) +.getRowCount(null) +.getRowCount()); + +Assert.assertEquals( +BigInteger.valueOf(3), +tableProvider +.buildBeamSqlTable(tableProvider.getTable("medium_table")) +.getRowCount(null) +.getRowCount()); + +
[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables
[ https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278448 ] ASF GitHub Bot logged work on BEAM-7726: Author: ASF GitHub Bot Created on: 17/Jul/19 20:01 Start Date: 17/Jul/19 20:01 Worklog Time Spent: 10m Work Description: lostluck commented on issue #9080: [BEAM-7726] Implement State Backed Iterables in Go SDK URL: https://github.com/apache/beam/pull/9080#issuecomment-512546674 R: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 278448) Time Spent: 1h (was: 50m) > [Go SDK] State Backed Iterables > --- > > Key: BEAM-7726 > URL: https://issues.apache.org/jira/browse/BEAM-7726 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: Not applicable >Reporter: Robert Burke >Assignee: Robert Burke >Priority: Major > Fix For: Not applicable > > Time Spent: 1h > Remaining Estimate: 0h > > The Go SDK should support the State backed iterables protocol per the proto. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644] > > Primary case is for iterables after CoGBKs. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (BEAM-2264) Re-use credential instead of generating a new one one each GCS call
[ https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887387#comment-16887387 ] Udi Meiri edited comment on BEAM-2264 at 7/17/19 8:00 PM: -- And also affects pubsub (under directrunner): https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298 was (Author: udim): And also affects pubsub: https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298
[jira] [Commented] (BEAM-2264) Re-use credential instead of generating a new one one each GCS call
[ https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887387#comment-16887387 ] Udi Meiri commented on BEAM-2264: - And also affects pubsub: https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278442 ]

ASF GitHub Bot logged work on BEAM-6972:

Author: ASF GitHub Bot
Created on: 17/Jul/19 19:54
Start Date: 17/Jul/19 19:54
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512544455

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 278442)
Time Spent: 50m (was: 40m)

> LTS backport: CassandraIO is broken because of use of bad relocation of guava
> -
>
> Key: BEAM-6972
> URL: https://issues.apache.org/jira/browse/BEAM-6972
> Project: Beam
> Issue Type: Bug
> Components: io-java-cassandra
> Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
> Reporter: Arun sethia
> Assignee: Kenneth Knowles
> Priority: Major
> Fix For: 2.7.1
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> While using Apache Beam to run a Dataflow job that reads data from BigQuery and
> stores/writes it to Cassandra with the following libraries:
> # beam-sdks-java-io-cassandra - 2.6.0
> # beam-sdks-java-io-jdbc - 2.6.0
> # beam-sdks-java-io-google-cloud-platform - 2.6.0
> # beam-sdks-java-core - 2.6.0
> # google-cloud-dataflow-java-sdk-all - 2.5.0
> # google-api-client - 1.25.0
>
> I get the following error when inserting/saving data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoSuchMethodError: com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoSuchMethodError: com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
> at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
> {code}
[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava
[ https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278441 ] ASF GitHub Bot logged work on BEAM-6972: Author: ASF GitHub Bot Created on: 17/Jul/19 19:54 Start Date: 17/Jul/19 19:54 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO URL: https://github.com/apache/beam/pull/9064#issuecomment-512544378 Failures in the gradle console log appear to be infrastructural. Have manually confirmed targets. Issue Time Tracking --- Worklog Id: (was: 278441) Time Spent: 40m (was: 0.5h)