[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=385700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385700 ]

ASF GitHub Bot logged work on BEAM-9281:

Author: ASF GitHub Bot
Created on: 12/Feb/20 06:46
Start Date: 12/Feb/20 06:46
Worklog Time Spent: 10m

Work Description: iemejia commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585055981

Run SQL PostCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385700)
Time Spent: 40m (was: 0.5h)

> Update commons-csv to version 1.8
> ---------------------------------
>
>                 Key: BEAM-9281
>                 URL: https://issues.apache.org/jira/browse/BEAM-9281
>             Project: Beam
>          Issue Type: Improvement
>          Components: build-system
>            Reporter: Ismaël Mejía
>            Assignee: Ismaël Mejía
>            Priority: Trivial
>             Fix For: 2.20.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385696&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385696 ]

ASF GitHub Bot logged work on BEAM-6522:

Author: ASF GitHub Bot
Created on: 12/Feb/20 06:20
Start Date: 12/Feb/20 06:20
Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585049296

Run Python 3.7 PostCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385696)
Time Spent: 8h 10m (was: 8h)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> ------------------------------------------------------------
>
>                 Key: BEAM-6522
>                 URL: https://issues.apache.org/jira/browse/BEAM-6522
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Robbe
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>             Fix For: 2.16.0
>
>          Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
> *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", line 432, in test_sink_transform
>     | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 112, in __or__
>     return self.pipeline.apply(ptransform, self)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 515, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 528, in expand
>     return pcoll | beam.io.iobase.Write(self._sink)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 112, in __or__
>     return self.pipeline.apply(ptransform, self)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 515, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 960, in expand
>     return pcoll | WriteImpl(self.sink)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 112, in __or__
>     return self.pipeline.apply(ptransform, self)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 515, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 979, in expand
>     lambda _, sink: sink.initialize_write(), self.sink)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", line 1103, in Map
>     pardo = FlatMap(wrapper, *args, **kwargs)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", line 1054, in FlatMap
>     pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", line 864, in __init__
>     super(ParDo, self).__init__(fn, *args, **kwargs)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", line 646, in __init__
>     self.args = pickler.loads(pickler.dumps(self.args))
>   File
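The traceback bottoms out in `pickler.loads(pickler.dumps(self.args))`: Beam round-trips transform arguments through its pickler at construction time, so an argument that cannot be serialized (here an avro.RecordSchema under dill on Python 3) fails the pipeline before it ever runs. A minimal stdlib analogue of one way such a round-trip can fail, using `pickle` only (illustrative; Beam's pickler wraps dill, and dill's failure mode on RecordSchema differs in detail):

```python
import pickle

class TopLevel:
    """A module-level class: pickled by reference to its importable name."""

def make_local_class():
    class Local:  # defined inside a function, so not importable by name
        pass
    return Local

# A module-level class round-trips fine.
assert isinstance(pickle.loads(pickle.dumps(TopLevel())), TopLevel)

# A function-local class does not: the pickler cannot resolve its
# qualified name, so serialization fails up front.
try:
    pickle.dumps(make_local_class()())
except (pickle.PicklingError, AttributeError) as exc:
    print("round-trip failed:", type(exc).__name__)
```

The early round-trip is why these tests fail at `WriteToAvro` construction rather than at execution time.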
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=385674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385674 ]

ASF GitHub Bot logged work on BEAM-9301:

Author: ASF GitHub Bot
Created on: 12/Feb/20 04:39
Start Date: 12/Feb/20 04:39
Worklog Time Spent: 10m

Work Description: suztomo commented on pull request #10841: [BEAM-9301] Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#discussion_r378037592

## File path: sdks/java/build-tools/beam-linkage-check.sh
## @@ -0,0 +1,99 @@
+#!/bin/bash

Review comment: @iemejia I tried with zsh compatible shell script, but it's not trivial. Therefore this relies on bash.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385674)
Time Spent: 0.5h (was: 20m)

> Check in beam-linkage-check.sh
> ------------------------------
>
>                 Key: BEAM-9301
>                 URL: https://issues.apache.org/jira/browse/BEAM-9301
>             Project: Beam
>          Issue Type: Task
>          Components: build-system
>            Reporter: Tomo Suzuki
>            Assignee: Tomo Suzuki
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker (BEAM-9206) are implemented.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=385671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385671 ]

ASF GitHub Bot logged work on BEAM-9301:

Author: ASF GitHub Bot
Created on: 12/Feb/20 04:37
Start Date: 12/Feb/20 04:37
Worklog Time Spent: 10m

Work Description: suztomo commented on pull request #10841: [BEAM-9301] Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841

Script to compare linkage errors (checkJavaLinkage task in the root gradle project) between PR's branch and master branch. This is a temporary solution before Linkage Checker implements exclusion rules (BEAM-9206).

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
Post-Commit Tests Status (on master branch)
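Per the PR description above, beam-linkage-check.sh compares linkage errors (the checkJavaLinkage Gradle task) between the PR branch and master. A rough sketch of that comparison idea, using stand-in report files; the file names and the use of comm(1) are illustrative assumptions, not the actual script:

```shell
#!/bin/bash
# Sketch: collect the linkage-check report on each branch, then print only
# the errors that are new on the PR branch. The report contents below are
# stand-ins for "./gradlew checkJavaLinkage" output.
set -euo pipefail

workdir=$(mktemp -d)

cat > "$workdir/master.txt" <<'EOF'
ClassNotFoundException: com.example.Foo
NoSuchMethodError: com.example.Bar#baz()
EOF
cat > "$workdir/pr.txt" <<'EOF'
ClassNotFoundException: com.example.Foo
NoSuchMethodError: com.example.Qux#quux()
EOF

# comm -13 prints lines unique to the second (sorted) file: the linkage
# errors the PR branch introduces relative to master.
sort "$workdir/master.txt" > "$workdir/master.sorted"
sort "$workdir/pr.txt"     > "$workdir/pr.sorted"
comm -13 "$workdir/master.sorted" "$workdir/pr.sorted"

rm -r "$workdir"
```

With these stand-in reports, only the `NoSuchMethodError` on `com.example.Qux` would be printed, which is the signal a reviewer wants: errors introduced by the PR rather than pre-existing ones.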
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=385672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385672 ]

ASF GitHub Bot logged work on BEAM-9301:

Author: ASF GitHub Bot
Created on: 12/Feb/20 04:38
Start Date: 12/Feb/20 04:38
Worklog Time Spent: 10m

Work Description: suztomo commented on issue #10841: [BEAM-9301] Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#issuecomment-585024938

R: @iemejia
CC: @lukecwik

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385672)
Time Spent: 20m (was: 10m)

> Check in beam-linkage-check.sh
> ------------------------------
>
>                 Key: BEAM-9301
>                 URL: https://issues.apache.org/jira/browse/BEAM-9301
>             Project: Beam
>          Issue Type: Task
>          Components: build-system
>            Reporter: Tomo Suzuki
>            Assignee: Tomo Suzuki
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker (BEAM-9206) are implemented.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385661 ]

ASF GitHub Bot logged work on BEAM-8979:

Author: ASF GitHub Bot
Created on: 12/Feb/20 03:54
Start Date: 12/Feb/20 03:54
Worklog Time Spent: 10m

Work Description: chadrik commented on pull request #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#discussion_r378028956

## File path: sdks/python/gen_protos.py
## @@ -216,6 +216,23 @@ def _import(m):
     sys.path.pop(0)

+def _find_protoc_gen_mypy():

Review comment: Yeah, might as well keep it. It includes a fallback when PATH is not set and it’s easier to debug.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385661)
Time Spent: 9h (was: 8h 50m)

> protoc-gen-mypy: program not found or is not executable
> --------------------------------------------------------
>
>                 Key: BEAM-8979
>                 URL: https://issues.apache.org/jira/browse/BEAM-8979
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures
>            Reporter: Kamil Wasilewski
>            Assignee: Chad Dombrova
>            Priority: Major
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding protoc-gen-mypy.
> The following tests are affected (there might be more):
> * [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
> * [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in <module>
> 10:46:32     'mypy': generate_protos_first(mypy),
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
> 10:46:32     return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32     dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
> 10:46:32     self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", line 44, in run
> 10:46:32     self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
> 10:46:32     self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32     gen_protos.generate_proto_files(log=log)
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", line 144, in generate_proto_files
> 10:46:32     '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for details): 1
> {code}
>
> This is what I have tried so far to resolve this (without being successful):
> * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter to the _protoc_ call
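The `_find_protoc_gen_mypy()` helper discussed in this review is described as including a fallback for when PATH lookup fails. A rough sketch of that idea; the function name mirrors the PR, but the body is an assumption, not the actual gen_protos.py code:

```python
import os
import shutil
import sys

def find_plugin(name):
    """Locate an executable protoc plugin such as protoc-gen-mypy.

    Sketch of the approach: try the PATH first, then fall back to the bin
    directory of the running interpreter (where pip installs console
    scripts inside a virtualenv), which covers the case where PATH is not
    set as expected on the build worker.
    """
    path = shutil.which(name)
    if path is None:
        # Fallback: look next to the Python interpreter itself.
        candidate = os.path.join(os.path.dirname(sys.executable), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            path = candidate
    if path is None:
        raise RuntimeError('%s: program not found or is not executable' % name)
    return path
```

The explicit RuntimeError reproduces protoc's own complaint early, which is the "easier to debug" property mentioned in the review comment.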
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385662 ]

ASF GitHub Bot logged work on BEAM-8979:

Author: ASF GitHub Bot
Created on: 12/Feb/20 03:54
Start Date: 12/Feb/20 03:54
Worklog Time Spent: 10m

Work Description: chadrik commented on pull request #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#discussion_r378028966

## File path: sdks/python/gen_protos.py
## @@ -216,6 +216,23 @@ def _import(m):
     sys.path.pop(0)

+def _find_protoc_gen_mypy():

Review comment: Yeah, might as well keep it. It includes a fallback when PATH is not set and it’s easier to debug.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385662)
Time Spent: 9h 10m (was: 9h)

> protoc-gen-mypy: program not found or is not executable
> --------------------------------------------------------
>
>                 Key: BEAM-8979
>                 URL: https://issues.apache.org/jira/browse/BEAM-8979
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures
>            Reporter: Kamil Wasilewski
>            Assignee: Chad Dombrova
>            Priority: Major
>          Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding protoc-gen-mypy.
> The following tests are affected (there might be more):
> * [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
> * [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in <module>
> 10:46:32     'mypy': generate_protos_first(mypy),
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
> 10:46:32     return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32     dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
> 10:46:32     self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", line 44, in run
> 10:46:32     self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
> 10:46:32     self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32     gen_protos.generate_proto_files(log=log)
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", line 144, in generate_proto_files
> 10:46:32     '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for details): 1
> {code}
>
> This is what I have tried so far to resolve this (without being successful):
> * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter to the _protoc_ call
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385656 ]

ASF GitHub Bot logged work on BEAM-8575:

Author: ASF GitHub Bot
Created on: 12/Feb/20 03:33
Start Date: 12/Feb/20 03:33
Worklog Time Spent: 10m

Work Description: bumblebee-coming commented on pull request #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#discussion_r378019506

## File path: sdks/python/apache_beam/transforms/util_test.py
## @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self):
     { 'name': 'bar', 'timestamp': 33 },

Review comment: I'm afraid not. It seems b/146457921 and BEAM-9003 are reported separately. b/146457921 is for Unified Worker and BEAM-9003 is for Dataflow. b/146457921 causes this test to be sickbayed in validates_runner_tests.bzl. BEAM-9003 causes this test to be sickbayed on Jenkins streaming VR test suite. It's possible that BEAM-9003 could be solved at the same time.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385656)
Time Spent: 50h (was: 49h 50m)

> Add more Python validates runner tests
> ---------------------------------------
>
>                 Key: BEAM-8575
>                 URL: https://issues.apache.org/jira/browse/BEAM-8575
>             Project: Beam
>          Issue Type: Test
>          Components: sdk-py-core, testing
>            Reporter: wendy liu
>            Assignee: wendy liu
>            Priority: Major
>          Time Spent: 50h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to improve test coverage.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385652 ]

ASF GitHub Bot logged work on BEAM-8979:

Author: ASF GitHub Bot
Created on: 12/Feb/20 03:15
Start Date: 12/Feb/20 03:15
Worklog Time Spent: 10m

Work Description: udim commented on pull request #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#discussion_r378020687

## File path: sdks/python/gen_protos.py
## @@ -216,6 +216,23 @@ def _import(m):
     sys.path.pop(0)

+def _find_protoc_gen_mypy():

Review comment: Do you want to keep this code in?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385652)
Time Spent: 8h 50m (was: 8h 40m)

> protoc-gen-mypy: program not found or is not executable
> --------------------------------------------------------
>
>                 Key: BEAM-8979
>                 URL: https://issues.apache.org/jira/browse/BEAM-8979
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures
>            Reporter: Kamil Wasilewski
>            Assignee: Chad Dombrova
>            Priority: Major
>          Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding protoc-gen-mypy.
> The following tests are affected (there might be more):
> * [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
> * [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in <module>
> 10:46:32     'mypy': generate_protos_first(mypy),
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
> 10:46:32     return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32     dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
> 10:46:32     self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", line 44, in run
> 10:46:32     self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in run_command
> 10:46:32     self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
> 10:46:32     cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32     gen_protos.generate_proto_files(log=log)
> 10:46:32   File "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", line 144, in generate_proto_files
> 10:46:32     '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for details): 1
> {code}
>
> This is what I have tried so far to resolve this (without being successful):
> * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter to the _protoc_ call in gen_protos.py:131
> * Appending protoc-gen-mypy's directory to the
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385648 ]

ASF GitHub Bot logged work on BEAM-8575:

Author: ASF GitHub Bot
Created on: 12/Feb/20 03:09
Start Date: 12/Feb/20 03:09
Worklog Time Spent: 10m

Work Description: bumblebee-coming commented on pull request #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#discussion_r378019506

## File path: sdks/python/apache_beam/transforms/util_test.py
## @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self):
     { 'name': 'bar', 'timestamp': 33 },

Review comment: I'm afraid not. It seems b/146457921 and BEAM-9003 are two different issues. b/146457921 is for Unified Worker and BEAM-9003 is for Dataflow.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 385648)
Time Spent: 49h 50m (was: 49h 40m)

> Add more Python validates runner tests
> ---------------------------------------
>
>                 Key: BEAM-8575
>                 URL: https://issues.apache.org/jira/browse/BEAM-8575
>             Project: Beam
>          Issue Type: Test
>          Components: sdk-py-core, testing
>            Reporter: wendy liu
>            Assignee: wendy liu
>            Priority: Major
>          Time Spent: 49h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to improve test coverage.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (BEAM-8095) pytest Py3.7 crashes on test_remote_runner_display_data
[ https://issues.apache.org/jira/browse/BEAM-8095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-8095. - Fix Version/s: Not applicable Resolution: Fixed > pytest Py3.7 crashes on test_remote_runner_display_data > --- > > Key: BEAM-8095 > URL: https://issues.apache.org/jira/browse/BEAM-8095 > Project: Beam > Issue Type: Improvement > Components: test-failures, testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > Adding certain flags such as "-n 1" or "--debug" causes Python to abort. > The --debug flag logs some information in a pytestdebug.log, but I'm not > familiar with it and what it means if anything. > {code} > $ pytest > apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data > > === > test session starts > > platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 > rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: > pytest.ini > plugins: forked-1.0.2, xdist-1.29.0 > collected 1 item > > > > > apache_beam/runners/dataflow/dataflow_runner_test.py . 
> > > > [100%] > > 1 passed in 0.11s > = > $ pytest > apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data > --debug > writing pytestdebug information to > /usr/local/google/home/ehudm/src/beam/sdks/python/pytestdebug.log > === > test session starts > > platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 -- > /usr/local/google/home/ehudm/virtualenvs/beam-py37/bin/python3.7 > using: pytest-5.1.1 pylib-1.8.0 > setuptools registered plugins: > pytest-forked-1.0.2 at > /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/pytest_forked/__init__.py > pytest-xdist-1.29.0 at > /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/plugin.py > pytest-xdist-1.29.0 at > /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/looponfail.py > rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: > pytest.ini > plugins: forked-1.0.2, xdist-1.29.0 > collected 1 item > > > > > apache_beam/runners/dataflow/dataflow_runner_test.py Aborted (core dumped) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
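The crash above kills the interpreter before pytest can print a Python traceback. As a general diagnostic technique (not something taken from this report), the stdlib `faulthandler` module can dump the Python-level stack of every thread when the process receives a fatal signal such as SIGABRT:

```python
# Generic crash-diagnosis sketch, not taken from the issue: enable
# faulthandler so a hard crash ("Aborted (core dumped)") still prints
# the Python-level stack of every thread to stderr.
import faulthandler
import sys

faulthandler.enable(file=sys.stderr, all_threads=True)

# From here on, SIGSEGV/SIGFPE/SIGABRT/SIGBUS/SIGILL produce a stack dump.
print(faulthandler.is_enabled())  # True
```

The same handler can be enabled without code changes via `python -X faulthandler` or the `PYTHONFAULTHANDLER` environment variable; to rule the xdist plugin in or out, pytest can also be run with `-p no:xdist`, which disables that plugin for the run.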
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385638 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 12/Feb/20 02:45 Start Date: 12/Feb/20 02:45 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#issuecomment-584992554 Run Python 3.7 PostCommit Issue Time Tracking --- Worklog Id: (was: 385638) Time Spent: 8h (was: 7h 50m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 8h > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File >
[jira] [Commented] (BEAM-7463) BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: incorrect checksum
[ https://issues.apache.org/jira/browse/BEAM-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034991#comment-17034991 ] Valentyn Tymofieiev commented on BEAM-7463: --- Looks like this is still failing, or failing again: https://builds.apache.org/job/beam_PostCommit_Python37_PR/77/testReport/junit/apache_beam.io.gcp.big_query_query_to_table_it_test/BigQueryQueryToTableIT/test_big_query_new_types_native/ Expected: (Test pipeline expected terminated in state: DONE and Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) but: Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709 > BigQueryQueryToTableIT is flaky on Direct runner in PostCommit suites: > incorrect checksum > -- > > Key: BEAM-7463 > URL: https://issues.apache.org/jira/browse/BEAM-7463 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 6h > Remaining Estimate: 0h > > {noformat} > 15:03:38 FAIL: test_big_query_new_types > (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) > 15:03:38 > -- > 15:03:38 Traceback (most recent call last): > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", > line 211, in test_big_query_new_types > 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", > line 82, in run_bq_pipeline > 15:03:38 result = p.run() > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 15:03:38 else test_runner_api)) > 15:03:38 File > 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 15:03:38 self._options).run(False) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 15:03:38 return self.runner.run_pipeline(self, self._options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 15:03:38 AssertionError: > 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and > Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) > 15:03:38 but: Expected checksum is > 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is > da39a3ee5e6b4b0d3255bfef95601890afd80709 > {noformat} > [~Juta] could this be caused by changes to Bigquery matcher? > https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134 > > cc: [~pabloem] [~chamikara] [~apilloud] > A recent postcommit run has BQ failures in other tests as well: > https://builds.apache.org/job/beam_PostCommit_Python3_Verify/1000/consoleFull -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses
[ https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385635 ] ASF GitHub Bot logged work on BEAM-7284: Author: ASF GitHub Bot Created on: 12/Feb/20 02:31 Start Date: 12/Feb/20 02:31 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10837: [BEAM-7284] Cleanup MappingProxy reducer since dill supports it now. URL: https://github.com/apache/beam/pull/10837#issuecomment-584988523 Thanks for a quick review, @lazylynx ! Issue Time Tracking --- Worklog Id: (was: 385635) Time Spent: 1h 40m (was: 1.5h) > Support Py3 Dataclasses > > > Key: BEAM-7284 > URL: https://issues.apache.org/jira/browse/BEAM-7284 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Major > Fix For: 2.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > It looks like dill does not support Dataclasses yet, > https://github.com/uqfoundation/dill/issues/312, which very likely means that > Beam does not support them either.
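The underlying question in BEAM-7284 is whether user code containing dataclasses survives Beam's pickling round-trip. A quick stdlib sketch of that check (Beam's pickler wraps dill rather than plain `pickle`, so this only illustrates the round-trip pattern, not Beam's actual code path):

```python
import pickle
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    timestamp: int

original = Event(name="bar", timestamp=33)

# Instances of module-level dataclasses round-trip with stdlib pickle;
# the dill gap tracked upstream concerned more dynamic definitions.
restored = pickle.loads(pickle.dumps(original))
assert restored == original
print(restored)
```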
[jira] [Work logged] (BEAM-8095) pytest Py3.7 crashes on test_remote_runner_display_data
[ https://issues.apache.org/jira/browse/BEAM-8095?focusedWorklogId=385634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385634 ] ASF GitHub Bot logged work on BEAM-8095: Author: ASF GitHub Bot Created on: 12/Feb/20 02:26 Start Date: 12/Feb/20 02:26 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10759: [BEAM-8095] Remove no_xdist for test URL: https://github.com/apache/beam/pull/10759 Issue Time Tracking --- Worklog Id: (was: 385634) Time Spent: 0.5h (was: 20m) > pytest Py3.7 crashes on test_remote_runner_display_data > --- > > Key: BEAM-8095 > URL: https://issues.apache.org/jira/browse/BEAM-8095 > Project: Beam > Issue Type: Improvement > Components: test-failures, testing >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Adding certain flags such as "-n 1" or "--debug" causes Python to abort. > The --debug flag logs some information in a pytestdebug.log, but I'm not > familiar with it and what it means if anything.
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385633 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 12/Feb/20 02:17 Start Date: 12/Feb/20 02:17 Worklog Time Spent: 10m Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-584984570 I have one challenge with a full e2e test - the Python version stages the full apache-beam package, and pip download of 2.20dev0 fails. I can try to locally patch the 2.19 tag with just this PR, or maybe there is another way to test the runner? When I change versions to 2.19 I eventually hit some version-clash errors (probably haven't changed it everywhere), but I can see that pipeline.json is uploaded, it contains steps, and the graph is rendered properly in the console. Issue Time Tracking --- Worklog Id: (was: 385633) Time Spent: 50m (was: 40m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs.
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses
[ https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385628 ] ASF GitHub Bot logged work on BEAM-7284: Author: ASF GitHub Bot Created on: 12/Feb/20 02:15 Start Date: 12/Feb/20 02:15 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10837: [BEAM-7284] Cleanup MappingProxy reducer since dill supports it now. URL: https://github.com/apache/beam/pull/10837 Issue Time Tracking --- Worklog Id: (was: 385628) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385627 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 12/Feb/20 02:14 Start Date: 12/Feb/20 02:14 Worklog Time Spent: 10m Work Description: stankiewicz commented on pull request #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#discussion_r378006760 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -539,6 +541,15 @@ def run_pipeline(self, pipeline, options): # Get a Dataflow API client and set its options self.dataflow_client = apiclient.DataflowApplicationClient(options) +if self.job.options.view_as(GoogleCloudOptions).upload_graph: Review comment: Re hiding it in apiclient - in the first commit I implemented it that way, then I noticed the Java implementation, where it lives in the runner instead of the apiclient, and decided to move it. Will move it again and rewrite the tests. Issue Time Tracking --- Worklog Id: (was: 385627) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385626 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 12/Feb/20 02:13 Start Date: 12/Feb/20 02:13 Worklog Time Spent: 10m Work Description: stankiewicz commented on pull request #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#discussion_r378006495 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -562,6 +562,12 @@ def _add_argparse_args(cls, parser): default=None, choices=['COST_OPTIMIZED', 'SPEED_OPTIMIZED'], help='Set the Flexible Resource Scheduling mode') +parser.add_argument( Review comment: Re option, ok, I will handle it as an experiment. Issue Time Tracking --- Worklog Id: (was: 385626) Time Spent: 0.5h (was: 20m)
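The review above settles on exposing `upload_graph` through the repeatable `--experiments` option instead of a dedicated flag. A hypothetical stdlib sketch of that pattern (argument and function names here are illustrative, not Beam's actual option parser):

```python
import argparse

# Hypothetical sketch of the "experiments" pattern the review settled on;
# names are illustrative and do not mirror Beam's real PipelineOptions.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--experiments",
    action="append",
    default=[],
    help="Experimental features to enable, e.g. --experiments=upload_graph")

def has_experiment(options, name):
    # A single flag value may carry several comma-separated experiments.
    return any(name in value.split(",") for value in options.experiments)

opts = parser.parse_args(["--experiments=upload_graph"])
print(has_experiment(opts, "upload_graph"))  # True
```

Keying new behavior off an experiments list keeps the public option surface stable while the feature is still being validated end to end.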
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=385624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385624 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 12/Feb/20 02:11 Start Date: 12/Feb/20 02:11 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-584983018 @EDjur - This LGTM. I would like to merge it but I cannot seem to trigger the tests. @markflyhigh is there a manual way to run the tests? Issue Time Tracking --- Worklog Id: (was: 385624) Time Spent: 6h 40m (was: 6.5h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=385623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385623 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 12/Feb/20 02:10 Start Date: 12/Feb/20 02:10 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-584982808 tests, will you trigger? Issue Time Tracking --- Worklog Id: (was: 385623) Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses
[ https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385620 ] ASF GitHub Bot logged work on BEAM-7284: Author: ASF GitHub Bot Created on: 12/Feb/20 01:58 Start Date: 12/Feb/20 01:58 Worklog Time Spent: 10m Work Description: lazylynx commented on issue #10837: [BEAM-7284] Cleanup MappingProxy reducer since dill supports it now. URL: https://github.com/apache/beam/pull/10837#issuecomment-584979637 @tvalentyn LGTM, thanks! Issue Time Tracking --- Worklog Id: (was: 385620) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Created] (BEAM-9301) Check in beam-linkage-check.sh
Tomo Suzuki created BEAM-9301: - Summary: Check in beam-linkage-check.sh Key: BEAM-9301 URL: https://issues.apache.org/jira/browse/BEAM-9301 Project: Beam Issue Type: Task Components: build-system Reporter: Tomo Suzuki Assignee: Tomo Suzuki https://github.com/apache/beam/pull/10769#issuecomment-584571787 bq. @suztomo can you contribute this script maybe into Beam's build-tools directory so we can improve it a bit for further use? This is a temporary solution before exclusion rules in Linkage Checker (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8201) clean up the current container API
[ https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=385610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385610 ] ASF GitHub Bot logged work on BEAM-8201: Author: ASF GitHub Bot Created on: 12/Feb/20 01:37 Start Date: 12/Feb/20 01:37 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10839: [BEAM-8201] Add other endpoint fields to provision API. URL: https://github.com/apache/beam/pull/10839 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385595 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 12/Feb/20 00:59 Start Date: 12/Feb/20 00:59 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#issuecomment-584957126 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385595) Time Spent: 7h 50m (was: 7h 40m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
> *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", line 432, in test_sink_transform
>     | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 112, in __or__
>     return self.pipeline.apply(ptransform, self)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 515, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 528, in expand
>     return pcoll | beam.io.iobase.Write(self._sink)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 112, in __or__
>     return self.pipeline.apply(ptransform, self)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 515, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 960, in expand
>     return pcoll | WriteImpl(self.sink)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 112, in __or__
>     return self.pipeline.apply(ptransform, self)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 515, in apply
>     pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 193, in apply
>     return m(transform, input, options)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", line 199, in apply_PTransform
>     return transform.expand(input)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 979, in expand
>     lambda _, sink: sink.initialize_write(), self.sink)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", line 1103, in Map
>     pardo = FlatMap(wrapper, *args, **kwargs)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", line 1054, in FlatMap
>     pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", line 864, in __init__
>     super(ParDo, self).__init__(fn, *args, **kwargs)
>   File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", line 646, in __init__
>     self.args = pickler.loads(pickler.dumps(self.args))
>   File
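The bottom frame shows Beam's pickler round-tripping the transform's arguments with `pickler.loads(pickler.dumps(...))`, which is where classes that cannot be located by module and qualified name fail. A stdlib-`pickle` sketch of that failure mode, using a deliberately broken runtime class as a hypothetical stand-in for the avro schema classes (Beam actually pickles via dill, not plain pickle):

```python
import pickle

# Classes are pickled by reference: pickle records the module and qualified
# name, and unpickling imports them back. A class whose qualified name does
# not resolve to an importable object therefore cannot be pickled.
RuntimeClass = type("RuntimeClass", (), {"field": 1})
RuntimeClass.__qualname__ = "NotFindable"  # simulate a non-importable class

try:
    pickle.dumps(RuntimeClass)
    pickling_failed = False
except pickle.PicklingError:
    pickling_failed = True
```

dill extends this lookup with extra strategies, but on Python 3 the avro schema classes still tripped it up, which is what the sickbayed tests guarded against.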
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385594 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 12/Feb/20 00:59 Start Date: 12/Feb/20 00:59 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses
[ https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=385585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385585 ] ASF GitHub Bot logged work on BEAM-7284: Author: ASF GitHub Bot Created on: 12/Feb/20 00:51 Start Date: 12/Feb/20 00:51 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10837: [BEAM-7284] Cleanup MappingProxy reducer since dill supports it now. URL: https://github.com/apache/beam/pull/10837#issuecomment-584952973 R: @lazylynx This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385585) Time Spent: 1h 10m (was: 1h) > Support Py3 Dataclasses > > > Key: BEAM-7284 > URL: https://issues.apache.org/jira/browse/BEAM-7284 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Major > Fix For: 2.15.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > It looks like dill does not support Dataclasses yet, > https://github.com/uqfoundation/dill/issues/312, which very likely means that > Beam does not support them either. -- This message was sent by Atlassian Jira (v8.3.4#803005)
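The MappingProxy workaround being cleaned up followed the standard custom-reducer pattern: register a function that tells the pickler how to take apart a type it cannot handle natively. A stdlib-only sketch of that pattern using `pickle`/`copyreg` as a stand-in for Beam's dill-based pickler (`rebuild_mappingproxy` is an illustrative name, not Beam code):

```python
import copyreg
import pickle
import types

def rebuild_mappingproxy(items):
    # Reconstructor: rebuild the read-only proxy from its saved items.
    return types.MappingProxyType(dict(items))

def reduce_mappingproxy(proxy):
    # Reducer: tell pickle which callable and arguments recreate the proxy.
    return (rebuild_mappingproxy, (tuple(proxy.items()),))

# Without this registration, pickle.dumps raises TypeError for a mappingproxy.
copyreg.pickle(types.MappingProxyType, reduce_mappingproxy)

proxy = types.MappingProxyType({"a": 1})
restored = pickle.loads(pickle.dumps(proxy))
```

Once dill gained native mappingproxy support, an equivalent registration in Beam became redundant, which is what the linked PR removes.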
[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=385578=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385578 ] ASF GitHub Bot logged work on BEAM-7198: Author: ASF GitHub Bot Created on: 12/Feb/20 00:37 Start Date: 12/Feb/20 00:37 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename ToStringCoder to ToBytesCoder for proper representation of its role URL: https://github.com/apache/beam/pull/10828#issuecomment-584945917 Changes LGTM, we can merge once tests pass. Thanks a lot, @lazylynx ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385578) Time Spent: 50m (was: 40m) > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Labels: easy-fix, starter > Time Spent: 50m > Remaining Estimate: 0h > > The name of the ToStringCoder class [1] is confusing, since the output of > encode() on Python 3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in coders that constitute Public APIs [2], > so we can treat this as an internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up the Beam codebase to use the new name. 
> [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian Jira (v8.3.4#803005)
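The rename-with-alias approach proposed in the issue is a common backward-compatibility pattern; a minimal stand-in (not Beam's actual coder implementation) looks like this:

```python
class ToBytesCoder:
    """Coder whose encode() always returns bytes, matching the new name."""

    def encode(self, value):
        # bytes pass through unchanged; anything else is stringified and UTF-8 encoded
        return value if isinstance(value, bytes) else str(value).encode("utf-8")

# Courtesy alias: pipelines that referenced the old, non-public name keep working.
ToStringCoder = ToBytesCoder
```

Because the alias binds the same class object, `isinstance` checks and subclasses written against the old name continue to behave identically.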
[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder
[ https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=385577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385577 ] ASF GitHub Bot logged work on BEAM-7198: Author: ASF GitHub Bot Created on: 12/Feb/20 00:35 Start Date: 12/Feb/20 00:35 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename ToStringCoder to ToBytesCoder for proper representation of its role URL: https://github.com/apache/beam/pull/10828#issuecomment-584945241 test test test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385577) Time Spent: 40m (was: 0.5h) > Rename ToStringCoder into ToBytesCoder > -- > > Key: BEAM-7198 > URL: https://issues.apache.org/jira/browse/BEAM-7198 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Labels: easy-fix, starter > Time Spent: 40m > Remaining Estimate: 0h > > The name of the ToStringCoder class [1] is confusing, since the output of > encode() on Python 3 will be bytes. On Python 2 the output is also bytes, > since bytes and string are synonyms on Py2. > ToBytesCoder would be a better name for this class. > Note that this class is not listed in coders that constitute Public APIs [2], > so we can treat this as an internal change. As a courtesy to users who happened > to reference a non-public coder in their pipelines we can keep the old class > name as an alias, e.g. ToStringCoder = ToBytesCoder to avoid friction, but > clean up the Beam codebase to use the new name. 
> [1] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344 > [2] > https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20 > cc: [~yoshiki.obata] [~chamikara] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.
[ https://issues.apache.org/jira/browse/BEAM-9290?focusedWorklogId=385573=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385573 ] ASF GitHub Bot logged work on BEAM-9290: Author: ASF GitHub Bot Created on: 12/Feb/20 00:33 Start Date: 12/Feb/20 00:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10827: [BEAM-9290] Support runner_harness_container_image in released python… URL: https://github.com/apache/beam/pull/10827#issuecomment-584943995 Thanks for fixing this. It's unfortunate it wasn't done correctly and tests did not catch it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385573) Time Spent: 50m (was: 40m) > runner_harness_container_image experiment is not honored in python released > sdks. > - > > Key: BEAM-9290 > URL: https://issues.apache.org/jira/browse/BEAM-9290 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > > > {code:java} > --experiments=runner_harness_container_image=foo_image{code} > does not have any effect on the job. > > > cc: [~tvalentyn] -- This message was sent by Atlassian Jira (v8.3.4#803005)
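An experiment of this shape is a key=value entry inside the repeated `--experiments` option. A hypothetical parser (`parse_experiments` is illustrative only, not Beam's actual option handling) would split the flag from the report like this:

```python
def parse_experiments(experiments):
    """Split 'key=value' experiment strings into a dict; bare flags map to True."""
    parsed = {}
    for experiment in experiments:
        name, sep, value = experiment.partition("=")
        parsed[name] = value if sep else True
    return parsed

# The experiment from the bug report, alongside a bare (valueless) experiment.
opts = parse_experiments(["runner_harness_container_image=foo_image", "beam_fn_api"])
```

The bug was that released SDKs dropped the value after the `=` on the way to Dataflow, so the custom runner harness image was never applied.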
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385564=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385564 ] ASF GitHub Bot logged work on BEAM-9160: Author: ASF GitHub Bot Created on: 12/Feb/20 00:16 Start Date: 12/Feb/20 00:16 Worklog Time Spent: 10m Work Description: ecapoccia commented on pull request #10825: [BEAM-9160] Update AWS SDK to support Pod Level Identity URL: https://github.com/apache/beam/pull/10825#discussion_r377976039 ## File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/options/AwsModule.java ## @@ -122,6 +123,8 @@ public AWSCredentialsProvider deserializeWithType( return new SystemPropertiesCredentialsProvider(); } else if (typeName.equals(ProfileCredentialsProvider.class.getSimpleName())) { return new ProfileCredentialsProvider(); + } else if (typeName.equals(WebIdentityTokenCredentialsProvider.class.getSimpleName())) { +return WebIdentityTokenCredentialsProvider.create(); Review comment: @iemejia @andeb opened PR https://github.com/apache/beam/pull/10836 as discussed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385564) Time Spent: 2h 40m (was: 2.5h) > Update AWS SDK to support Kubernetes Pod Level Identity > --- > > Key: BEAM-9160 > URL: https://issues.apache.org/jira/browse/BEAM-9160 > Project: Beam > Issue Type: Improvement > Components: dependencies >Affects Versions: 2.17.0 >Reporter: Mohamed Noah >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Many organizations have started leveraging pod level identity in Kubernetes. > The current version of the AWS SDK packaged with Beam 2.17.0 is out of date > and doesn't provide native support to pod level identity access management. 
> > It is recommended that we introduce support to access AWS resources such as > S3 using pod level identity. > Current Version of the AWS Java SDK in Beam: > def aws_java_sdk_version = "1.11.519" > Proposed AWS Java SDK Version: > <dependency> > <groupId>com.amazonaws</groupId> > <artifactId>aws-java-sdk</artifactId> > <version>1.11.710</version> > </dependency> > -- This message was sent by Atlassian Jira (v8.3.4#803005)
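The Java review comment above adds one more branch to a deserializer that dispatches on the credentials provider's simple class name. The shape of that dispatch, sketched in Python with hypothetical factory values (the real `AwsModule` code constructs AWS SDK provider objects):

```python
PROVIDER_FACTORIES = {
    "ProfileCredentialsProvider": lambda: {"type": "profile"},
    "SystemPropertiesCredentialsProvider": lambda: {"type": "system_properties"},
    # The new branch from the diff: web-identity tokens enabling pod-level identity.
    "WebIdentityTokenCredentialsProvider": lambda: {"type": "web_identity"},
}

def deserialize_provider(type_name):
    """Look up a provider factory by its simple class name."""
    try:
        return PROVIDER_FACTORIES[type_name]()
    except KeyError:
        raise ValueError("unknown credentials provider: " + type_name)
```

A table-driven lookup like this keeps adding a provider to a one-line change, which mirrors how the Java `if/else` chain grows in the PR.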
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385559=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385559 ] ASF GitHub Bot logged work on BEAM-9160: Author: ASF GitHub Bot Created on: 12/Feb/20 00:08 Start Date: 12/Feb/20 00:08 Worklog Time Spent: 10m Work Description: ecapoccia commented on pull request #10836: BEAM-9160 URL: https://github.com/apache/beam/pull/10836 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385559) Time Spent: 2.5h (was: 2h 20m) > Update AWS SDK to support Kubernetes Pod Level Identity > --- > > Key: BEAM-9160 > URL: https://issues.apache.org/jira/browse/BEAM-9160 > Project: Beam > Issue Type: Improvement > Components: dependencies >Affects Versions: 2.17.0 >Reporter: Mohamed Noah >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Many organizations have started leveraging pod level identity in Kubernetes. > The current version of the AWS SDK packaged with Beam 2.17.0 is out of date > and doesn't provide native support to pod level identity access management. > > It is recommended that we introduce support to access AWS resources such as > S3 using pod level identity. > Current Version of the AWS Java SDK in Beam: > def aws_java_sdk_version = "1.11.519" > Proposed AWS Java SDK Version: > <dependency> > <groupId>com.amazonaws</groupId> > <artifactId>aws-java-sdk</artifactId> > <version>1.11.710</version> > </dependency> > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385556=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385556 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 12/Feb/20 00:05 Start Date: 12/Feb/20 00:05 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-584930246 R: @angoenka This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385556) Time Spent: 49h 40m (was: 49.5h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 49h 40m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=38=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-38 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 12/Feb/20 00:05 Start Date: 12/Feb/20 00:05 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#discussion_r377972727 ## File path: sdks/python/apache_beam/transforms/util_test.py ## @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self): { 'name': 'bar', 'timestamp': 33 }, Review comment: Can we un-sickbay this test after this fix? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 38) Time Spent: 49.5h (was: 49h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 49.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385553 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 12/Feb/20 00:05 Start Date: 12/Feb/20 00:05 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-584929984 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385553) Time Spent: 49h 20m (was: 49h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 49h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9219) Streamline creation of Python and Java dependencies pages
[ https://issues.apache.org/jira/browse/BEAM-9219?focusedWorklogId=385551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385551 ] ASF GitHub Bot logged work on BEAM-9219: Author: ASF GitHub Bot Created on: 12/Feb/20 00:04 Start Date: 12/Feb/20 00:04 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10745: [BEAM-9219] Streamline creation of Python and Java dependencies pages URL: https://github.com/apache/beam/pull/10745 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385551) Time Spent: 11.5h (was: 11h 20m) > Streamline creation of Python and Java dependencies pages > - > > Key: BEAM-9219 > URL: https://issues.apache.org/jira/browse/BEAM-9219 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: David Wrede >Priority: Minor > Time Spent: 11.5h > Remaining Estimate: 0h > > This issue is about the need to address keeping both Python and Java SDK > dependency pages more relevant and up-to-date while reducing the amount of > time it takes to provide that information. The current method of scraping and > copying dependencies into a table for every release is a non-trivial task > because of the semi-automated workflows done by the tech writers on the > website. > In an effort to provide accurate dependency listings that are always in sync > with SDK releases, referring people to the appropriate places in the source > code (or through CLI commands) should provide people the information they are > looking for and not require the creation and maintenance of an automated > tooling solution to generate the dependency tables. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385528 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 11/Feb/20 23:14 Start Date: 11/Feb/20 23:14 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835 Removed MAX_TIMESTAMP from testing data, because the semantics of these extremum elements are not decided yet. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=385506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385506 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 11/Feb/20 22:47 Start Date: 11/Feb/20 22:47 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #10576: [BEAM-5605] Convert all BoundedSources to SplittableDoFns when using beam_fn_api experiment. URL: https://github.com/apache/beam/pull/10576#discussion_r377945680 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java ## @@ -1,176 +0,0 @@ -/* Review comment: Gotcha. I already approved this, but consider it double-approved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385506) Time Spent: 12h 40m (was: 12.5h) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 12h 40m > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385490 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 22:00 Start Date: 11/Feb/20 22:00 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377925605 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385488 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:59 Start Date: 11/Feb/20 21:59 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377925182 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385489 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:59 Start Date: 11/Feb/20 21:59 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377925387 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import java.io.Serializable; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.apache.thrift.protocol.TBinaryProtocol; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TJSONProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.protocol.TSimpleJSONProtocol; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ThriftIO}. 
*/ +@RunWith(JUnit4.class) +public class ThriftIOTest implements Serializable { + + private static final String RESOURCE_DIR = "ThriftIOTest/"; + + private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath(); + private static final String ALL_THRIFT_STRING = + Resources.getResource(RESOURCE_DIR).getPath() + "*"; + private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct(); + private static List testThriftStructs; + private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory(); + private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory(); + private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory(); + private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory(); + @Rule public transient TestPipeline mainPipeline = TestPipeline.create(); + @Rule public transient TestPipeline readPipeline = TestPipeline.create(); + @Rule public transient TestPipeline writePipeline = TestPipeline.create(); + @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder(); + + @Before + public void setUp() throws Exception { +byte[] bytes = new byte[10]; +ByteBuffer buffer = ByteBuffer.wrap(bytes); + +TEST_THRIFT_STRUCT.testByte = 100; +TEST_THRIFT_STRUCT.testShort = 200; +TEST_THRIFT_STRUCT.testInt = 2500; +TEST_THRIFT_STRUCT.testLong = 79303L; +TEST_THRIFT_STRUCT.testDouble = 25.007; +TEST_THRIFT_STRUCT.testBool = true; +TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>(); +TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1); +TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2); +TEST_THRIFT_STRUCT.testBinary = buffer; + +testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L)); + } + + /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. 
*/ + @Test + public void testReadFilesBinaryProtocol() { + +PCollection testThriftDoc = +mainPipeline +.apply(Create.of(THRIFT_DIR + "data").withCoder(StringUtf8Coder.of())) +
[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry
[ https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=385485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385485 ] ASF GitHub Bot logged work on BEAM-9269: Author: ASF GitHub Bot Created on: 11/Feb/20 21:53 Start Date: 11/Feb/20 21:53 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10752: [BEAM-9269] Add commit deadline for Spanner writes. URL: https://github.com/apache/beam/pull/10752#discussion_r377922130 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java ## @@ -136,13 +163,51 @@ public SpannerConfig withHost(ValueProvider host) { return toBuilder().setHost(host).build(); } + public SpannerConfig withCommitDeadline(Duration commitDeadline) { +return withCommitDeadline(ValueProvider.StaticValueProvider.of(commitDeadline)); + } + + public SpannerConfig withCommitDeadline(ValueProvider commitDeadline) { +return toBuilder().setCommitDeadline(commitDeadline).build(); + } + + public SpannerConfig withMaxCumulativeBackoff(Duration maxCumulativeBackoff) { +return withMaxCumulativeBackoff(ValueProvider.StaticValueProvider.of(maxCumulativeBackoff)); + } + + public SpannerConfig withMaxCumulativeBackoff(ValueProvider maxCumulativeBackoff) { +return toBuilder().setMaxCumulativeBackoff(maxCumulativeBackoff).build(); + } + @VisibleForTesting SpannerConfig withServiceFactory(ServiceFactory serviceFactory) { return toBuilder().setServiceFactory(serviceFactory).build(); } public SpannerAccessor connectToSpanner() { SpannerOptions.Builder builder = SpannerOptions.newBuilder(); + +if (getCommitDeadline() != null && getCommitDeadline().get().getMillis() > 0) { + + // In Spanner API version 1.21 or above, we can set the deadline / total Timeout on an API + // call using the following code: + // + // UnaryCallSettings.Builder commitSettings = + // builder.getSpannerStubSettingsBuilder().commitSettings(); + // RetrySettings.Builder commitRetrySettings = 
commitSettings.getRetrySettings().toBuilder() + // commitSettings.setRetrySettings( + // commitRetrySettings.setTotalTimeout( + // Duration.ofMillis(getCommitDeadlineMillis().get())) + // .build()); + // + // However, at time of this commit, the Spanner API is at only at v1.6.0, where the only + // method to set a deadline is with GRPC Interceptors, so we have to use that... + SpannerInterceptorProvider interceptorProvider = + SpannerInterceptorProvider.createDefault() + .with(new CommitDeadlineSettingInterceptor(getCommitDeadline().get())); Review comment: > Just to confirm, this deadline will not cause Dataflow workitems to fail but just that request will be retried by SpannerIO within the same workitem Correct, it will backoff/retry up to a configurable time limit (default 15 mins per workitem). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385485) Time Spent: 3h 50m (was: 3h 40m) > Set shorter Commit Deadline and handle with backoff/retry > - > > Key: BEAM-9269 > URL: https://issues.apache.org/jira/browse/BEAM-9269 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Major > Labels: google-cloud-spanner > Time Spent: 3h 50m > Remaining Estimate: 0h > > Default commit deadline in Spanner is 1hr, which can lead to a variety of > issues including database overload and session expiry. > Shorter deadline should be set with backoff/retry when deadline expires, so > that the Spanner database does not become overloaded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
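The review exchange above notes that an expired commit deadline does not fail the Dataflow work item; SpannerIO backs off and retries within the work item, up to a configurable cumulative limit (15 minutes per work item by default). A standalone sketch of such a capped exponential backoff schedule follows; the class and method names, the 1 s initial delay, and the doubling factor are illustrative assumptions, not Beam's actual `RetrySettings`:

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffSketch {

  // Returns the sequence of retry delays (ms): starting from initialMs and
  // doubling each attempt, stopping once the running total would exceed
  // maxCumulativeMs. This mirrors the "max cumulative backoff" idea from the
  // review thread; it is not SpannerIO's implementation.
  static List<Long> delays(long initialMs, long maxCumulativeMs) {
    List<Long> out = new ArrayList<>();
    long delay = initialMs;
    long total = 0;
    while (total + delay <= maxCumulativeMs) {
      out.add(delay);
      total += delay;
      delay *= 2;
    }
    return out;
  }

  public static void main(String[] args) {
    // With a 1 s initial delay and the 15-minute default cap mentioned in the
    // thread, the worker gets a handful of retries before giving up.
    List<Long> d = delays(1000L, 15 * 60 * 1000L);
    long total = d.stream().mapToLong(Long::longValue).sum();
    System.out.println(d.size() + " retries, " + total + " ms cumulative backoff");
  }
}
```

Under these assumptions the schedule allows 9 retries (1 s, 2 s, 4 s, ... 256 s) totalling 511 s before the 900 s cap is reached, which is the behavior the reviewer describes: requests are retried inside the work item rather than failing it.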
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385484 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:53 Start Date: 11/Feb/20 21:53 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377922006 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { Review comment: Done updated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385484) Time Spent: 13h 40m (was: 13.5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 13h 40m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
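The `ThriftCoder` draft quoted in this thread imports `ObjectInputStream`/`ObjectOutputStream`, i.e. plain Java serialization. The following standalone sketch illustrates the encode-to-`OutputStream` / decode-from-`InputStream` contract that a Beam `CustomCoder` must satisfy; it is an illustration of the pattern only, not the `ThriftCoder` implementation under review:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class SerializationCoderSketch {

  // Encode a value to the stream, in the style of Coder#encode.
  static void encode(Serializable value, OutputStream out) throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(out);
    oos.writeObject(value);
    oos.flush();
  }

  // Decode a value back from the stream, in the style of Coder#decode.
  static Object decode(InputStream in) throws IOException, ClassNotFoundException {
    return new ObjectInputStream(in).readObject();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    encode("thrift-record", buf);
    Object roundTripped = decode(new ByteArrayInputStream(buf.toByteArray()));
    System.out.println(roundTripped); // prints thrift-record
  }
}
```

A real Thrift coder would serialize via a `TProtocol` (binary, compact, JSON) rather than Java serialization, which is part of what the review comments above are steering the PR toward.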
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385482 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:51 Start Date: 11/Feb/20 21:51 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377921210 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: Sounds good, I'll remove references to `read()` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385482) Time Spent: 13.5h (was: 13h 20m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 13.5h > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385479 ]

ASF GitHub Bot logged work on BEAM-9160:

Author: ASF GitHub Bot
Created on: 11/Feb/20 21:46
Start Date: 11/Feb/20 21:46
Worklog Time Spent: 10m
Work Description: suztomo commented on issue #10825: [BEAM-9160] Update AWS SDK to support Pod Level Identity
URL: https://github.com/apache/beam/pull/10825#issuecomment-584869325

The linkage errors from svm are false positives.

Issue Time Tracking
---
Worklog Id: (was: 385479)
Time Spent: 2h 20m (was: 2h 10m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> -------------------------------------------------------
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
> Issue Type: Improvement
> Components: dependencies
> Affects Versions: 2.17.0
> Reporter: Mohamed Noah
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes.
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date
> and doesn't provide native support for pod level identity access management.
>
> It is recommended that we introduce support to access AWS resources such as
> S3 using pod level identity.
> Current version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
>
--
This message was sent by Atlassian Jira (v8.3.4#803005)
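In Beam's Gradle build the AWS SDK v1 version is pinned through a single version constant, so the change the issue proposes amounts to bumping that pin. A sketch of the edit (the exact file location is an assumption; both version strings come from the issue text above):

```groovy
// Illustrative: bump Beam's pinned AWS SDK v1 so that web-identity-token
// credentials (Kubernetes pod-level identity) are supported.
def aws_java_sdk_version = "1.11.710"  // was "1.11.519"
```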
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385478 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:45 Start Date: 11/Feb/20 21:45 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377918220 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { Review comment: Done, removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385478) Time Spent: 13h 20m (was: 13h 10m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 13h 20m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385477 ] ASF GitHub Bot logged work on BEAM-9160: Author: ASF GitHub Bot Created on: 11/Feb/20 21:43 Start Date: 11/Feb/20 21:43 Worklog Time Spent: 10m Work Description: andeb commented on issue #10825: [BEAM-9160] Update AWS SDK to support Pod Level Identity URL: https://github.com/apache/beam/pull/10825#issuecomment-584867846 Thanks for reviewing and merging it, @iemejia! With regards to the linkage errors, they may be false positives considering that there are lines of code to ignore classes from GraalVm? https://github.com/GoogleCloudPlatform/cloud-opensource-java/blob/master/dependencies/src/main/java/com/google/cloud/tools/opensource/classpath/LinkageChecker.java#L208-L213 Also, I thought it made sense to expose `WebIdentityTokenCredentialsProvider` directly to be consistent with other exposed credential providers. Happy to do any cleanup though but it seems that @ecapoccia is already on it! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385477) Time Spent: 2h 10m (was: 2h) > Update AWS SDK to support Kubernetes Pod Level Identity > --- > > Key: BEAM-9160 > URL: https://issues.apache.org/jira/browse/BEAM-9160 > Project: Beam > Issue Type: Improvement > Components: dependencies >Affects Versions: 2.17.0 >Reporter: Mohamed Noah >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Many organizations have started leveraging pod level identity in Kubernetes. > The current version of the AWS SDK packaged with Beam 2.17.0 is out of date > and doesn't provide native support to pod level identity access management. 
>
> It is recommended that we introduce support to access AWS resources such as
> S3 using pod level identity.
> Current version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
>
--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385476 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:41 Start Date: 11/Feb/20 21:41 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377916026
## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
@@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ *     .<ExampleType>write()
+ *     .via(ThriftIO.sink(thriftProto))
+ *     .to("destination/path"));
+ * }
+ *
+ * This IO API is considered experimental and may break or receive backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
Review comment: Done updated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385476) Time Spent: 13h 10m (was: 13h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385475 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:38 Start Date: 11/Feb/20 21:38 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377914782 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## Review comment: +1 to not have a `read()` , less 'useless' code to maintain, other File based IOs only have it for historical reasons and we decided to deprecate `readAll` transforms too to make FileIO.match + read composition more explicit since it cover more cases. Issue Time Tracking --- Worklog Id: (was: 385475) Time Spent: 13h (was: 12h 50m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 13h > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385474 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:33 Start Date: 11/Feb/20 21:33 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377912388 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## Review comment: Correct `read()` is not implemented and I will remove references to it unless we think it should be implemented. I think `readFiles()` will cover everything but the simple use case. What are your thoughts? Issue Time Tracking --- Worklog Id: (was: 385474) Time Spent: 12h 50m (was: 12h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385473 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:29 Start Date: 11/Feb/20 21:29 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377910032 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## Review comment: This reference will be removed. `readFiles()` will take in the class as it is needed for the pipeline to deserialize the data into. Issue Time Tracking --- Worklog Id: (was: 385473) Time Spent: 12h 40m (was: 12.5h)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=385470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385470 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 11/Feb/20 21:28 Start Date: 11/Feb/20 21:28 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10802: [BEAM-8537] Move wrappers of RestrictionTracker out of iobase URL: https://github.com/apache/beam/pull/10802#issuecomment-584861527 All tests passed. I'm going to merge the PR and work on integrating https://github.com/apache/beam/pull/10375. Thanks, everyone! Issue Time Tracking --- Worklog Id: (was: 385470) Time Spent: 14h 10m (was: 14h) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 14h 10m > Remaining Estimate: 0h > > This is a follow-up to the in-progress PR > https://github.com/apache/beam/pull/9794. > The current implementation in PR 9794 provides a default implementation of > WatermarkEstimator. As further work, we want WatermarkEstimator to be > a pure interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restrictions for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=385471=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385471 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 11/Feb/20 21:28 Start Date: 11/Feb/20 21:28 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10802: [BEAM-8537] Move wrappers of RestrictionTracker out of iobase URL: https://github.com/apache/beam/pull/10802 Issue Time Tracking --- Worklog Id: (was: 385471) Time Spent: 14h 20m (was: 14h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385469 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:27 Start Date: 11/Feb/20 21:27 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377909310
## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ##
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+      Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct();
+  private static List<TestThriftStruct> testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+    byte[] bytes = new byte[10];
+    ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+    TEST_THRIFT_STRUCT.testByte = 100;
+    TEST_THRIFT_STRUCT.testShort = 200;
+    TEST_THRIFT_STRUCT.testInt = 2500;
+    TEST_THRIFT_STRUCT.testLong = 79303L;
+    TEST_THRIFT_STRUCT.testDouble = 25.007;
+    TEST_THRIFT_STRUCT.testBool = true;
+    TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+    TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+    TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+    TEST_THRIFT_STRUCT.testBinary = buffer;
+
+    testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
Review comment: `read` was in the old implementation and I will remove the references to it.
I think that `readFiles()` will cover most use cases for this IO.
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385462 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377901030
## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static ReadFiles readFiles(Class recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  //
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
+  public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385464 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377901859 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385465 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377895151 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
 */
package org.apache.beam.sdk.io.thrift;

import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;

import com.google.auto.value.AutoValue;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import javax.annotation.Nullable;
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PCollection;
import org.apache.thrift.TBase;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.transport.TIOStreamTransport;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * {@link PTransform}s for reading and writing files containing Thrift encoded data.
 *
 * <h3>Reading Thrift Files</h3>
 *
 * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 * ...
 * }</pre>
 *
 * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
 * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<FileIO.ReadableFile> files = pipeline
 *     .apply(FileIO.match().filepattern(options.getInputFilepattern()))
 *     .apply(FileIO.readMatches());
 *
 * PCollection<ExampleType> examples = files.apply(
 *     ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
 * }</pre>
 *
 * <h3>Writing Thrift Files</h3>
 *
 * <p>{@link ThriftIO.Sink} allows a {@link PCollection} of {@link TBase} to be written to
 * Thrift files. It can be used with the general-purpose {@link FileIO} transforms,
 * specifically FileIO.write/writeDynamic.
 *
 * <p>For example:
 *
 * <pre>{@code
 * pipeline
 *     .apply(...) // PCollection<ExampleType>
 *     .apply(FileIO
 *         .<ExampleType>write()
 *         .via(ThriftIO.sink(thriftProto))
 *         .to("destination/path"));
 * }</pre>
 *
 * <p>This IO API is considered experimental and may break or receive backwards-incompatible
 * changes in future versions of the Apache Beam SDK.
 */
@Experimental(Experimental.Kind.SOURCE_SINK)
public class ThriftIO {

  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);

  /** Disable construction of utility class. */
  private ThriftIO() {}

  /**
   * Reads each file in a {@link PCollection} of
   * {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, which allows more flexible usage.
   */
  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
  }

  //

  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
  public static <T extends TBase<?, ?>> Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385459 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377886065 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/package-info.java ## @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Transforms for reading and writing to Thrift files. */ +package org.apache.beam.sdk.io.thrift; Review comment: Add `@Experimental(Kind.SOURCE_SINK)` at the package level too This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385459) Time Spent: 11h 40m (was: 11.5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 11h 40m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
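The reviewer's suggestion above can be sketched as a package-info.java that carries the annotation at the package level. This is a sketch of the suggestion, assuming Beam's `Experimental` annotation permits the `PACKAGE` target as it does elsewhere in the code base:

```java
// Sketch of the reviewer's suggestion: annotate the whole package in
// package-info.java so every class in it is flagged as experimental.
/** Transforms for reading and writing to Thrift files. */
@Experimental(Kind.SOURCE_SINK)
package org.apache.beam.sdk.io.thrift;

import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;
```

Note that in a package-info.java the annotation precedes the `package` declaration, and the imports it needs come after it; this is standard Java package-annotation syntax.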
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385467 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377903969 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
 */
package org.apache.beam.sdk.io.thrift;

import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.protocol.TJSONProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.protocol.TSimpleJSONProtocol;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

/** Tests for {@link ThriftIO}. */
@RunWith(JUnit4.class)
public class ThriftIOTest implements Serializable {

  private static final String RESOURCE_DIR = "ThriftIOTest/";

  private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath();
  private static final String ALL_THRIFT_STRING =
      Resources.getResource(RESOURCE_DIR).getPath() + "*";
  private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct();
  private static List<TestThriftStruct> testThriftStructs;
  private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory();
  private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory();
  private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory();
  private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory();
  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
  @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder();

  @Before
  public void setUp() throws Exception {
    byte[] bytes = new byte[10];
    ByteBuffer buffer = ByteBuffer.wrap(bytes);

    TEST_THRIFT_STRUCT.testByte = 100;
    TEST_THRIFT_STRUCT.testShort = 200;
    TEST_THRIFT_STRUCT.testInt = 2500;
    TEST_THRIFT_STRUCT.testLong = 79303L;
    TEST_THRIFT_STRUCT.testDouble = 25.007;
    TEST_THRIFT_STRUCT.testBool = true;
    TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
    TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
    TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
    TEST_THRIFT_STRUCT.testBinary = buffer;

    testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
  }

  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
  @Test
  public void testReadFilesBinaryProtocol() {

    PCollection<TestThriftStruct> testThriftDoc =
        mainPipeline
            .apply(Create.of(THRIFT_DIR + "data").withCoder(StringUtf8Coder.of()))
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385455 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377905535 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: Hm in that case, I think it's fine to commit the generated file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385455) Time Spent: 11h 10m (was: 11h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 11h 10m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385466 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377900552 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
 */
package org.apache.beam.sdk.io.thrift;

import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;

import com.google.auto.value.AutoValue;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import javax.annotation.Nullable;
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PCollection;
import org.apache.thrift.TBase;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.transport.TIOStreamTransport;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * {@link PTransform}s for reading and writing files containing Thrift encoded data.
 *
 * <h3>Reading Thrift Files</h3>
 *
 * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 * ...
 * }</pre>
 *
 * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
 * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<FileIO.ReadableFile> files = pipeline
 *     .apply(FileIO.match().filepattern(options.getInputFilepattern()))
 *     .apply(FileIO.readMatches());
 *
 * PCollection<ExampleType> examples = files.apply(
 *     ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
 * }</pre>
 *
 * <h3>Writing Thrift Files</h3>
 *
 * <p>{@link ThriftIO.Sink} allows a {@link PCollection} of {@link TBase} to be written to
 * Thrift files. It can be used with the general-purpose {@link FileIO} transforms,
 * specifically FileIO.write/writeDynamic.
 *
 * <p>For example:
 *
 * <pre>{@code
 * pipeline
 *     .apply(...) // PCollection<ExampleType>
 *     .apply(FileIO
 *         .<ExampleType>write()
 *         .via(ThriftIO.sink(thriftProto))
 *         .to("destination/path"));
 * }</pre>
 *
 * <p>This IO API is considered experimental and may break or receive backwards-incompatible
 * changes in future versions of the Apache Beam SDK.
 */
@Experimental(Experimental.Kind.SOURCE_SINK)
public class ThriftIO {

  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);

  /** Disable construction of utility class. */
  private ThriftIO() {}

  /**
   * Reads each file in a {@link PCollection} of
   * {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, which allows more flexible usage.
   */
  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
  }

  //

  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
  public static <T extends TBase<?, ?>> Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385460 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377894156 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.beam.sdk.io.thrift;

import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.coders.CustomCoder;

public class ThriftCoder<T> extends CustomCoder<T> {

  public static <T> ThriftCoder<T> of() {
Review comment: remove public This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385460) Time Spent: 11h 50m (was: 11h 40m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 11h 50m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
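The quoted ThriftCoder imports `ObjectInputStream`/`ObjectOutputStream`, i.e. the coder at this point in the review performed a Java-serialization round-trip over the coder's streams. The following is a self-contained, pure-JDK sketch of that encode/decode contract; the class and method names are illustrative stand-ins, not Beam's actual implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Illustrative stand-in for a CustomCoder-style encode/decode pair:
// serialize a value onto the output stream, then read it back from the
// input stream. Beam's real coder adds error handling via CoderException.
public class CoderSketch {
  public static void encode(Serializable value, OutputStream outStream) throws IOException {
    ObjectOutputStream out = new ObjectOutputStream(outStream);
    out.writeObject(value);
    out.flush(); // make sure buffered bytes reach the underlying stream
  }

  public static Object decode(InputStream inStream) throws IOException, ClassNotFoundException {
    return new ObjectInputStream(inStream).readObject();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    encode("record-1", bytes);
    Object back = decode(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(back); // prints: record-1
  }
}
```

Java serialization is used here only to keep the sketch dependency-free; a Thrift-aware coder would instead write the struct through a `TProtocol` as the IO's sink does.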
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385457 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377905535 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: Hm in that case, I think it's fine to commit the generated file - unless you feel up to adding the gradle config : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385457) Time Spent: 11h 20m (was: 11h 10m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385463 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377894855 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
 */
package org.apache.beam.sdk.io.thrift;

import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;

import com.google.auto.value.AutoValue;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import javax.annotation.Nullable;
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PCollection;
import org.apache.thrift.TBase;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.transport.TIOStreamTransport;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * {@link PTransform}s for reading and writing files containing Thrift encoded data.
 *
 * <h3>Reading Thrift Files</h3>
 *
 * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
Review comment: Having a read() is not mandatory if the example uses FileIO.match and friends IMO. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385463) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
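The reviewer's point above, that a dedicated `read()` entry point is optional when `FileIO.match` is used, can be sketched as the following pipeline wiring. This is an illustrative fragment only, not runnable on its own: it assumes Beam and Thrift on the classpath, and `TestThriftStruct` stands in for any Thrift-generated class.

```java
// Hedged sketch: reading Thrift files without a ThriftIO.read() shortcut.
// Match file names, open each match, then decode records with readFiles.
PCollection<TestThriftStruct> records =
    pipeline
        .apply(FileIO.match().filepattern("/path/to/thrift/*")) // find files
        .apply(FileIO.readMatches())                            // open them as ReadableFile
        .apply(
            ThriftIO.readFiles(TestThriftStruct.class)
                .withProtocol(new TBinaryProtocol.Factory()));  // decode each record
```

The same three-step shape works for any FileIO-based source, which is why the reviewer considers a convenience `read()` optional rather than mandatory.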
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385458 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377885914 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
 */
package org.apache.beam.sdk.io.thrift;

import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;

import com.google.auto.value.AutoValue;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import javax.annotation.Nullable;
import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.display.DisplayData;
import org.apache.beam.sdk.values.PCollection;
import org.apache.thrift.TBase;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TProtocolFactory;
import org.apache.thrift.transport.TIOStreamTransport;
import org.apache.thrift.transport.TTransportException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * {@link PTransform}s for reading and writing files containing Thrift encoded data.
 *
 * <h3>Reading Thrift Files</h3>
 *
 * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<ExampleType> examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 * ...
 * }</pre>
 *
 * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
 * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
 *
 * <p>For example:
 *
 * <pre>{@code
 * PCollection<FileIO.ReadableFile> files = pipeline
 *     .apply(FileIO.match().filepattern(options.getInputFilepattern()))
 *     .apply(FileIO.readMatches());
 *
 * PCollection<ExampleType> examples = files.apply(
 *     ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
 * }</pre>
 *
 * <h3>Writing Thrift Files</h3>
 *
 * <p>{@link ThriftIO.Sink} allows a {@link PCollection} of {@link TBase} to be written to
 * Thrift files. It can be used with the general-purpose {@link FileIO} transforms,
 * specifically FileIO.write/writeDynamic.
 *
 * <p>For example:
 *
 * <pre>{@code
 * pipeline
 *     .apply(...) // PCollection<ExampleType>
 *     .apply(FileIO
 *         .<ExampleType>write()
 *         .via(ThriftIO.sink(thriftProto))
 *         .to("destination/path"));
 * }</pre>
 *
 * <p>This IO API is considered experimental and may break or receive backwards-incompatible
 * changes in future versions of the Apache Beam SDK.
 */
@Experimental(Experimental.Kind.SOURCE_SINK)
Review comment: `@Experimental(Kind.SOURCE_SINK)` to make it consistent with the rest of the code base This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385458) Time Spent: 11.5h (was: 11h 20m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL:
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385461 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377885479 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder<T> extends CustomCoder<T> { Review comment: This is in principle internal, so maybe make it package protected. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385461) Time Spent: 11h 50m (was: 11h 40m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 11h 50m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9300) parse struct literal in ZetaSQL
[ https://issues.apache.org/jira/browse/BEAM-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9300: -- Status: Open (was: Triage Needed) > parse struct literal in ZetaSQL > --- > > Key: BEAM-9300 > URL: https://issues.apache.org/jira/browse/BEAM-9300 > Project: Beam > Issue Type: New Feature > Components: dsl-sql-zetasql >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > https://github.com/apache/beam/blob/b02a325409d55f1ecb7f9fb6ecc4f60a974c810d/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L569 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9300) parse struct literal in ZetaSQL
Kyle Weaver created BEAM-9300: - Summary: parse struct literal in ZetaSQL Key: BEAM-9300 URL: https://issues.apache.org/jira/browse/BEAM-9300 Project: Beam Issue Type: New Feature Components: dsl-sql-zetasql Reporter: Kyle Weaver Assignee: Kyle Weaver https://github.com/apache/beam/blob/b02a325409d55f1ecb7f9fb6ecc4f60a974c810d/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L569 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385453 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:14 Start Date: 11/Feb/20 21:14 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377902208 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: It can be generated from the .thrift file that is included. I think we would need to add a thrift compiler to the build.gradle to compile it for testing, thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385453) Time Spent: 11h (was: 10h 50m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 11h > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry
[ https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=385451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385451 ] ASF GitHub Bot logged work on BEAM-9269: Author: ASF GitHub Bot Created on: 11/Feb/20 21:05 Start Date: 11/Feb/20 21:05 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10752: [BEAM-9269] Add commit deadline for Spanner writes. URL: https://github.com/apache/beam/pull/10752#issuecomment-584851158 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385451) Time Spent: 3.5h (was: 3h 20m) > Set shorter Commit Deadline and handle with backoff/retry > - > > Key: BEAM-9269 > URL: https://issues.apache.org/jira/browse/BEAM-9269 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Major > Labels: google-cloud-spanner > Time Spent: 3.5h > Remaining Estimate: 0h > > Default commit deadline in Spanner is 1hr, which can lead to a variety of > issues including database overload and session expiry. > Shorter deadline should be set with backoff/retry when deadline expires, so > that the Spanner database does not become overloaded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry
[ https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=385452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385452 ] ASF GitHub Bot logged work on BEAM-9269: Author: ASF GitHub Bot Created on: 11/Feb/20 21:05 Start Date: 11/Feb/20 21:05 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10752: [BEAM-9269] Add commit deadline for Spanner writes. URL: https://github.com/apache/beam/pull/10752#issuecomment-584851223 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385452) Time Spent: 3h 40m (was: 3.5h) > Set shorter Commit Deadline and handle with backoff/retry > - > > Key: BEAM-9269 > URL: https://issues.apache.org/jira/browse/BEAM-9269 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Major > Labels: google-cloud-spanner > Time Spent: 3h 40m > Remaining Estimate: 0h > > Default commit deadline in Spanner is 1hr, which can lead to a variety of > issues including database overload and session expiry. > Shorter deadline should be set with backoff/retry when deadline expires, so > that the Spanner database does not become overloaded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
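The BEAM-9269 discussion above pairs a shorter Spanner commit deadline with backoff and retry when the deadline expires. A minimal, stdlib-only Python sketch of that retry pattern follows; `DeadlineExceeded` and the `commit` callable are hypothetical stand-ins for illustration, not the Spanner client API:

```python
import random
import time


class DeadlineExceeded(Exception):
    """Stand-in for a gRPC DEADLINE_EXCEEDED error from a commit RPC."""


def commit_with_backoff(commit, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Call commit() and retry with exponential backoff plus jitter when the
    deadline expires, so the database is not hammered by immediate retries."""
    for attempt in range(max_attempts):
        try:
            return commit()
        except DeadlineExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))  # jittered backoff
```

The jitter factor spreads retries from many workers over time, which is the point of the change: a short deadline fails fast, and the backoff keeps the retries from overloading Spanner.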
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385444 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 11/Feb/20 21:00 Start Date: 11/Feb/20 21:00 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-584848915 I added a few comments. Could we also verify that this works as expected on Dataflow? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385444) Time Spent: 20m (was: 10m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385442 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 11/Feb/20 20:59 Start Date: 11/Feb/20 20:59 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#discussion_r377894420 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -562,6 +562,12 @@ def _add_argparse_args(cls, parser): default=None, choices=['COST_OPTIMIZED', 'SPEED_OPTIMIZED'], help='Set the Flexible Resource Scheduling mode') +parser.add_argument( Review comment: Looking at the Java implementation (https://github.com/apache/beam/pull/7047), this is an experiment and not a top level option. I believe you can handle this similar to other experiments in internal/apiclient.py This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385442) Remaining Estimate: 0h Time Spent: 10m > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=385441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385441 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 11/Feb/20 20:59 Start Date: 11/Feb/20 20:59 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#discussion_r377894662 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -539,6 +541,15 @@ def run_pipeline(self, pipeline, options): # Get a Dataflow API client and set its options self.dataflow_client = apiclient.DataflowApplicationClient(options) +if self.job.options.view_as(GoogleCloudOptions).upload_graph: Review comment: Similarly, this staging can happen in apiclient.py if the experiment is present. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385441) Remaining Estimate: 0h Time Spent: 10m > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
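The review comments above suggest treating `upload_graph` as an experiment (as the Java SDK does) rather than a top-level pipeline option. A stdlib-only sketch of that flag handling; the parsing here is deliberately simplified and is not the actual `pipeline_options.py` API:

```python
def parse_experiments(argv):
    """Collect every --experiments=a,b flag from argv into a flat list."""
    experiments = []
    for arg in argv:
        if arg.startswith('--experiments='):
            experiments.extend(arg[len('--experiments='):].split(','))
    return experiments


def should_upload_graph(argv):
    """True when the job graph should be staged to GCS instead of sent inline,
    i.e. when the upload_graph experiment is present."""
    return 'upload_graph' in parse_experiments(argv)
```

With this shape, the staging step in `apiclient.py` only needs one conditional on the experiment list, which mirrors how other Dataflow experiments are handled.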
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=385436=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385436 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 11/Feb/20 20:55 Start Date: 11/Feb/20 20:55 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10576: [BEAM-5605] Convert all BoundedSources to SplittableDoFns when using beam_fn_api experiment. URL: https://github.com/apache/beam/pull/10576#discussion_r377892829 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/JavaReadViaImpulse.java ## @@ -1,176 +0,0 @@ -/* Review comment: Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385436) Time Spent: 12.5h (was: 12h 20m) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 12.5h > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=385434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385434 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 11/Feb/20 20:49 Start Date: 11/Feb/20 20:49 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10576: [BEAM-5605] Convert all BoundedSources to SplittableDoFns when using beam_fn_api experiment. URL: https://github.com/apache/beam/pull/10576#discussion_r37788 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java ## @@ -177,4 +205,128 @@ public void populateDisplayData(DisplayData.Builder builder) { .include("source", source); } } + + /** + * A splittable {@link DoFn} which executes a {@link BoundedSource}. + * + * We model the element as the original source and the restriction as the sub-source. This + * allows us to split the sub-source over and over yet still receive "source" objects as inputs. + */ + static class BoundedSourceAsSDFWrapperFn<T> extends DoFn<BoundedSource<T>, T> { +private static final long DEFAULT_DESIRED_BUNDLE_SIZE_BYTES = 64 * (1 << 20); + +@GetInitialRestriction +public BoundedSource<T> initialRestriction(@Element BoundedSource<T> element) { + return element; +} + +@GetSize +public double getSize( +@Restriction BoundedSource<T> restriction, PipelineOptions pipelineOptions) +throws Exception { + return restriction.getEstimatedSizeBytes(pipelineOptions); +} + +@SplitRestriction +public void splitRestriction( +@Restriction BoundedSource<T> restriction, +OutputReceiver<BoundedSource<T>> receiver, +PipelineOptions pipelineOptions) +throws Exception { + for (BoundedSource<T> split : + restriction.split(DEFAULT_DESIRED_BUNDLE_SIZE_BYTES, pipelineOptions)) { +receiver.output(split); + } +} + +@NewTracker +public RestrictionTracker<BoundedSource<T>, Object[]> restrictionTracker( +@Restriction BoundedSource<T> restriction, PipelineOptions pipelineOptions) { + return new BoundedSourceAsSDFRestrictionTracker<>(restriction, pipelineOptions); +} + 
+@ProcessElement +public void processElement( +RestrictionTracker<BoundedSource<T>, Object[]> tracker, OutputReceiver<T> receiver) +throws IOException { + Object[] out = new Object[1]; + while (tracker.tryClaim(out)) { +receiver.output((T) out[0]); + } +} + +@GetRestrictionCoder +public Coder<BoundedSource<T>> restrictionCoder() { + return SerializableCoder.of(new TypeDescriptor<BoundedSource<T>>() {}); +} + +/** + * A fake restriction tracker which adapts to the {@link BoundedSource} API. The restriction + * object is used to advance the underlying source and to "return" the current element. + */ +private static class BoundedSourceAsSDFRestrictionTracker<T> Review comment: :clap: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385434) Time Spent: 12h 20m (was: 12h 10m) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 12h 20m > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
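The diff above models a bounded source as its own restriction: `tryClaim` advances the underlying reader and "returns" the current element through a one-slot out-parameter, and `splitRestriction` hands out sub-sources. A toy Python translation of that control flow (not the Beam API; `RangeSource` is an invented stand-in for a `BoundedSource`):

```python
class RangeSource:
    """Toy stand-in for a BoundedSource: a half-open integer range."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def split(self, n):
        """Split into at most n contiguous sub-sources, like BoundedSource.split."""
        step = max(1, (self.stop - self.start + n - 1) // n)
        return [RangeSource(s, min(s + step, self.stop))
                for s in range(self.start, self.stop, step)]


class SourceAsRestrictionTracker:
    """Mimics BoundedSourceAsSDFRestrictionTracker: try_claim advances the
    reader and passes the current element back via the out-parameter."""

    def __init__(self, source):
        self._current = source.start
        self._stop = source.stop

    def try_claim(self, out):
        if self._current >= self._stop:
            return False  # source drained; the SDF loop stops
        out[0] = self._current
        self._current += 1
        return True


def process_element(source):
    """The @ProcessElement loop from the diff: claim until the source is drained."""
    tracker = SourceAsRestrictionTracker(source)
    out = [None]
    received = []
    while tracker.try_claim(out):
        received.append(out[0])
    return received
```

Splitting then processing each sub-source independently yields the same elements as reading the original source whole, which is the invariant the wrapper relies on.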
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385431 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 20:37 Start Date: 11/Feb/20 20:37 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377883979 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder<T> extends CustomCoder<T> { + + public static <T> ThriftCoder<T> of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: +1 to use Thrift native serialization; this will enable sharing the data with cross-language pipelines in the future This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385431) Time Spent: 10h 50m (was: 10h 40m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
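The review above prefers Thrift's native serialization over Java's `ObjectOutputStream` because Thrift-encoded bytes are portable across languages, while Java object serialization is not. A hedged, stdlib-only Python sketch of the coder contract in question (`encode(value, out_stream)` / `decode(in_stream)`); the length-prefixed byte format here is an invented placeholder standing in for an actual Thrift `TProtocol`:

```python
import struct


class BytesRecordCoder:
    """Sketch of a stream coder: a real ThriftCoder would delegate the payload
    bytes to a TProtocolFactory instead of writing them raw."""

    def encode(self, value: bytes, out_stream) -> None:
        # 4-byte big-endian length prefix, then the payload itself.
        out_stream.write(struct.pack('>I', len(value)))
        out_stream.write(value)

    def decode(self, in_stream) -> bytes:
        (length,) = struct.unpack('>I', in_stream.read(4))
        return in_stream.read(length)
```

The prefix lets several records share one stream without delimiters, the same property a TProtocol-based coder provides, and any runner or SDK that understands the wire format can decode the bytes.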
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385430 ] ASF GitHub Bot logged work on BEAM-8979: Author: ASF GitHub Bot Created on: 11/Feb/20 20:34 Start Date: 11/Feb/20 20:34 Worklog Time Spent: 10m Work Description: nipunn1313 commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation URL: https://github.com/apache/beam/pull/10734#issuecomment-584837962 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385430) Time Spent: 8h 40m (was: 8.5h) > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. 
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. > 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % 
ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call ingen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce this error locally. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
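The failure in the logs above is a PATH-resolution problem: protoc looks up `protoc-gen-mypy` as an executable when handling `--mypy_out`. A stdlib-only sketch of the kind of preflight check that would make the error actionable before invoking protoc; the function name and the extra-directory mechanism are hypothetical, not part of `gen_protos.py`:

```python
import os
import shutil


def resolve_protoc_plugin(name='protoc-gen-mypy', extra_dirs=()):
    """Return an absolute path to the plugin binary, searching PATH first and
    then any extra directories (e.g. a virtualenv's bin/), mirroring how
    protoc resolves the helper for --mypy_out."""
    found = shutil.which(name)
    if found:
        return os.path.abspath(found)
    for d in extra_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return os.path.abspath(candidate)
    raise RuntimeError('%s: program not found or is not executable' % name)
```

Resolving the plugin up front (and passing it via `--plugin=protoc-gen-mypy=<abs_path>`) turns the opaque protoc exit status into a clear error naming the missing binary.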
[jira] [Updated] (BEAM-9160) Update AWS SDK to support Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9160: --- Description: Many organizations have started leveraging pod level identity in Kubernetes. The current version of the AWS SDK packaged with Beam 2.17.0 is out of date and doesn't provide native support to pod level identity access management. It is recommended that we introduce support to access AWS resources such as S3 using pod level identity. Current Version of the AWS Java SDK in Beam: def aws_java_sdk_version = "1.11.519" Proposed AWS Java SDK Version: com.amazonaws aws-java-sdk 1.11.710 was: Many organizations have started leveraging pod level identity in Kubernetes. The current version of the AWS SDK packaged with Bean 2.17.0 is out of date and doesn't provide native support to pod level identity access management. It is recommended that we introduce support to access AWS resources such as S3 using pod level identity. Current Version of the AWS Java SDK in Beam: def aws_java_sdk_version = "1.11.519" Proposed AWS Java SDK Version: com.amazonaws aws-java-sdk 1.11.710 > Update AWS SDK to support Pod Level Identity > > > Key: BEAM-9160 > URL: https://issues.apache.org/jira/browse/BEAM-9160 > Project: Beam > Issue Type: Improvement > Components: dependencies >Affects Versions: 2.17.0 >Reporter: Mohamed Noah >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Many organizations have started leveraging pod level identity in Kubernetes. > The current version of the AWS SDK packaged with Beam 2.17.0 is out of date > and doesn't provide native support to pod level identity access management. > > It is recommended that we introduce support to access AWS resources such as > S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam: > def aws_java_sdk_version = "1.11.519" > Proposed AWS Java SDK Version: > > com.amazonaws > aws-java-sdk > 1.11.710 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9160: --- Summary: Update AWS SDK to support Kubernetes Pod Level Identity (was: Update AWS SDK to support Pod Level Identity) > Update AWS SDK to support Kubernetes Pod Level Identity > --- > > Key: BEAM-9160 > URL: https://issues.apache.org/jira/browse/BEAM-9160 > Project: Beam > Issue Type: Improvement > Components: dependencies >Affects Versions: 2.17.0 >Reporter: Mohamed Noah >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Many organizations have started leveraging pod level identity in Kubernetes. > The current version of the AWS SDK packaged with Beam 2.17.0 is out of date > and doesn't provide native support to pod level identity access management. > > It is recommended that we introduce support to access AWS resources such as > S3 using pod level identity. > Current Version of the AWS Java SDK in Beam: > def aws_java_sdk_version = "1.11.519" > Proposed AWS Java SDK Version: > > com.amazonaws > aws-java-sdk > 1.11.710 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=385428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385428 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 11/Feb/20 20:23 Start Date: 11/Feb/20 20:23 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10802: [BEAM-8537] Move wrappers of RestrictionTracker out of iobase URL: https://github.com/apache/beam/pull/10802#issuecomment-584833264 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385428) Time Spent: 14h (was: 13h 50m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 14h > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034809#comment-17034809 ] Thomas Weise commented on BEAM-9298: [~iemejia] yes, this should be on the mailing list. IMO good to communicate intent to dev@ and user@ and also refer to [https://beam.apache.org/documentation/runners/flink/#version-compatibility] > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=385426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385426 ] ASF GitHub Bot logged work on BEAM-8979: Author: ASF GitHub Bot Created on: 11/Feb/20 20:19 Start Date: 11/Feb/20 20:19 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation URL: https://github.com/apache/beam/pull/10734#issuecomment-584831758 > I'd rather wait for an official release if you don't mind Done! Ready to test and hopefully merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385426) Time Spent: 8.5h (was: 8h 20m) > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. 
The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. > 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > 
"/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call ingen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce this error locally. > -- This message was sent by
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034807#comment-17034807 ] Ismaël Mejía commented on BEAM-9298: [~mxm] [~thw] agree? Worth discussing on the mailing list IMO > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9298: --- Status: Open (was: Triage Needed) > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=385424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385424 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 11/Feb/20 20:15 Start Date: 11/Feb/20 20:15 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#discussion_r377873324 ## File path: .pre-commit-config.yaml ## @@ -0,0 +1,32 @@ +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +repos: + - repo: https://github.com/pre-commit/mirrors-yapf +# this rev is a release tag in the repo above and corresponds with a yapf +# version. make sure this matches the version of yapf in tox.ini. +rev: v0.29.0 +hooks: + - id: yapf +files: ^sdks/python/apache_beam/ +# keep these in sync with sdks/python/.yapfignore Review comment: First, some background: pre-commit triggers its hooks based on changed files, matching based on 3 values: - file type (e.g. it has a mapping from "python" to file extensions like ".py", etc.) - files: include pattern - exclude: exclude pattern pre-commit always passes an explicit list of changed files to the underlying tool, and never relies on recursive flags. Ok, on to your question. Our exclude patterns cover autogenerated files.
So, if a new autogenerated file were added to the repo which was excluded in .yapfignore but not in .pre-commit-config.yaml (which I think would be the most common "failure" scenario), then the first time that a developer with pre-commit enabled changed that file (say, by regenerating it) and tried to commit those changes, yapf would autoformat it, and then pre-commit would fail the commit with a message like this: ``` yapf.Failed - hook id: yapf - files were modified by this hook ``` At this point hopefully the developer realizes that the failure is due to yapf, and finds the trail of comments leading them to .pre-commit-config.yaml. Worst case scenario is that they commit it with the autoformatted changes, which would be pretty innocuous, and would not fail any jenkins tests because those same autogenerated files are excluded from pylint. Another thing to note is that this is all just a convenience for developers. Jenkins remains our last line of defense. Also note that it's possible to invoke your lint tools using pre-commit within tox, so that you can consolidate all of your includes and excludes across all tools into one file, the pre-commit-config.yaml. This is what we do where I work. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385424) Time Spent: 1h (was: 50m) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
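The matching rules described in the review comment above (file type, files: include pattern, exclude: pattern) can be sketched as a small filter. This is an illustrative approximation of how pre-commit selects files for a hook, not its actual implementation:

```python
import re

def select_files(changed_files, include, exclude=None):
    """Approximate pre-commit's selection: a path is handed to the hook
    when it matches the `files:` regex and does not match the `exclude:`
    regex. (Simplified: real pre-commit also filters by detected file
    type; patterns here are illustrative.)"""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    return [f for f in changed_files
            if inc.search(f) and not (exc and exc.search(f))]
```

This makes the failure scenario above concrete: an autogenerated file excluded in .yapfignore but not in the `exclude:` pattern would still be selected here and handed to yapf.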
[jira] [Resolved] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-9284. Resolution: Fixed > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9284 > URL: https://issues.apache.org/jira/browse/BEAM-9284 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > > Apache Flink 1.10 is coming, so it would be good to add a Flink 1.10 build target > and make the Flink Runner compatible with Flink 1.10. > There are some incompatible changes in flink-clients as part of their support > for Java 11, so that is an area to be addressed on the Beam side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-9284. Fix Version/s: Not applicable Resolution: Duplicate > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9284 > URL: https://issues.apache.org/jira/browse/BEAM-9284 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > > Apache Flink 1.10 is coming, so it would be good to add a Flink 1.10 build target > and make the Flink Runner compatible with Flink 1.10. > There are some incompatible changes in flink-clients as part of their support > for Java 11, so that is an area to be addressed on the Beam side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reopened BEAM-9284: > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9284 > URL: https://issues.apache.org/jira/browse/BEAM-9284 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > > Apache Flink 1.10 is coming, so it would be good to add a Flink 1.10 build target > and make the Flink Runner compatible with Flink 1.10. > There are some incompatible changes in flink-clients as part of their support > for Java 11, so that is an area to be addressed on the Beam side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9284) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9284: --- Status: Open (was: Triage Needed) > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9284 > URL: https://issues.apache.org/jira/browse/BEAM-9284 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > > Apache Flink 1.10 is coming, so it would be good to add a Flink 1.10 build target > and make the Flink Runner compatible with Flink 1.10. > There are some incompatible changes in flink-clients as part of their support > for Java 11, so that is an area to be addressed on the Beam side. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
[ https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9295: --- Status: Open (was: Triage Needed) > Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10 > --- > > Key: BEAM-9295 > URL: https://issues.apache.org/jira/browse/BEAM-9295 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > Apache Flink 1.10 has completed its final release vote, see [1]. So I would > like to add a Flink 1.10 build target and make the Flink Runner compatible with > Flink 1.10. > I'd appreciate it if you could leave your suggestions or comments! > [1] > https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9299: --- Status: Open (was: Triage Needed) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2 since both Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034796#comment-17034796 ] Ismaël Mejía commented on BEAM-9299: For reference, Flink 1.9.2 introduced a backwards-incompatible change (FLINK-15844), so we need a workaround or should just avoid the hassle. > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2 since both Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385422 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 20:03 Start Date: 11/Feb/20 20:03 Worklog Time Spent: 10m Work Description: steveniemitz commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377868235 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: fwiw the java thrift classes will use the TCompactProtocol to serialize themselves when being java serialized. Personally I would rather see a coder here that explicitly uses a TProtocol to serialize the object rather than relying on java serialization to do it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385422) Time Spent: 10h 40m (was: 10.5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
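The reviewer's suggestion, preferring an explicit wire protocol over the language's built-in serialization, can be illustrated outside of Thrift. The Python sketch below is a hypothetical analogue only (pickle standing in for Java serialization, struct for an explicit TProtocol); it is not Beam's ThriftCoder:

```python
import pickle
import struct

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def encode_implicit(p):
    # Analogous to Java serialization: the bytes are opaque and coupled
    # to the class definition and serializer version.
    return pickle.dumps(p)

def encode_explicit(p):
    # Analogous to writing through an explicit TProtocol: a documented,
    # stable layout -- here, two little-endian signed 64-bit ints.
    return struct.pack("<qq", p.x, p.y)

def decode_explicit(data):
    x, y = struct.unpack("<qq", data)
    return Point(x, y)
```

The explicit form gives a fixed, cross-version byte layout, which matters for a Coder because runners may compare and persist encoded bytes; that is the design point being made in the review.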
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385421 ] ASF GitHub Bot logged work on BEAM-8399: Author: ASF GitHub Bot Created on: 11/Feb/20 19:58 Start Date: 11/Feb/20 19:58 Worklog Time Spent: 10m Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option URL: https://github.com/apache/beam/pull/10223#issuecomment-584822664 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385421) Time Spent: 2h 50m (was: 2h 40m) > Python HDFS implementation should support filenames of the format > "hdfs://namenodehost/parent/child" > > > Key: BEAM-8399 > URL: https://issues.apache.org/jira/browse/BEAM-8399 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > "hdfs://namenodehost/parent/child" and "/parent/child" seems to be the > correct filename formats for HDFS based on [1] but we currently support > format "hdfs://parent/child". > To not break existing users, we have to either (1) somehow support both > versions by default (based on [2] seems like HDFS does not allow colons in > file path so this might be possible) (2) make > "hdfs://namenodehost/parent/child" optional for now and change it to default > after few versions. > We should also make sure that Beam Java and Python HDFS file-system > implementations are consistent in this regard. 
> > [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html] > [2] https://issues.apache.org/jira/browse/HDFS-13 > > cc: [~udim] -- This message was sent by Atlassian Jira (v8.3.4#803005)
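The two filename conventions discussed above differ in whether the component right after hdfs:// is a namenode host or the first path segment. A sketch of parsing under both conventions follows; the function name, flag, and return shape are hypothetical illustrations, not Beam's filesystem API:

```python
from urllib.parse import urlparse

def split_hdfs_url(url, full_urls=False):
    """Split an hdfs:// URL into (host, path) under the two conventions.

    full_urls=False: legacy "hdfs://parent/child" style, where everything
    after the scheme is the path and there is no host component.
    full_urls=True: standard "hdfs://namenodehost/parent/child" style,
    where the URL authority is the namenode host.
    """
    parsed = urlparse(url)
    if parsed.scheme != "hdfs":
        raise ValueError("not an hdfs:// URL: %r" % url)
    if full_urls:
        return parsed.netloc, parsed.path
    return None, "/" + parsed.netloc + parsed.path
```

This also shows why an opt-in flag (rather than guessing) is safer: "hdfs://parent/child" parses cleanly under both conventions but means different files.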
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385416 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 11/Feb/20 19:54 Start Date: 11/Feb/20 19:54 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r377860779 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1087,6 +1087,44 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: raw data bytes. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; +// A URN for artifacts described by HTTP links. +// payload: a string for an artifact HTTP URL +HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"]; +// A URN for artifacts hosted on PYPI. +// artifact_id: a PYPI project name +// version_range: a PYPI compatible version string +// payload: None +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; +// A URN for artifacts hosted on Maven central. +// artifact_id: [maven group id]:[maven artifact id] +// version_range: a Maven compatible version string +// payload: None +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; Review comment: One issue here is that the callee may not know if it is on a shared filesystem with the caller (e.g. when calling the expansion service). And when calling two distinct expansion services, one would like to be able to compare between them. 
Also, perhaps we should not limit ourselves to local paths here, but any path that can be opened with beam filesystems. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385416) Time Spent: 2.5h (was: 2h 20m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385417 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 11/Feb/20 19:54 Start Date: 11/Feb/20 19:54 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r377862882 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1087,6 +1087,44 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: raw data bytes. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; +// A URN for artifacts described by HTTP links. +// payload: a string for an artifact HTTP URL +HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"]; +// A URN for artifacts hosted on PYPI. +// artifact_id: a PYPI project name +// version_range: a PYPI compatible version string +// payload: None +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; +// A URN for artifacts hosted on Maven central. +// artifact_id: [maven group id]:[maven artifact id] +// version_range: a Maven compatible version string +// payload: None +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + // A generated staged name (no path). + string staged_name = 2; +} + +message ArtifactInformation { + string urn = 1; + bytes payload = 2; + string artifact_id = 3; Review comment: If we can't come up with a standard format, I think it should be part of the payload. Similarly for artifact_id--they should mean the same thing. 
(We can always safely dedup on urn+payloads.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385417) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
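The dedup idea in the comment above can be sketched directly. This is an illustrative helper over plain dicts standing in for ArtifactInformation messages, not code from the PR:

```python
def dedup_artifacts(artifacts):
    """Deduplicate artifact descriptors on the (urn, payload) pair,
    which the comment above argues is always a safe identity key."""
    seen = set()
    unique = []
    for a in artifacts:
        key = (a["urn"], a["payload"])
        if key not in seen:
            seen.add(key)
            unique.append(a)
    return unique
```

Keying on urn plus payload means two environments that declare the same PyPI package with the same version payload collapse to one artifact, while differing payloads stay distinct.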
[jira] [Created] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
sunjincheng created BEAM-9299: - Summary: Upgrade Flink Runner to 1.8.3 and 1.9.2 Key: BEAM-9299 URL: https://issues.apache.org/jira/browse/BEAM-9299 Project: Beam Issue Type: Task Components: runner-flink Reporter: sunjincheng Assignee: sunjincheng Fix For: 2.20.0 I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2 since both Apache Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. What do you think? [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385415 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 11/Feb/20 19:54 Start Date: 11/Feb/20 19:54 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r377863536 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1087,6 +1087,44 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: raw data bytes. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; +// A URN for artifacts described by HTTP links. +// payload: a string for an artifact HTTP URL +HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"]; +// A URN for artifacts hosted on PYPI. +// artifact_id: a PYPI project name +// version_range: a PYPI compatible version string +// payload: None +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; +// A URN for artifacts hosted on Maven central. +// artifact_id: [maven group id]:[maven artifact id] +// version_range: a Maven compatible version string +// payload: None +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + // A generated staged name (no path). + string staged_name = 2; +} + +message ArtifactInformation { + string urn = 1; + bytes payload = 2; + string artifact_id = 3; + string version_range = 4; +} Review comment: But we'll need it for more than just embedded payload. 
And the name itself may not be enough to determine the role (do we try to install all .tar.gz files in Python? Or are some just data?) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385415) Time Spent: 2.5h (was: 2h 20m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=385414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385414 ]

ASF GitHub Bot logged work on BEAM-9229:

Author: ASF GitHub Bot
Created on: 11/Feb/20 19:54
Start Date: 11/Feb/20 19:54
Worklog Time Spent: 10m

Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto
URL: https://github.com/apache/beam/pull/10733#discussion_r377862155

## File path: model/pipeline/src/main/proto/beam_runner_api.proto
## @@ -1087,6 +1087,44 @@ message SideInput {
   FunctionSpec window_mapping_fn = 3;
 }

+message StandardArtifacts {
+  enum Types {
+    // A URN for artifacts stored in a local directory.
+    // payload: ArtifactFilePayload.
+    FILE = 0 [(beam_urn) = "beam:artifact:file:v1"];
+    // A URN for artifacts embedded in ArtifactInformation proto.
+    // payload: raw data bytes.
+    EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"];
+    // A URN for artifacts described by HTTP links.
+    // payload: a string for an artifact HTTP URL
+    HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"];
+    // A URN for artifacts hosted on PYPI.
+    // artifact_id: a PYPI project name
+    // version_range: a PYPI compatible version string
+    // payload: None
+    PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"];
+    // A URN for artifacts hosted on Maven central.
+    // artifact_id: [maven group id]:[maven artifact id]
+    // version_range: a Maven compatible version string
+    // payload: None
+    MAVEN = 4 [(beam_urn) = "beam:artifact:maven:v1"];
+  }
+}
+
+message ArtifactFilePayload {
+  // A path to an artifact file on a local system.
+  string local_path = 1;
+  // A generated staged name (no path).
+  string staged_name = 2;

Review comment: But eventually we'll have to give it a name, right? (One could argue that dependencies should be a map(name -> artifact).) OTOH, for some perhaps we could leave it blank and one could be inferred (e.g. for urls).
Issue Time Tracking
---
Worklog Id: (was: 385414)
Time Spent: 2.5h (was: 2h 20m)

> Adding dependency information to Environment proto
> --------------------------------------------------
>
> Key: BEAM-9229
> URL: https://issues.apache.org/jira/browse/BEAM-9229
> Project: Beam
> Issue Type: Sub-task
> Components: beam-model
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Adding dependency information to Environment proto.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
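The reviewer's aside that a blank staged name "could be inferred (e.g. for urls)" might look like the following sketch. This is a hypothetical helper, not Beam code; it simply takes the last path component of an HTTP artifact URL as a default staged file name:

```python
from urllib.parse import urlparse
import posixpath

def infer_staged_name(url: str) -> str:
    # When staged_name is left blank for an HTTP artifact, a runner could
    # fall back to the final path component of the URL as the file name.
    path = urlparse(url).path
    name = posixpath.basename(path)
    # A URL with no usable path component still needs some name; the
    # fallback string here is arbitrary.
    return name or "artifact"
```

For example, infer_staged_name("https://host/libs/dep-1.0.jar") yields "dep-1.0.jar". A map(name -> artifact), the alternative the reviewer mentions, would make the name explicit and unique by construction instead.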