[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=243839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243839 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 17/May/19 05:37 Start Date: 17/May/19 05:37 Worklog Time Spent: 10m Work Description: jbartok commented on issue #8592: [BEAM-7305] Improve and extend Hazelcast Jet based Java Runner URL: https://github.com/apache/beam/pull/8592#issuecomment-493326374 Hi @mxm. Yes, I was pondering yesterday whether I should build this pull request out of multiple change-sets or squash them down to a single one, and I might not have made the best choice. The thing is that I'm dumping months of my work into these two change-sets; that's why it looks so non-incremental. The actual development has been done in https://github.com/hazelcast/hazelcast-jet-beam-runner, where there are 100+ commits (debugging has proven simpler this way, which made it worth the effort of migrating later). Anyway, from now on the pace of development should be slower, and I will make it more incremental by opening more frequent PRs. As far as reviews are concerned, they have been done to some degree on our module by my Hazelcast colleagues. Here we would need somebody who is both impartial to Hazelcast and knowledgeable about Jet, which might not be that simple to find. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243839) Time Spent: 2h 40m (was: 2.5h) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7190) enable file system based token authentication for portable runner
[ https://issues.apache.org/jira/browse/BEAM-7190?focusedWorklogId=243815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243815 ] ASF GitHub Bot logged work on BEAM-7190: Author: ASF GitHub Bot Created on: 17/May/19 03:48 Start Date: 17/May/19 03:48 Worklog Time Spent: 10m Work Description: angoenka commented on issue #8597: [BEAM-7190] Enable file based token auth for samza portable runner URL: https://github.com/apache/beam/pull/8597#issuecomment-493309252 - Can we make the token creation and authentication modular and pluggable, so that they can be added to other runners as well by setting a pipeline option? - We will also need a secure channel that encrypts the content before we can call it truly secure. Issue Time Tracking --- Worklog Id: (was: 243815) Time Spent: 20m (was: 10m) > enable file system based token authentication for portable runner > - > > Key: BEAM-7190 > URL: https://issues.apache.org/jira/browse/BEAM-7190 > Project: Beam > Issue Type: Task > Components: runner-samza >Reporter: Hai Lu >Assignee: Hai Lu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > For Samza and potentially other portable runners, there is a need to secure > the communication between sdk worker and runner. Currently the SSL/TLS in > portability is half done. > However, after investigation we found that it's sufficient to just 1) use > loopback address 2) enforce authentication and that way the communication is > both authenticated and secured. > This ticket intends to track the implementation of the solution above. More > details can be found in the following PR.
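angoenka's first point asks for token creation and authentication to be pluggable and selectable through a pipeline option. A minimal Python sketch of what such a plug-in point could look like is below. `AuthProvider`, `StaticTokenProvider`, `register_auth_provider`, `auth_provider_from_options`, and the `auth_provider`/`auth_token` option keys are all hypothetical names chosen for illustration; they are not existing Beam APIs or pipeline options.

```python
import abc
import hmac


class AuthProvider(abc.ABC):
    """Hypothetical plug-in interface shared by the runner and the SDK worker."""

    @abc.abstractmethod
    def client_token(self) -> str:
        """Token the SDK worker attaches to its outgoing calls."""

    @abc.abstractmethod
    def is_valid(self, presented: str) -> bool:
        """Check a token presented to the runner's server."""


class StaticTokenProvider(AuthProvider):
    """Toy provider that checks against a fixed secret (illustration only)."""

    def __init__(self, options):
        self._secret = options["auth_token"]  # hypothetical option key

    def client_token(self):
        return self._secret

    def is_valid(self, presented):
        # Constant-time comparison so response timing does not leak the secret.
        return hmac.compare_digest(presented.encode(), self._secret.encode())


_PROVIDERS = {}


def register_auth_provider(name, factory):
    """Make a provider selectable by name."""
    _PROVIDERS[name] = factory


def auth_provider_from_options(options):
    """Return the provider named by the hypothetical 'auth_provider' option, or None."""
    name = options.get("auth_provider")
    if name is None:
        return None  # option not set: run without authentication
    factory = _PROVIDERS.get(name)
    if factory is None:
        raise ValueError("unknown auth provider: %r" % name)
    return factory(options)


register_auth_provider("static", StaticTokenProvider)
```

In this design a runner would resolve the provider once at start-up and hand it to both its gRPC client and server setup, so adding a new scheme means registering one more factory rather than touching each runner.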
[jira] [Work logged] (BEAM-7190) enable file system based token authentication for portable runner
[ https://issues.apache.org/jira/browse/BEAM-7190?focusedWorklogId=243797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243797 ] ASF GitHub Bot logged work on BEAM-7190: Author: ASF GitHub Bot Created on: 17/May/19 02:56 Start Date: 17/May/19 02:56 Worklog Time Spent: 10m Work Description: lhaiesp commented on pull request #8597: [BEAM-7190] Enable file based token auth for samza portable runner URL: https://github.com/apache/beam/pull/8597 For Samza and potentially other portable runners that do not use Docker and need to run in a multi-tenant environment, there is a need to secure the communication between the SDK worker and the runner. Currently, SSL/TLS support in portability is only half done. However, after investigation we found that it is sufficient to:
1. Use the loopback address, so that the traffic is not exposed to the external network.
2. Enforce authentication, so that only valid users can connect to the ports.
With these two steps, it is not necessary to enable TLS, because the data channel is local only and eavesdropping on local traffic requires root privileges. A trivial way to do authentication is to share a secret token through the file system (e.g. set the file permission to 600, i.e. -rw-------). Next, we introduce a customized interceptor for both the gRPC client and server to provide and verify this token (see GrpcFileTokenAuthProvider.java and token_auth_interceptor.py). The server can then deny any connection attempt that does not present the right token.
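The file-based scheme described in this PR can be sketched with the Python standard library alone: create the token file with mode 0600 so only the owning user can read it, refuse to use a file whose permissions are looser, and compare presented tokens in constant time. This is not the code from GrpcFileTokenAuthProvider.java or token_auth_interceptor.py; the function names and the `authorization` metadata key are assumptions, and the `metadata` argument merely mimics the (key, value) pairs gRPC carries with each call.

```python
import hmac
import os
import secrets
import stat


def write_token_file(path):
    """Create a fresh random token in a file readable only by its owner."""
    token = secrets.token_hex(32)
    # O_EXCL refuses to reuse an existing file; 0o600 is -rw------- (umask can only clear bits).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return token


def read_token_file(path):
    """Load the shared token, rejecting files that group or others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError("token file %s has mode %o; expected 600" % (path, mode))
    with open(path) as f:
        return f.read().strip()


def is_authorized(metadata, expected_token):
    """Server-side check on call metadata given as (key, value) pairs."""
    presented = dict(metadata).get("authorization", "")  # hypothetical key name
    # Constant-time comparison; returns False (no exception) on length mismatch.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

In this sketch the runner would call `write_token_file` before launching the SDK worker, the worker would call `read_token_file` and attach the token to every call, and the server would reject any call for which `is_authorized` returns False. Binding the server to the loopback address is a separate setting on the server socket.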
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243779 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:25 Start Date: 17/May/19 01:25 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284951313 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -269,6 +272,14 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): for (arg, hint) in zip(argspec.args, typeargs)] packed_typeargs += list(typeargs[len(packed_typeargs):]) + if sys.version_info.major < 3: +return getcallargs_forhints_impl_py2(func, argspec, packed_typeargs, Review comment: https://docs.python.org/3/howto/pyporting.html#use-feature-detection-instead-of-version-detection gives a good guideline on this topic.
Issue Time Tracking --- Worklog Id: (was: 243779) Time Spent: 7.5h (was: 7h 20m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 7.5h > Remaining Estimate: 0h >
{noformat}
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 53, in test_non_function
    result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x')
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 510, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", line 514, in apply
    transform.type_check_inputs(pvalueish)
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 753, in type_check_inputs
    hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1])
  File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", line 283, in getcallargs_forhints
    raise TypeCheckError(e)
apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required positional argument: 'chars'
{noformat}
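The pyporting guideline linked in tvalentyn's comment recommends testing for the feature you need rather than for a version number. Applied to this PR, one option is to branch on whether `inspect.signature` exists instead of on `sys.version_info.major`. The sketch below is illustrative only: `bind_type_args` is a hypothetical helper, not Beam's `getcallargs_forhints`, and the fallback branch uses the older `inspect.getcallargs`, which is present on both Python 2.7 and Python 3.

```python
import inspect


def bind_type_args(func, *typeargs, **typekwargs):
    """Map each parameter of func to the type hint passed in its position."""
    # Feature detection: ask whether the API exists rather than checking
    # sys.version_info, so any interpreter that provides it takes this path.
    if hasattr(inspect, "signature"):
        bound = inspect.signature(func).bind(*typeargs, **typekwargs)
        # Only explicitly bound parameters appear; defaults are not filled in.
        return dict(bound.arguments)
    # Fallback for interpreters without inspect.signature (e.g. Python 2).
    return inspect.getcallargs(func, *typeargs, **typekwargs)
```

On recent CPython releases `str.strip` exposes a signature with an optional `chars` parameter, so binding two hints to it through the `inspect.signature` path succeeds, with `chars` receiving the second hint.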
[jira] [Updated] (BEAM-7347) beam_Performance failed with benchmark flag config error
[ https://issues.apache.org/jira/browse/BEAM-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-7347: --- Description: [All|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/] performance benchmarks are affected. Error log from [latest beam_PerformanceTests_TextIOIT run|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_TextIOIT/2008/console]: {code} 00:00:24.372 2019-05-17 00:21:24,724 5d6e9583 MainThread beam_integration_benchmark(1/1) ERRORError during benchmark beam_integration_benchmark 00:00:24.372 Traceback (most recent call last): 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 752, in RunBenchmark 00:00:24.372 DoProvisionPhase(spec, detailed_timer) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 538, in DoProvisionPhase 00:00:24.372 spec.ConstructDpbService() 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 209, in ConstructDpbService 00:00:24.372 self.dpb_service = dpb_service_class(self.config.dpb_service) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py", line 53, in __init__ 00:00:24.372 super(GcpDpbDataflow, self).__init__(dpb_service_spec) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/dpb_service.py", line 127, in __init__ 00:00:24.372 'The flag dpb_service_zone must be provided, for provisioning.') 00:00:24.372 InvalidFlagConfigurationError: The flag dpb_service_zone must be provided, for provisioning. 
{code} It seems that a change in [PerfkitBenchmarker|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker] broke our [beam_integration_benchmark|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py]. However, we may be able to apply a quick fix on the Beam side. was: [All|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/] performance benchmarks are affected. Error log from [latest beam_PerformanceTests_TextIOIT run|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_TextIOIT/2008/console]: {code} 00:00:24.372 2019-05-17 00:21:24,724 5d6e9583 MainThread beam_integration_benchmark(1/1) ERRORError during benchmark beam_integration_benchmark 00:00:24.372 Traceback (most recent call last): 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 752, in RunBenchmark 00:00:24.372 DoProvisionPhase(spec, detailed_timer) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 538, in DoProvisionPhase 00:00:24.372 spec.ConstructDpbService() 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 209, in ConstructDpbService 00:00:24.372 self.dpb_service = dpb_service_class(self.config.dpb_service) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py", line 53, in __init__ 00:00:24.372 super(GcpDpbDataflow, self).__init__(dpb_service_spec) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/dpb_service.py", line 127, in __init__ 00:00:24.372 'The flag dpb_service_zone must be provided, for 
provisioning.') 00:00:24.372 InvalidFlagConfigurationError: The flag dpb_service_zone must be provided, for provisioning. {code} Seems certain change on [PerfkitBenchmarker|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker] breaks our [beam_integration_benchmark|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py]. However, we may be able to have a quick fix on our side. > beam_Performance failed with benchmark flag config error > > > Key: BEAM-7347 > URL: https://issues.apache.org/jira/browse/BEAM-7347 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mark Liu >Priority: Major > > [All|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/] > performance
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243774=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243774 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:10 Start Date: 17/May/19 01:10 Worklog Time Spent: 10m Work Description: udim commented on issue #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#issuecomment-493282684 run python postcommit Issue Time Tracking --- Worklog Id: (was: 243774) Time Spent: 7h 20m (was: 7h 10m)
[jira] [Updated] (BEAM-7347) beam_Performance failed with benchmark flag config error
[ https://issues.apache.org/jira/browse/BEAM-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-7347: --- Description: [All|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/] performance benchmarks are affected. Error log from [latest beam_PerformanceTests_TextIOIT run|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_TextIOIT/2008/console]: {code} 00:00:24.372 2019-05-17 00:21:24,724 5d6e9583 MainThread beam_integration_benchmark(1/1) ERRORError during benchmark beam_integration_benchmark 00:00:24.372 Traceback (most recent call last): 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 752, in RunBenchmark 00:00:24.372 DoProvisionPhase(spec, detailed_timer) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 538, in DoProvisionPhase 00:00:24.372 spec.ConstructDpbService() 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 209, in ConstructDpbService 00:00:24.372 self.dpb_service = dpb_service_class(self.config.dpb_service) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py", line 53, in __init__ 00:00:24.372 super(GcpDpbDataflow, self).__init__(dpb_service_spec) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/dpb_service.py", line 127, in __init__ 00:00:24.372 'The flag dpb_service_zone must be provided, for provisioning.') 00:00:24.372 InvalidFlagConfigurationError: The flag dpb_service_zone must be provided, for provisioning. 
{code} It seems that a change in [PerfkitBenchmarker|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker] broke our [beam_integration_benchmark|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py]. However, we may be able to apply a quick fix on our side. was: All performance benchmarks are affected. Error log from [latest beam_PerformanceTests_TextIOIT run|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_TextIOIT/2008/console]: {code} 00:00:24.372 2019-05-17 00:21:24,724 5d6e9583 MainThread beam_integration_benchmark(1/1) ERRORError during benchmark beam_integration_benchmark 00:00:24.372 Traceback (most recent call last): 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 752, in RunBenchmark 00:00:24.372 DoProvisionPhase(spec, detailed_timer) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 538, in DoProvisionPhase 00:00:24.372 spec.ConstructDpbService() 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 209, in ConstructDpbService 00:00:24.372 self.dpb_service = dpb_service_class(self.config.dpb_service) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py", line 53, in __init__ 00:00:24.372 super(GcpDpbDataflow, self).__init__(dpb_service_spec) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/dpb_service.py", line 127, in __init__ 00:00:24.372 'The flag dpb_service_zone must be provided, for provisioning.') 00:00:24.372 InvalidFlagConfigurationError: The flag dpb_service_zone 
must be provided, for provisioning. {code} Seems certain change on [PerfkitBenchmarker|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker] breaks our [beam_integration_benchmark|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py]. However, we may be able to have a quick fix on our side. > beam_Performance failed with benchmark flag config error > > > Key: BEAM-7347 > URL: https://issues.apache.org/jira/browse/BEAM-7347 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mark Liu >Priority: Major > > [All|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/] > performance benchmarks are affected. > Error log from [latest
[jira] [Created] (BEAM-7347) beam_Performance failed with benchmark flag config error
Mark Liu created BEAM-7347: -- Summary: beam_Performance failed with benchmark flag config error Key: BEAM-7347 URL: https://issues.apache.org/jira/browse/BEAM-7347 Project: Beam Issue Type: Bug Components: test-failures Reporter: Mark Liu All performance benchmarks are affected. Error log from [latest beam_PerformanceTests_TextIOIT run|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_TextIOIT/2008/console]: {code} 00:00:24.372 2019-05-17 00:21:24,724 5d6e9583 MainThread beam_integration_benchmark(1/1) ERRORError during benchmark beam_integration_benchmark 00:00:24.372 Traceback (most recent call last): 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 752, in RunBenchmark 00:00:24.372 DoProvisionPhase(spec, detailed_timer) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/pkb.py", line 538, in DoProvisionPhase 00:00:24.372 spec.ConstructDpbService() 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/benchmark_spec.py", line 209, in ConstructDpbService 00:00:24.372 self.dpb_service = dpb_service_class(self.config.dpb_service) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/providers/gcp/gcp_dpb_dataflow.py", line 53, in __init__ 00:00:24.372 super(GcpDpbDataflow, self).__init__(dpb_service_spec) 00:00:24.372 File "/home/jenkins/jenkins-slave/workspace/beam_PerformanceTests_TextIOIT/PerfKitBenchmarker/perfkitbenchmarker/dpb_service.py", line 127, in __init__ 00:00:24.372 'The flag dpb_service_zone must be provided, for provisioning.') 00:00:24.372 InvalidFlagConfigurationError: The flag dpb_service_zone must be provided, for provisioning. 
{code} It seems that a change in [PerfkitBenchmarker|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker] broke our [beam_integration_benchmark|https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/master/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py]. However, we may be able to apply a quick fix on our side.
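The error says PerfKitBenchmarker now refuses to provision without `dpb_service_zone`, so one possible Beam-side quick fix is to always pass that flag wherever the benchmark arguments are assembled. The Python sketch below only illustrates the idea: `pkb_command_line` is a hypothetical helper, and `us-central1-a` is a placeholder zone, not the value Beam's jobs actually use.

```python
def pkb_command_line(flags, required=("dpb_service_zone",)):
    """Render benchmark flags as --name=value arguments for pkb.py.

    Fails before launching if a flag PerfKitBenchmarker insists on
    (here dpb_service_zone, per the InvalidFlagConfigurationError) is absent.
    """
    missing = [name for name in required if not flags.get(name)]
    if missing:
        raise ValueError("missing required PerfKitBenchmarker flags: " + ", ".join(missing))
    # Sorted so the generated command line is stable between runs.
    return ["--%s=%s" % (name, value) for name, value in sorted(flags.items())]


args = pkb_command_line({
    "benchmarks": "beam_integration_benchmark",
    "dpb_service_zone": "us-central1-a",  # placeholder zone (assumption)
})
```

Validating on the Beam side surfaces a missing flag at argument-building time, instead of after the job has already reached PerfKitBenchmarker's provisioning phase.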
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243773 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:06 Start Date: 17/May/19 01:06 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284948765 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -269,6 +272,14 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): for (arg, hint) in zip(argspec.args, typeargs)] packed_typeargs += list(typeargs[len(packed_typeargs):]) + if sys.version_info.major < 3: +return getcallargs_forhints_impl_py2(func, argspec, packed_typeargs, Review comment: Perhaps it's better to avoid hard coding Python versions; rely on the existence of methods/attributes/etc. or behavior instead. Issue Time Tracking --- Worklog Id: (was: 243773) Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243767=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243767 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:04 Start Date: 17/May/19 01:04 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284926095 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -269,6 +272,14 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): for (arg, hint) in zip(argspec.args, typeargs)] packed_typeargs += list(typeargs[len(packed_typeargs):]) + if sys.version_info.major < 3: +return getcallargs_forhints_impl_py2(func, argspec, packed_typeargs, Review comment: Re: checking version within the code, I don't believe we have a guideline. Use your judgement. In this case, I would check if this could hurt performance. I've merged py2 and py3 union matching into one function and added 3.5.2-specific support (what we use on Jenkins). Issue Time Tracking --- Worklog Id: (was: 243767) Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243768 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:04 Start Date: 17/May/19 01:04 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284926464
## File path: sdks/python/apache_beam/typehints/decorators.py ##
@@ -314,10 +325,40 @@ def getcallargs_forhints(func, *typeargs, **typekwargs):
   return callargs


+def getcallargs_forhints_impl_py3(func, packed_typeargs, typekwargs):
+  try:
+    # TODO(udim): Function signature returned by getfullargspec (in
+    # packed_typeargs) might differ from the one below. Migrate to use
+    # inspect.signature in getfullargspec (for Py3).
+    signature = inspect.signature(func)
+  except ValueError as e:
+    logger.warning('Could not get signature for function: %s: %s', func, e)
+    return {}
+  try:
+    bindings = signature.bind(*packed_typeargs, **typekwargs)
+  except TypeError as e:
+    # Might be raised due to too few or too many arguments.
+    raise TypeCheckError(e)
+  bound_args = bindings.arguments
+  missing = []
+  for param in signature.parameters.values():
+    if param.kind == inspect.Parameter.VAR_POSITIONAL:
+      bound_args[param.name] = typehints.Tuple[typehints.Any, ...]
+    elif param.kind == inspect.Parameter.VAR_KEYWORD:
+      bound_args[param.name] = typehints.Dict[typehints.Any, typehints.Any]
+    elif param.name not in bound_args and param.default is not param.empty:
+      # Declare unbound parameters with defaults to be Any.
+      bound_args[param.name] = typehints.Any
+
+  if missing:
Review comment: nope, see above
Issue Time Tracking
Worklog Id: (was: 243768)
Time Spent: 6h 40m (was: 6.5h)
> TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
>
> Key: BEAM-6988
> URL: https://issues.apache.org/jira/browse/BEAM-6988
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: niklas Hansson
> Priority: Major
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> {noformat}
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 53, in test_non_function
>     result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x')
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 510, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", line 514, in apply
>     transform.type_check_inputs(pvalueish)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 753, in type_check_inputs
>     hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1])
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", line 283, in getcallargs_forhints
>     raise TypeCheckError(e)
> apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required positional argument: 'chars'
> {noformat}
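The Python 3 path in the diff above builds on `inspect.signature` and `Signature.bind`. What follows is a minimal, stdlib-only sketch of that idea, not Beam's code. The names `bind_hints` and the local `TypeCheckError` class are hypothetical stand-ins, and hints from the standard `typing` module take the place of Beam's `typehints` module.

Unlike the diff, the sketch fills in `*args`/`**kwargs` placeholders only when those parameters were not bound explicitly. Because `bind()` already raises `TypeError` on missing or surplus arguments, it keeps no separate `missing` list. On CPython 3.7+, `str.strip` exposes a signature with a defaulted `chars` parameter, so binding only the `self` hint leaves `chars` as `Any` rather than failing.

```python
import inspect
import typing


class TypeCheckError(Exception):
    """Hypothetical stand-in for Beam's TypeCheckError (sketch only)."""


def bind_hints(func, *type_args, **type_kwargs):
    """Map each parameter of `func` to a type hint via Signature.bind.

    Sketch under assumptions: `typing` hints replace Beam's typehints module.
    """
    try:
        signature = inspect.signature(func)
    except ValueError:
        # Some callables expose no signature; return no hints instead of guessing.
        return {}
    try:
        # bind() raises TypeError for too few or too many arguments, so
        # missing required parameters are detected here.
        bound = dict(signature.bind(*type_args, **type_kwargs).arguments)
    except TypeError as e:
        raise TypeCheckError(e) from e
    for param in signature.parameters.values():
        if param.name in bound:
            continue
        if param.kind == inspect.Parameter.VAR_POSITIONAL:
            bound[param.name] = typing.Tuple[typing.Any, ...]
        elif param.kind == inspect.Parameter.VAR_KEYWORD:
            bound[param.name] = typing.Dict[typing.Any, typing.Any]
        elif param.default is not param.empty:
            # Unbound parameters that have defaults are treated as Any.
            bound[param.name] = typing.Any
    return bound


def example(a, b=1, *args, **kwargs):
    """Toy function used to show how hints get bound."""


if __name__ == "__main__":
    print(bind_hints(example, int))
    print(bind_hints(str.strip, str))
```

Calling `bind_hints(example)` with no hints raises `TypeCheckError`, because `a` has no default.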
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243766 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:04 Start Date: 17/May/19 01:04 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284926389
## File path: sdks/python/apache_beam/typehints/decorators.py ##
@@ -314,10 +325,40 @@ def getcallargs_forhints(func, *typeargs, **typekwargs):
   return callargs


+def getcallargs_forhints_impl_py3(func, packed_typeargs, typekwargs):
+  try:
+    # TODO(udim): Function signature returned by getfullargspec (in
+    # packed_typeargs) might differ from the one below. Migrate to use
+    # inspect.signature in getfullargspec (for Py3).
+    signature = inspect.signature(func)
+  except ValueError as e:
+    logger.warning('Could not get signature for function: %s: %s', func, e)
+    return {}
+  try:
+    bindings = signature.bind(*packed_typeargs, **typekwargs)
+  except TypeError as e:
+    # Might be raised due to too few or too many arguments.
+    raise TypeCheckError(e)
+  bound_args = bindings.arguments
+  missing = []
Review comment: Good catch! That was a leftover. `signature.bind` should check for missing arguments.
Issue Time Tracking
Worklog Id: (was: 243766)
Time Spent: 6h 20m (was: 6h 10m)
> TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
>
> Key: BEAM-6988
> URL: https://issues.apache.org/jira/browse/BEAM-6988
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: niklas Hansson
> Priority: Major
> Time Spent: 6h 20m
> Remaining Estimate: 0h
>
> {noformat}
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 53, in test_non_function
>     result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x')
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 510, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", line 514, in apply
>     transform.type_check_inputs(pvalueish)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 753, in type_check_inputs
>     hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1])
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", line 283, in getcallargs_forhints
>     raise TypeCheckError(e)
> apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required positional argument: 'chars'
> {noformat}
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243770 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:04 Start Date: 17/May/19 01:04 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284945686
## File path: sdks/python/apache_beam/typehints/decorators.py ##
@@ -269,6 +272,14 @@ def getcallargs_forhints(func, *typeargs, **typekwargs):
                      for (arg, hint) in zip(argspec.args, typeargs)]
   packed_typeargs += list(typeargs[len(packed_typeargs):])
+  if sys.version_info.major < 3:
+    return getcallargs_forhints_impl_py2(func, argspec, packed_typeargs,
+                                         typekwargs)
+  else:
+    return getcallargs_forhints_impl_py3(func, packed_typeargs, typekwargs)
+
+
+def getcallargs_forhints_impl_py2(func, argspec, packed_typeargs, typekwargs):
Review comment: Removed it from almost everywhere.
Issue Time Tracking
Worklog Id: (was: 243770)
Time Spent: 7h (was: 6h 50m)
> TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
>
> Key: BEAM-6988
> URL: https://issues.apache.org/jira/browse/BEAM-6988
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: niklas Hansson
> Priority: Major
> Time Spent: 7h
> Remaining Estimate: 0h
>
> {noformat}
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 53, in test_non_function
>     result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x')
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 510, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", line 514, in apply
>     transform.type_check_inputs(pvalueish)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 753, in type_check_inputs
>     hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1])
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", line 283, in getcallargs_forhints
>     raise TypeCheckError(e)
> apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required positional argument: 'chars'
> {noformat}
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243769 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 17/May/19 01:04 Start Date: 17/May/19 01:04 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284946031
## File path: sdks/python/apache_beam/typehints/native_type_compatibility.py ##
@@ -144,11 +154,17 @@ def convert_to_beam_type(typ):
           match=_match_issubclass(typing.Tuple),
           arity=-1,
           beam_type=typehints.Tuple),
-      _TypeMapEntry(
-          match=_match_same_type(typing.Union),
-          arity=-1,
-          beam_type=typehints.Union)
   ]
+  if sys.version_info.major >= 3:
+    type_map.append(
+        _TypeMapEntry(
+            match=_match_is_union_py3, arity=-1, beam_type=typehints.Union))
Review comment: I made a py2and3 `_match_is_union` function. There are some differences between it and #8453.
Issue Time Tracking
Worklog Id: (was: 243769)
Time Spent: 6h 50m (was: 6h 40m)
> TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
>
> Key: BEAM-6988
> URL: https://issues.apache.org/jira/browse/BEAM-6988
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: niklas Hansson
> Priority: Major
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> {noformat}
> Traceback (most recent call last):
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 53, in test_non_function
>     result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x')
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 510, in __ror__
>     result = p.apply(self, pvalueish, label)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", line 514, in apply
>     transform.type_check_inputs(pvalueish)
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", line 753, in type_check_inputs
>     hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1])
>   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", line 283, in getcallargs_forhints
>     raise TypeCheckError(e)
> apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required positional argument: 'chars'
> {noformat}
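The review thread above concerns recognising `typing.Union` hints across Python versions. Below is a minimal sketch of one way to do that on Python 3.8+. `is_union` is a hypothetical helper and is not the PR's `_match_is_union`; unlike that function, it does not cover Python 2.

The sketch relies on `typing.get_origin`, which reports `typing.Union` for `Union[...]` and `Optional[...]` hints. It also accepts `types.UnionType`, which exists only on Python 3.10+ and describes PEP 604 hints such as `int | str`.

```python
import types
import typing

# Origins that identify a union hint. types.UnionType exists only on
# Python 3.10+ (PEP 604 `int | str`), so it is added conditionally.
_UNION_ORIGINS = (typing.Union,) + (
    (types.UnionType,) if hasattr(types, "UnionType") else ())


def is_union(typ):
    """Return True if `typ` is a union type hint (sketch; Python 3.8+)."""
    # The bare `typing.Union` has no origin, so check it directly first.
    return typ in _UNION_ORIGINS or typing.get_origin(typ) in _UNION_ORIGINS


if __name__ == "__main__":
    for hint in (typing.Union[int, str], typing.Optional[int],
                 typing.List[int], int):
        print(hint, is_union(hint))
```

Checking the origin rather than the type of the hint keeps the helper independent of the private classes `typing` uses internally, which have changed between Python releases.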
[jira] [Commented] (BEAM-7135) Spark executable stage: Job bundle factory is not being closed
[ https://issues.apache.org/jira/browse/BEAM-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841836#comment-16841836 ] Kyle Weaver commented on BEAM-7135:
It also doesn't help that I'm logging every exception that's going to be printed out later anyway. [https://github.com/apache/beam/blob/8821ed8c3f6b5f4d16abf98d17910cc4a9ba8720/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L144]
> Spark executable stage: Job bundle factory is not being closed
>
> Key: BEAM-7135
> URL: https://issues.apache.org/jira/browse/BEAM-7135
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
>
> JobBundleFactory is being created, but never closed:
> [https://github.com/apache/beam/blob/a91516cd10d382d1c8a42f3e3b373fbad46369f6/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L111]
[jira] [Commented] (BEAM-7135) Spark executable stage: Job bundle factory is not being closed
[ https://issues.apache.org/jira/browse/BEAM-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841828#comment-16841828 ] Kyle Weaver commented on BEAM-7135:
This issue creates a huge number of errors that often end up overwriting whatever logs preceded them.
2019-05-16 17:32:19,506 [grpc-default-executor-30] ERROR org.apache.beam.vendor.grpc.v1p13p1.io.grpc.internal.ManagedChannelOrphanWrapper - *~*~*~ Channel ManagedChannelImpl{logId=11064, target=directaddress:///org.apache.beam.vendor.grpc.v1p13p1.io.grpc.inprocess.InProcessSocketAddress@3d10bf0b} was not shutdown properly!!! ~*~*~*
> Spark executable stage: Job bundle factory is not being closed
>
> Key: BEAM-7135
> URL: https://issues.apache.org/jira/browse/BEAM-7135
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
>
> JobBundleFactory is being created, but never closed:
> [https://github.com/apache/beam/blob/a91516cd10d382d1c8a42f3e3b373fbad46369f6/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L111]
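BEAM-7135 describes a resource with a `close()` method that is created per stage and never released, which leaves orphaned gRPC channels behind. The actual runner code is Java, where try-with-resources plays this role. The pattern is shown here in Python to match the other examples.

This is a hedged illustration only. `FakeBundleFactory` and `run_stage` are hypothetical names and do not model Beam's `JobBundleFactory` API. The sketch uses `contextlib.closing` so that `close()` runs even when stage processing raises.

```python
import contextlib


class FakeBundleFactory:
    """Hypothetical stand-in for a factory that owns channels or workers."""

    open_count = 0  # number of instances that have not been closed yet

    def __init__(self):
        FakeBundleFactory.open_count += 1
        self.closed = False

    def close(self):
        # Idempotent: closing twice does not double-decrement the counter.
        if not self.closed:
            self.closed = True
            FakeBundleFactory.open_count -= 1


def run_stage(process, factory_cls=FakeBundleFactory):
    """Create the factory for one stage and always close it afterwards."""
    with contextlib.closing(factory_cls()) as factory:
        return process(factory)


if __name__ == "__main__":
    print(run_stage(lambda factory: "processed"))
    print("still open:", FakeBundleFactory.open_count)
```

If a factory has to outlive a single stage for performance reasons, the same guarantee can come from closing it in a teardown hook instead of per call.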
[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks
[ https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=243762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243762 ] ASF GitHub Bot logged work on BEAM-6908: Author: ASF GitHub Bot Created on: 17/May/19 00:31 Start Date: 17/May/19 00:31 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #8518: [BEAM-6908] Refactor Python performance test groovy file for easy configuration URL: https://github.com/apache/beam/pull/8518
Issue Time Tracking
Worklog Id: (was: 243762)
Time Spent: 15h (was: 14h 50m)
> Add Python3 performance benchmarks
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Time Spent: 15h
> Remaining Estimate: 0h
>
> Similar to [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/], we want to have a Python3 benchmark running on Jenkins to detect performance regression during code adoption.
[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks
[ https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=243760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243760 ] ASF GitHub Bot logged work on BEAM-6908: Author: ASF GitHub Bot Created on: 17/May/19 00:30 Start Date: 17/May/19 00:30 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #8518: [BEAM-6908] Refactor Python performance test groovy file for easy configuration URL: https://github.com/apache/beam/pull/8518#issuecomment-493276171
@tvalentyn I can add links to the BigQuery table and dashboard in the Beam docs. I also fixed the comment for `beam_prebuilt`. I synced with @manisha252 offline and got approval for this change.
Issue Time Tracking
Worklog Id: (was: 243760)
Time Spent: 14h 50m (was: 14h 40m)
> Add Python3 performance benchmarks
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Time Spent: 14h 50m
> Remaining Estimate: 0h
>
> Similar to [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/], we want to have a Python3 benchmark running on Jenkins to detect performance regression during code adoption.
[jira] [Commented] (BEAM-7271) Adding StringUtf8Coder to ModelCoder in JavaSDK [REOPENED]
[ https://issues.apache.org/jira/browse/BEAM-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841827#comment-16841827 ] Heejong Lee commented on BEAM-7271:
Yes. I think we still need to backport [https://github.com/apache/beam/pull/8575]. The commit related to `adding StringUtf8Coder to ModelCoder` on the 2.13.0 branch is incomplete: the branch has only BEAM-7008, not BEAM-7260 or the additional fix in BEAM-7271.
> Adding StringUtf8Coder to ModelCoder in JavaSDK [REOPENED]
>
> Key: BEAM-7271
> URL: https://issues.apache.org/jira/browse/BEAM-7271
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Fix For: 2.13.0
>
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> Reopened because the previous commit was reverted.
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243747 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 17/May/19 00:10 Start Date: 17/May/19 00:10 Worklog Time Spent: 10m Work Description: yifanzou commented on pull request #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596
Issue Time Tracking
Worklog Id: (was: 243747)
Time Spent: 1h 10m (was: 1h)
> Enable 1Gb input for Python wordcount benchmark
>
> Key: BEAM-7339
> URL: https://issues.apache.org/jira/browse/BEAM-7339
> Project: Beam
> Issue Type: Task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Requirements:
> - Use input from: gs://apache-beam-samples/input_small_files/*
> - Use TestDataflowRunner
> - Limit the number of workers
> - Disable autoscaling
> - Enable both py2 and py3 benchmarks
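The BEAM-7339 requirements map onto standard pipeline options. Below is a hedged sketch that assembles them as argv-style flags. The helper name `wordcount_benchmark_args` and the worker count used in the demo are hypothetical.

`--runner`, `--num_workers` and `--autoscaling_algorithm` (where `NONE` disables autoscaling) are existing Beam Python/Dataflow options, while `--input` is the wordcount example's own argument. The resulting list could then be handed to `PipelineOptions`, but that step is not shown here.

```python
def wordcount_benchmark_args(input_pattern, num_workers, extra=None):
    """Build argv-style options for a fixed-size Dataflow benchmark run.

    Hypothetical helper; the flag names are standard pipeline options.
    """
    args = [
        "--runner=TestDataflowRunner",
        "--input=" + input_pattern,
        # A fixed worker count keeps runs comparable across benchmarks.
        "--num_workers=%d" % num_workers,
        # NONE disables autoscaling, so the worker count stays as set above.
        "--autoscaling_algorithm=NONE",
    ]
    # Caller-supplied flags (e.g. project, temp_location) go last.
    return args + list(extra or [])


if __name__ == "__main__":
    for flag in wordcount_benchmark_args(
            "gs://apache-beam-samples/input_small_files/*", num_workers=10):
        print(flag)
```

Running the py2 and py3 variants with the same list helps ensure that any performance difference comes from the interpreter rather than the configuration.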
[jira] [Commented] (BEAM-7326) Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.
[ https://issues.apache.org/jira/browse/BEAM-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841822#comment-16841822 ] Robert Burke commented on BEAM-7326:
The BigQuery Go package (not Beam's IO) doesn't mention base64 at all. I believe it usually handles the encoding itself and treats bytes as opaque blobs. In particular, the encoding happens in the JSON marshalling of the values, which automatically base64-encodes byte slices. See [https://godoc.org/cloud.google.com/go/bigquery] and [https://godoc.org/encoding/json#Marshal]. In other words, in Go it's a BigQuery implementation detail that is hidden from users unless they configure things to change it.
> Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.
>
> Key: BEAM-7326
> URL: https://issues.apache.org/jira/browse/BEAM-7326
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, io-python-gcp
> Reporter: Valentyn Tymofieiev
> Priority: Major
>
> BYTES is one of the datatypes supported by Google Cloud BigQuery and by the Apache Beam BigQuery IO connector.
> The current implementation of the BigQuery connector in the Java and Python SDKs expects users to base64-encode bytes before passing them to BigQuery IO; see the discussion on dev: [1]
> This needs to be reflected in the public documentation; see [2-4].
> cc: [~juta] [~chamikara] [~pabloem]
> cc: [~lostluck] [~kedin] FYI, and to advise whether similar action is needed for the Go SDK and/or Beam SQL.
> [1] https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html
> [4] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html
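BEAM-7326 says that, at the time of the issue, the Java and Python BigQuery IO connectors expected BYTES values as base64-encoded strings. In Go, `encoding/json` performs that conversion automatically for `[]byte`. Python's `json` module does not, so the encoding has to be explicit.

Below is a minimal sketch under that assumption. `encode_bytes_field` and `decode_bytes_field` are hypothetical helpers, and later Beam releases may behave differently.

```python
import base64
import json


def encode_bytes_field(value):
    """Raw bytes -> base64 text, the form the issue says BigQuery IO expects."""
    return base64.b64encode(value).decode("ascii")


def decode_bytes_field(value):
    """Base64 text read back from BigQuery IO -> raw bytes."""
    return base64.b64decode(value)


if __name__ == "__main__":
    row = {"name": "blob", "payload": encode_bytes_field(b"\x00\xffbeam")}
    # Unlike Go's json.Marshal for []byte, json.dumps rejects raw bytes,
    # so the payload must already be a base64 str at this point.
    wire = json.dumps(row)
    print(wire)
    print(decode_bytes_field(json.loads(wire)["payload"]))
```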
[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks
[ https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=243746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243746 ] ASF GitHub Bot logged work on BEAM-6908: Author: ASF GitHub Bot Created on: 17/May/19 00:05 Start Date: 17/May/19 00:05 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #8518: [BEAM-6908] Refactor Python performance test groovy file for easy configuration URL: https://github.com/apache/beam/pull/8518#issuecomment-493271799
Run Python35 WordCountIT Performance Test
Issue Time Tracking
Worklog Id: (was: 243746)
Time Spent: 14h 40m (was: 14.5h)
> Add Python3 performance benchmarks
>
> Key: BEAM-6908
> URL: https://issues.apache.org/jira/browse/BEAM-6908
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Mark Liu
> Assignee: Mark Liu
> Priority: Major
> Time Spent: 14h 40m
> Remaining Estimate: 0h
>
> Similar to [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/], we want to have a Python3 benchmark running on Jenkins to detect performance regression during code adoption.
[jira] [Assigned] (BEAM-7342) Extend SyntheticPipeline map steps to be able to be splittable (Beam Python SDK)
[ https://issues.apache.org/jira/browse/BEAM-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lara Schmidt reassigned BEAM-7342:
Assignee: Lara Schmidt
> Extend SyntheticPipeline map steps to be able to be splittable (Beam Python SDK)
>
> Key: BEAM-7342
> URL: https://issues.apache.org/jira/browse/BEAM-7342
> Project: Beam
> Issue Type: New Feature
> Components: testing
> Environment: Beam Python
> Reporter: Lara Schmidt
> Assignee: Lara Schmidt
> Priority: Minor
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> Add the ability for map steps to be configured to be splittable.
> Possible configuration options:
> - uneven bundle sizes
> - possible incorrect sizing returned
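The two configuration options listed in BEAM-7342 can be illustrated with a small stdlib sketch. It splits a record range into uneven, contiguous bundles and reports bundle sizes with an optional error factor. This is not Beam's synthetic pipeline code: `split_uneven`, `reported_size`, the seeded random cut points and the `error_factor` parameter are all hypothetical choices made for illustration.

```python
import random


def split_uneven(num_records, num_bundles, seed=0):
    """Split [0, num_records) into `num_bundles` contiguous, non-empty
    (start, end) ranges whose sizes are random and generally uneven."""
    if not 1 <= num_bundles <= num_records:
        raise ValueError("need 1 <= num_bundles <= num_records")
    rng = random.Random(seed)  # seeded so test runs are reproducible
    # Pick num_bundles - 1 distinct interior cut points.
    cuts = sorted(rng.sample(range(1, num_records), num_bundles - 1))
    bounds = [0] + cuts + [num_records]
    return list(zip(bounds[:-1], bounds[1:]))


def reported_size(bundle, error_factor=1.0):
    """Size estimate for a bundle; error_factor != 1.0 simulates a source
    that reports an incorrect size."""
    start, end = bundle
    return int((end - start) * error_factor)


if __name__ == "__main__":
    for bundle in split_uneven(100, 5, seed=42):
        print(bundle, reported_size(bundle), reported_size(bundle, 2.0))
```

Using distinct cut points guarantees that every bundle is non-empty, while the sizes still vary from run to run when the seed changes.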
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243727=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243727 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 23:35 Start Date: 16/May/19 23:35 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493266079
run python precommit
Issue Time Tracking
Worklog Id: (was: 243727)
Time Spent: 11h (was: 10h 50m)
> Add an integration test suite for cross-language transforms for Flink runner
>
> Key: BEAM-6683
> URL: https://issues.apache.org/jira/browse/BEAM-6683
> Project: Beam
> Issue Type: Test
> Components: testing
> Reporter: Chamikara Jayalath
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 11h
> Remaining Estimate: 0h
>
> We should add an integration test suite that covers the following:
> (1) Currently available Java IO connectors that do not use UDFs work for the Python SDK on the Flink runner.
> (2) Currently available Python IO connectors that do not use UDFs work for the Java SDK on the Flink runner.
> (3) Currently available Java/Python pipelines work in a scalable manner for cross-language pipelines (for example, try 10GB and 100GB input for textio/avroio for Java and Python).
[jira] [Work started] (BEAM-7190) enable file system based token authentication for portable runner
[ https://issues.apache.org/jira/browse/BEAM-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-7190 started by Hai Lu.
> enable file system based token authentication for portable runner
>
> Key: BEAM-7190
> URL: https://issues.apache.org/jira/browse/BEAM-7190
> Project: Beam
> Issue Type: Task
> Components: runner-samza
> Reporter: Hai Lu
> Assignee: Hai Lu
> Priority: Major
>
> For Samza and potentially other portable runners, the communication between the SDK worker and the runner needs to be secured. Currently, SSL/TLS support in portability is only half done.
> However, after investigation we found that it is sufficient to 1) use the loopback address and 2) enforce authentication; that way the communication is both authenticated and secured.
> This ticket tracks the implementation of the solution above. More details can be found in the following PR.
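BEAM-7190 proposes two ingredients: loopback-only communication and file-system based token authentication. Below is a stdlib-only sketch of both, under stated assumptions. It is not the Samza runner's implementation.

The names `write_token_file`, `is_authorized` and `is_loopback` are hypothetical, as is the file name `worker_token`. The sketch assumes the token is shared through a file that only the owning user can read, and it compares tokens in constant time with `hmac.compare_digest`.

```python
import hmac
import ipaddress
import os
import secrets
import tempfile


def write_token_file(directory):
    """Generate a random token and store it in an owner-only file."""
    token = secrets.token_hex(32)
    path = os.path.join(directory, "worker_token")  # hypothetical file name
    # O_EXCL refuses to reuse an existing file; 0o600 = owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return path


def is_authorized(token_path, presented_token):
    """Check a presented token against the file, in constant time."""
    with open(token_path) as f:
        expected = f.read().strip()
    return hmac.compare_digest(expected.encode(), presented_token.encode())


def is_loopback(host):
    """True if `host` is a loopback IP address such as 127.0.0.1 or ::1."""
    return ipaddress.ip_address(host).is_loopback


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = write_token_file(d)
        with open(path) as f:
            token = f.read()
        print(is_authorized(path, token), is_authorized(path, "wrong"))
    print(is_loopback("127.0.0.1"), is_loopback("10.0.0.1"))
```

A constant-time comparison avoids leaking how many leading characters of a guessed token were correct. Restricting the file mode limits which local users can read the token.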
[jira] [Updated] (BEAM-7346) Add tests for BigQuery connector in Go SDK that will exercise all types supported by BQ.
[ https://issues.apache.org/jira/browse/BEAM-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7346:
Priority: Minor (was: Major)
> Add tests for BigQuery connector in Go SDK that will exercise all types supported by BQ.
>
> Key: BEAM-7346
> URL: https://issues.apache.org/jira/browse/BEAM-7346
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Reporter: Valentyn Tymofieiev
> Priority: Minor
>
> Sample tests in the Python and Java SDKs:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
> In particular, we should make sure the BYTES datatype is treated the same way in the Go SDK as in the Java and Python SDKs. Currently, the Java and Python SDKs assume that users pass base64-encoded bytes, but we may decide to revise this behavior; see [1,2].
> [1] https://lists.apache.org/thread.html/0c2178cf8e5d9e77c4f233f05a0b87b6011a1daa1a5ae47b41463af5@%3Cdev.beam.apache.org%3E
> [2] https://issues.apache.org/jira/browse/BEAM-7344
[jira] [Created] (BEAM-7346) Add tests for BigQuery connector in Go SDK that will exercise all types supported by BQ.
Valentyn Tymofieiev created BEAM-7346:
Summary: Add tests for BigQuery connector in Go SDK that will exercise all types supported by BQ.
Key: BEAM-7346
URL: https://issues.apache.org/jira/browse/BEAM-7346
Project: Beam
Issue Type: Bug
Components: sdk-go
Reporter: Valentyn Tymofieiev
Sample tests in the Python and Java SDKs:
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
In particular, we should make sure the BYTES datatype is treated the same way in the Go SDK as in the Java and Python SDKs. Currently, the Java and Python SDKs assume that users pass base64-encoded bytes, but we may decide to revise this behavior; see [1,2].
[1] https://lists.apache.org/thread.html/0c2178cf8e5d9e77c4f233f05a0b87b6011a1daa1a5ae47b41463af5@%3Cdev.beam.apache.org%3E
[2] https://issues.apache.org/jira/browse/BEAM-7344
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243726 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 23:34 Start Date: 16/May/19 23:34 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493266079 run python precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243726) Time Spent: 10h 50m (was: 10h 40m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following: > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6959) Run Go SDK Post Commit tests against the Flink Runner.
[ https://issues.apache.org/jira/browse/BEAM-6959?focusedWorklogId=243725=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243725 ] ASF GitHub Bot logged work on BEAM-6959: Author: ASF GitHub Bot Created on: 16/May/19 23:28 Start Date: 16/May/19 23:28 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #8531: [BEAM-6959] Add Flink tests for Go SDK URL: https://github.com/apache/beam/pull/8531 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243725) Time Spent: 1h 50m (was: 1h 40m) > Run Go SDK Post Commit tests against the Flink Runner. > --- > > Key: BEAM-6959 > URL: https://issues.apache.org/jira/browse/BEAM-6959 > Project: Beam > Issue Type: Sub-task > Components: runner-flink, sdk-go, testing >Reporter: Robert Burke >Assignee: Kyle Weaver >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > See parent task BEAM-6958 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243722=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243722 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 23:20 Start Date: 16/May/19 23:20 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284925302 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -105,6 +107,7 @@ def foo((a, b)): 'TypeCheckError', ] +logger = logging.getLogger(__name__) Review comment: It uses the module name instead of "root". See https://issues.apache.org/jira/browse/BEAM-3523 for details. Rereading that JIRA however, I realize that the worker expects us to log to root so I'll revert this for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243722) Time Spent: 6h 10m (was: 6h) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in __ror__ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was sent by
Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7116) Remove KV from Schema transforms
[ https://issues.apache.org/jira/browse/BEAM-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841798#comment-16841798 ] Brian Hulette commented on BEAM-7116: - Ah shoot. My intention was just to make it so that we could look up a Schema for KV [here|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Convert.java#L124]; I didn't realize that would also make KV use SchemaCoder over KVCoder. > Remove KV from Schema transforms > > > Key: BEAM-7116 > URL: https://issues.apache.org/jira/browse/BEAM-7116 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Priority: Major > > Instead of returning KV objects, we should return a Schema with two fields. > The Convert transform should be able to convert these to KV objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7345) Add support for generics in schema inference
Brian Hulette created BEAM-7345: --- Summary: Add support for generics in schema inference Key: BEAM-7345 URL: https://issues.apache.org/jira/browse/BEAM-7345 Project: Beam Issue Type: Sub-task Components: sdk-java-core Reporter: Brian Hulette Currently, schema inference doesn't work for getters that return a parameterized type. Fixing this would most likely involve plumbing TypeDescriptor, rather than Class, through FieldValueTypeSupplier, FieldValueTypeInformation, StaticSchemaInference, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7345) Add support for generics in schema inference
[ https://issues.apache.org/jira/browse/BEAM-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841797#comment-16841797 ] Brian Hulette commented on BEAM-7345: - Some more discussion here: https://issues.apache.org/jira/browse/BEAM-7116?focusedCommentId=16841702=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16841702 > Add support for generics in schema inference > > > Key: BEAM-7345 > URL: https://issues.apache.org/jira/browse/BEAM-7345 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Brian Hulette >Priority: Major > > Currently, schema inference doesn't work for getters that return a > parameterized type. Fixing this would most likely involve plumbing > TypeDescriptor, rather than Class, through FieldValueTypeSupplier, > FieldValueTypeInformation, StaticSchemaInference, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243715 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 23:00 Start Date: 16/May/19 23:00 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493259550 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243715) Time Spent: 10h 40m (was: 10.5h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following: > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243714 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 23:00 Start Date: 16/May/19 23:00 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493259550 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243714) Time Spent: 10.5h (was: 10h 20m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > We should add an integration test suite that covers the following: > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7344) Consider removing the requirement that users need to base64-encode their BYTES before passing them to BQ IO connector.
Valentyn Tymofieiev created BEAM-7344: - Summary: Consider removing the requirement that users need to base64-encode their BYTES before passing them to BQ IO connector. Key: BEAM-7344 URL: https://issues.apache.org/jira/browse/BEAM-7344 Project: Beam Issue Type: Bug Components: io-java-gcp, io-python-gcp Reporter: Valentyn Tymofieiev Currently, when the BigQuery IO connector reads or stores the BYTES datatype in BigQuery, there is an expectation that users base64-encode the bytes before passing them to the connector, see [1,2]. Base64 encoding may be an extra overhead for users interacting with Beam, one that is possible to avoid. Filing this issue to reconsider this behavior. cc: [~chamikara]. [1] https://lists.apache.org/thread.html/0c2178cf8e5d9e77c4f233f05a0b87b6011a1daa1a5ae47b41463af5@%3Cdev.beam.apache.org%3E [2] https://issues.apache.org/jira/browse/BEAM-7326 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
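To make the current contract concrete, here is a minimal sketch of the user-side work described above, assuming rows are plain Python dicts and that the caller knows which columns are BYTES. `encode_bytes_fields` and `decode_bytes_fields` are hypothetical helpers, not part of BigQuery IO; the decode half corresponds to the read side, where BQ IO hands back base64-encoded values.

```python
import base64
from typing import Any, Dict, Iterable


def encode_bytes_fields(row: Dict[str, Any], bytes_fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `row` with each BYTES column base64-encoded into an ASCII str."""
    out = dict(row)  # never mutate the caller's row
    for name in bytes_fields:
        value = out.get(name)
        if value is not None:  # leave NULL values untouched
            out[name] = base64.b64encode(value).decode("ascii")
    return out


def decode_bytes_fields(row: Dict[str, Any], bytes_fields: Iterable[str]) -> Dict[str, Any]:
    """Inverse of encode_bytes_fields: turn base64 strings back into raw bytes."""
    out = dict(row)
    for name in bytes_fields:
        value = out.get(name)
        if value is not None:
            out[name] = base64.b64decode(value)
    return out


if __name__ == "__main__":
    row = {"id": 1, "payload": b"\x00\xffhello"}
    to_write = encode_bytes_fields(row, ["payload"])
    print(to_write)
    print(decode_bytes_fields(to_write, ["payload"]) == row)
```

This per-row step, plus knowing which columns need it, is the overhead the issue proposes removing.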
[jira] [Updated] (BEAM-7344) Consider removing the requirement that users need to base64-encode their BYTES before passing them to BQ IO connector.
[ https://issues.apache.org/jira/browse/BEAM-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7344: -- Priority: Minor (was: Major) > Consider removing the requirement that users need to base64-encode their > BYTES before passing them to BQ IO connector. > -- > > Key: BEAM-7344 > URL: https://issues.apache.org/jira/browse/BEAM-7344 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp >Reporter: Valentyn Tymofieiev >Priority: Minor > > Currently, when the BigQuery IO connector reads or stores the BYTES datatype in > BigQuery, there is an expectation that users base64-encode the bytes before > passing them to the connector, see [1,2]. > Base64 encoding may be an extra overhead for users interacting with > Beam, one that is possible to avoid. Filing this issue to > reconsider this behavior. > cc: [~chamikara]. > [1] > https://lists.apache.org/thread.html/0c2178cf8e5d9e77c4f233f05a0b87b6011a1daa1a5ae47b41463af5@%3Cdev.beam.apache.org%3E > [2] https://issues.apache.org/jira/browse/BEAM-7326 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243709 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 22:45 Start Date: 16/May/19 22:45 Worklog Time Spent: 10m Work Description: udim commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284925302 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -105,6 +107,7 @@ def foo((a, b)): 'TypeCheckError', ] +logger = logging.getLogger(__name__) Review comment: It uses the module name instead of "root". See https://issues.apache.org/jira/browse/BEAM-3523 for details. Rereading that however, I realize that the worker expects us to log to root so I'll revert this for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243709) Time Spent: 6h (was: 5h 50m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in __ror__ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was sent by
Atlassian JIRA (v7.6.3#76005)
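The traceback shows hint binding rejecting `beam.Map(str.strip, 'x')` even though `chars` is optional for `str.strip`. A minimal sketch of how a Python 3 version could bind hints instead, assuming `inspect.signature(...).bind(...)`, which models positional-only parameters and their defaults, and assuming unhinted parameters should default to `typing.Any`. `getcallargs_sketch` is hypothetical and is not the code in PR #8590; it also assumes a CPython version where `str.strip` exposes a signature.

```python
import inspect
import typing
from typing import Any, Callable, Dict


def getcallargs_sketch(fn: Callable[..., Any], *arg_hints: Any, **kwarg_hints: Any) -> Dict[str, Any]:
    """Map each parameter of `fn` to the type hint supplied for it.

    Hints are bound exactly as call arguments would be, so positional-only
    parameters with defaults (like str.strip's `chars`) may be omitted.
    Parameters that receive no hint are mapped to typing.Any.
    Raises TypeError if the hints cannot be bound (e.g. too many of them).
    """
    sig = inspect.signature(fn)
    bound = sig.bind(*arg_hints, **kwarg_hints)
    hints = dict(bound.arguments)
    for name in sig.parameters:
        hints.setdefault(name, typing.Any)
    return hints


if __name__ == "__main__":
    # Mirrors beam.Map(str.strip, 'x'): element hint str, side argument hint str.
    print(getcallargs_sketch(str.strip, str, str))
    # Omitting the optional argument no longer fails.
    print(getcallargs_sketch(str.strip, str))
```

Using `Signature.bind` keeps the binding rules identical to a real call, so the sketch fails only where the actual call would fail.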
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243706=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243706 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 22:42 Start Date: 16/May/19 22:42 Worklog Time Spent: 10m Work Description: udim commented on issue #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#issuecomment-493256130 @NikeNano Valentyn was referring to the precommit test's failure: https://builds.apache.org/job/beam_PreCommit_Python_Commit/6421/ As for the postcommit test failure, I see this line in the console log (viewing the full log): ``` test_big_query_legacy_sql (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) ... FAIL ``` Right now my guess is that the postcommit failure is a flake, and I'll try to run it again once I address the review comments. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243706) Time Spent: 5h 50m (was: 5h 40m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in __ror__ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was
sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7326) Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.
[ https://issues.apache.org/jira/browse/BEAM-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-7326: -- Description: BYTES is one of the Datatypes supported by Google Cloud BigQuery, and Apache Beam BigQuery IO connector. Current implementation of BigQuery connector in Java and Python SDKs expects that users base64-encode bytes before passing them to BigQuery IO, see discussion on dev: [1] This needs to be reflected in public documentation, see [2-4] cc: [~juta] [~chamikara] [~pabloem] cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be done for Go SDK and/or Beam SQL. [1] https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/ [3] https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html [4] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html was: BYTES is one of the Datatypes supported by Google Cloud BigQuery, and Apache Beam BigQuery IO connector. Current implementation of BigQuery connector in Java and Python SDKs expects that users base64-encode bytes before passing them to BigQuery IO, see discussion on dev: [1] This needs to be reflected in public documentation, see [2-4] cc: [~juta] [~chamikara] [~pabloem] cc: [~rebo] [~kedin] FYI and to advise whether similar action needs to be done for Go SDK and/or Beam SQL. 
[1] https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/ [3] https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html [4] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html > Document that Beam BigQuery IO expects users to pass base64-encoded bytes, > and BQ IO serves base64-encoded bytes to the user. > - > > Key: BEAM-7326 > URL: https://issues.apache.org/jira/browse/BEAM-7326 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, io-python-gcp >Reporter: Valentyn Tymofieiev >Priority: Major > > BYTES is one of the Datatypes supported by Google Cloud BigQuery, and Apache > Beam BigQuery IO connector. > Current implementation of BigQuery connector in Java and Python SDKs expects > that users base64-encode bytes before passing them to BigQuery IO, see > discussion on dev: [1] > This needs to be reflected in public documentation, see [2-4] > cc: [~juta] [~chamikara] [~pabloem] > cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be > done for Go SDK and/or Beam SQL. > [1] > https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E > [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/ > [3] > https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html > [4] > https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7271) Adding StringUtf8Coder to ModelCoder in JavaSDK [REOPENED]
[ https://issues.apache.org/jira/browse/BEAM-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841780#comment-16841780 ] Ankur Goenka commented on BEAM-7271: Flink PortableValidatesRunner test cases are passing on 2.13.0 [https://github.com/apache/beam/pull/8579]. Do we still want to backport this? > Adding StringUtf8Coder to ModelCoder in JavaSDK [REOPENED] > -- > > Key: BEAM-7271 > URL: https://issues.apache.org/jira/browse/BEAM-7271 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Reopened because the previous commit was reverted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6908) Add Python3 performance benchmarks
[ https://issues.apache.org/jira/browse/BEAM-6908?focusedWorklogId=243704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243704 ] ASF GitHub Bot logged work on BEAM-6908: Author: ASF GitHub Bot Created on: 16/May/19 22:36 Start Date: 16/May/19 22:36 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #8518: [BEAM-6908] Refactor Python performance test groovy file for easy configuration URL: https://github.com/apache/beam/pull/8518#issuecomment-493255048 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243704) Time Spent: 14.5h (was: 14h 20m) > Add Python3 performance benchmarks > -- > > Key: BEAM-6908 > URL: https://issues.apache.org/jira/browse/BEAM-6908 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 14.5h > Remaining Estimate: 0h > > Similar to > [beam_PerformanceTests_Python|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_Python/], > we want to have a Python3 benchmark running on Jenkins to detect performance > regression during code adoption. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7116) Remove KV from Schema transforms
[ https://issues.apache.org/jira/browse/BEAM-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841777#comment-16841777 ] Reuven Lax commented on BEAM-7116: -- The problem is that Beam special cases KvCoder all over the place, so if we cause KV to use SchemaCoder we will break large parts of Beam. I think it will be easier to just remove KV from our interface and let any two-field schema translate to KV. However what you suggested is indeed a problem in Schema type inference - we don't do a good job with generic classes (someone trying AutoValueSchema hit this). Do you want to file a JIRA for this issue, as there doesn't appear to be one? *From: *Brian Hulette (JIRA) *Date: *Thu, May 16, 2019 at 1:38 PM *To: * > Remove KV from Schema transforms > > > Key: BEAM-7116 > URL: https://issues.apache.org/jira/browse/BEAM-7116 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Priority: Major > > Instead of returning KV objects, we should return a Schema with two fields. > The Convert transform should be able to convert these to KV objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
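The proposal is to drop KV from the schema transforms' interface and let any two-field schema translate to KV. The discussion concerns the Java SDK; the following is only a language-neutral Python illustration of that rule, assuming a `NamedTuple` stands in for a schema row. `row_to_kv`, `WordCount` and `Triple` are hypothetical and do not reflect how the Convert transform is implemented.

```python
from typing import Any, NamedTuple, Tuple


def row_to_kv(row: Any) -> Tuple[Any, Any]:
    """Map a row with exactly two fields to a (key, value) pair; reject anything else."""
    fields = getattr(type(row), "_fields", None)  # NamedTuple field names, if any
    if fields is None or len(fields) != 2:
        raise ValueError(f"expected a row with exactly two fields, got {fields!r}")
    key, value = row  # first field becomes the key, second the value
    return key, value


class WordCount(NamedTuple):
    word: str
    count: int


class Triple(NamedTuple):
    a: int
    b: int
    c: int


if __name__ == "__main__":
    print(row_to_kv(WordCount("beam", 3)))
```

Keying on field count, not on a KV type, is what keeps KvCoder out of the schema path in this illustration.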
[jira] [Created] (BEAM-7343) Fix Google Cloud Dataflow Runner * tests on 2.13.0
Ankur Goenka created BEAM-7343: -- Summary: Fix Google Cloud Dataflow Runner * tests on 2.13.0 Key: BEAM-7343 URL: https://issues.apache.org/jira/browse/BEAM-7343 Project: Beam Issue Type: Bug Components: testing Reporter: Ankur Goenka Assignee: Ankur Goenka Fix For: 2.13.0 One of the failing tests: https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_PR/81/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7342) Extend SyntheticPipeline map steps to be able to be splittable (Beam Python SDK)
[ https://issues.apache.org/jira/browse/BEAM-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lara Schmidt updated BEAM-7342: --- Status: Open (was: Triage Needed) > Extend SyntheticPipeline map steps to be able to be splittable (Beam Python > SDK) > > > Key: BEAM-7342 > URL: https://issues.apache.org/jira/browse/BEAM-7342 > Project: Beam > Issue Type: New Feature > Components: testing > Environment: Beam Python >Reporter: Lara Schmidt >Priority: Minor > Original Estimate: 1m > Remaining Estimate: 1m > > Add the ability for map steps to be configured to be splittable. > Possible configuration options: > - uneven bundle sizes > - possible incorrect sizing returned -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7342) Extend SyntheticPipeline map steps to be able to be splittable (Beam Python SDK)
Lara Schmidt created BEAM-7342: -- Summary: Extend SyntheticPipeline map steps to be able to be splittable (Beam Python SDK) Key: BEAM-7342 URL: https://issues.apache.org/jira/browse/BEAM-7342 Project: Beam Issue Type: New Feature Components: testing Environment: Beam Python Reporter: Lara Schmidt Add the ability for map steps to be configured to be splittable. Possible configuration options: - uneven bundle sizes - possible incorrect sizing returned -- This message was sent by Atlassian JIRA (v7.6.3#76005)
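The two configuration options listed above (uneven bundle sizes, possibly incorrect sizing) can be sketched as follows. This is a dependency-free illustration under stated assumptions: the work is an integer range `[start, stop)`, and `SplittableMapOptions`, `split_range` and `reported_size` are hypothetical names, not existing SyntheticPipeline options.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SplittableMapOptions:
    """Hypothetical knobs for a splittable synthetic map step (not Beam's option names)."""
    num_bundles: int = 4
    uneven_ratio: float = 1.0       # each bundle is this many times larger than the previous one
    size_error_factor: float = 1.0  # multiplier applied to the size reported to the runner


def split_range(start: int, stop: int, opts: SplittableMapOptions) -> List[Tuple[int, int]]:
    """Split [start, stop) into contiguous, possibly uneven, bundles."""
    if opts.num_bundles < 1:
        raise ValueError("num_bundles must be at least 1")
    if opts.uneven_ratio <= 0:
        raise ValueError("uneven_ratio must be positive")
    weights = [opts.uneven_ratio ** i for i in range(opts.num_bundles)]
    total_weight = sum(weights)
    bundles = []
    pos = start
    for i, weight in enumerate(weights):
        if i == opts.num_bundles - 1:
            end = stop  # last bundle absorbs rounding so the whole range is covered
        else:
            end = pos + int((stop - start) * weight / total_weight)
        bundles.append((pos, end))
        pos = end
    return bundles


def reported_size(bundle: Tuple[int, int], opts: SplittableMapOptions) -> float:
    """Size estimate the step would report; deliberately wrong when size_error_factor != 1."""
    lo, hi = bundle
    return (hi - lo) * opts.size_error_factor


if __name__ == "__main__":
    opts = SplittableMapOptions(num_bundles=4, uneven_ratio=2.0, size_error_factor=0.5)
    for bundle in split_range(0, 100, opts):
        print(bundle, reported_size(bundle, opts))
```

Keeping the split geometry and the size reporting as separate knobs lets a test exercise a runner against skewed work and against misleading size estimates independently.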
[jira] [Resolved] (BEAM-7177) Spark portable runner fails testGlobalCombineWithDefaultsAndTriggers
[ https://issues.apache.org/jira/browse/BEAM-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-7177. --- Resolution: Duplicate Fix Version/s: Not applicable > Spark portable runner fails testGlobalCombineWithDefaultsAndTriggers > > > Key: BEAM-7177 > URL: https://issues.apache.org/jira/browse/BEAM-7177 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > Fix For: Not applicable > > > [https://github.com/apache/beam/blob/1892c97aba6fc5d8342341cba8abff51477f5456/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1185-L1210] > Expected: a collection containing "2: true" > but: mismatches were: [was "2: false"] > Meaning c.pane().isLast() is supposed to be true, but is actually false. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-7282) Spark portable runner doesn't support `pre_optimize=all`
[ https://issues.apache.org/jira/browse/BEAM-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver closed BEAM-7282. - Resolution: Fixed Fix Version/s: 2.14.0 > Spark portable runner doesn't support `pre_optimize=all` > > > Key: BEAM-7282 > URL: https://issues.apache.org/jira/browse/BEAM-7282 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Because we are trying to fuse the already-optimized pipeline. > Error message: https://gist.github.com/ibzib/c432b45b90f7ddb62eb39e1784b55ba8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7341) portable Spark: testGlobalCombineWithDefaultsAndTriggers fails
Kyle Weaver created BEAM-7341: - Summary: portable Spark: testGlobalCombineWithDefaultsAndTriggers fails Key: BEAM-7341 URL: https://issues.apache.org/jira/browse/BEAM-7341 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver PaneInfo for CombineTest.testGlobalCombineWithDefaultsAndTriggers [1] output is incorrect. isLast: expected true, is false timing: expected UNKNOWN, is EARLY No idea yet why this is happening, but commenting out the special GBK translation for non-merging windows [2] seems to fix it. [1] [https://github.com/apache/beam/blob/8403313ea7d63e49974629136c615e379ea874ce/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1219-L1242] [2] [https://github.com/apache/beam/blob/e98a3a69295afbfc6984fe92c52125929daf6088/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java#L165-L170]
[jira] [Updated] (BEAM-563) DoFn Reuse: Update DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-563: - Fix Version/s: (was: Not applicable) 2.14.0 > DoFn Reuse: Update DirectRunner > --- > > Key: BEAM-563 > URL: https://issues.apache.org/jira/browse/BEAM-563 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Major > Fix For: 2.14.0 > > > https://issues.apache.org/jira/browse/BEAM-562 will add setup and teardown > methods to DoFns. Update DirectRunner to add support for these new methods.
[jira] [Updated] (BEAM-562) DoFn Reuse: Add new methods to DoFn
[ https://issues.apache.org/jira/browse/BEAM-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-562: - Fix Version/s: (was: Not applicable) 2.14.0 > DoFn Reuse: Add new methods to DoFn > --- > > Key: BEAM-562 > URL: https://issues.apache.org/jira/browse/BEAM-562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Yifan Mai >Priority: Major > Labels: sdk-consistency > Fix For: 2.14.0 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > The Java SDK added setup and teardown methods to DoFns. This makes DoFns > reusable and provides performance improvements. The Python SDK should add support > for these new DoFn methods: > Proposal doc: > https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f#
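The reuse idea behind setup/teardown can be shown with a toy simulation in plain Python. `RecordingDoFn` and `run_bundles` are hypothetical stand-ins, not Beam classes or a Beam runner. The method names mirror the DoFn lifecycle discussed in these issues. One instance is set up once, processes several bundles, and is torn down once, so expensive initialization such as creating a client is not repeated per bundle. Real runners decide how often instances are actually reused.

```python
class RecordingDoFn:
    """Hypothetical DoFn-like class that records its lifecycle calls."""

    def __init__(self):
        self.events = []

    def setup(self):
        # Once per instance: a place for expensive initialization.
        self.events.append('setup')

    def start_bundle(self):
        self.events.append('start_bundle')

    def process(self, element):
        yield element * 2

    def finish_bundle(self):
        self.events.append('finish_bundle')

    def teardown(self):
        # Once per instance: release what setup() acquired.
        self.events.append('teardown')


def run_bundles(dofn, bundles):
    """Toy runner that reuses a single DoFn instance across all bundles."""
    dofn.setup()
    output = []
    try:
        for bundle in bundles:
            dofn.start_bundle()
            for element in bundle:
                output.extend(dofn.process(element))
            dofn.finish_bundle()
    finally:
        dofn.teardown()
    return output


fn = RecordingDoFn()
print(run_bundles(fn, [[1, 2], [3]]))  # [2, 4, 6]
print(fn.events)
```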
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243674 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 16/May/19 21:53 Start Date: 16/May/19 21:53 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596#issuecomment-493244963 Thank you @yifanzou. I updated the comments and fixed the pylint error that caused PreCommit to fail. PTAL. Issue Time Tracking --- Worklog Id: (was: 243674) Time Spent: 1h (was: 50m) > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243673 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 16/May/19 21:52 Start Date: 16/May/19 21:52 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596#discussion_r284912678 ## File path: sdks/python/apache_beam/examples/wordcount_it_test.py ## @@ -39,7 +39,8 @@ class WordCountIT(unittest.TestCase): _multiprocess_can_split_ = True # The default checksum is a SHA-1 hash generated from a sorted list of - # lines read from expected output. + # lines read from expected output. This value coresponds to the default Review comment: done Issue Time Tracking --- Worklog Id: (was: 243673) Time Spent: 50m (was: 40m) > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks
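The WordCountIT comment describes the default checksum as "a SHA-1 hash generated from a sorted list of lines read from expected output." Below is a minimal stdlib sketch of that idea. It assumes UTF-8 encoding and no separator between lines; Beam's own test utility may encode the lines differently. `checksum_of_lines` is a hypothetical name. Sorting before hashing makes the result independent of the order in which output shards are written or read.

```python
import hashlib


def checksum_of_lines(lines):
    """SHA-1 hex digest over the lines in sorted order (order-independent)."""
    sha1 = hashlib.sha1()
    for line in sorted(lines):
        sha1.update(line.encode('utf-8'))
    return sha1.hexdigest()


run_a = ['the: 2', 'beam: 1']
run_b = ['beam: 1', 'the: 2']
print(checksum_of_lines(run_a) == checksum_of_lines(run_b))  # True
```

Because the lines are concatenated without a separator, `['ab', 'c']` and `['a', 'bc']` produce the same digest. This is harmless for word-count output, where each line has a fixed `word: count` shape.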
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243666=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243666 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 21:35 Start Date: 16/May/19 21:35 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284907859 ## File path: sdks/python/apache_beam/io/external/generate_sequence_test.py ## @@ -0,0 +1,64 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +"""Unit tests for cross-language generate sequence.""" + +from __future__ import absolute_import +from __future__ import print_function + +import logging +import os +import re +import unittest + +from nose.plugins.attrib import attr + +from apache_beam.io.external.generate_sequence import GenerateSequence +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to + + +@attr('UsesCrossLanguageTransforms') +class XlangGenerateSequenceTest(unittest.TestCase): + def test_generate_sequence(self): +test_pipeline = TestPipeline() +port = os.environ.get('EXPANSION_PORT') Review comment: We don't need to stage the expansion service jar here since `GenerateSequence` doesn't depend on extra dependencies. Issue Time Tracking --- Worklog Id: (was: 243666) Time Spent: 10h 20m (was: 10h 10m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). >
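The test in the diff reads the expansion service port from the `EXPANSION_PORT` environment variable. The snippet below is a hypothetical sketch of one way such a test could turn that variable into an address and skip cleanly when it is unset. It assumes the expansion service was started separately on the local machine by the test harness. `expansion_service_address` and `ExampleXlangTest` are illustrative names and do not come from the PR.

```python
import os
import unittest


def expansion_service_address(environ=None):
    """Return 'localhost:<port>' from EXPANSION_PORT, or None if it is unset.

    Assumes (hypothetically) that the expansion service runs on localhost.
    """
    environ = os.environ if environ is None else environ
    port = environ.get('EXPANSION_PORT')
    if not port:
        return None
    return 'localhost:%s' % port


class ExampleXlangTest(unittest.TestCase):
    def test_needs_expansion_service(self):
        address = expansion_service_address()
        if address is None:
            self.skipTest('EXPANSION_PORT is not set; no expansion service to call')
        # A real test would pass `address` to the cross-language transform here.
        self.assertTrue(address.startswith('localhost:'))
```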
[jira] [Resolved] (BEAM-6690) Spark Translator - ASSIGN_WINDOWS_TRANSFORM_URN
[ https://issues.apache.org/jira/browse/BEAM-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-6690. --- Resolution: Won't Fix Fix Version/s: Not applicable > Spark Translator - ASSIGN_WINDOWS_TRANSFORM_URN > --- > > Key: BEAM-6690 > URL: https://issues.apache.org/jira/browse/BEAM-6690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Ankur Goenka >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > >
[jira] [Commented] (BEAM-6690) Spark Translator - ASSIGN_WINDOWS_TRANSFORM_URN
[ https://issues.apache.org/jira/browse/BEAM-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841735#comment-16841735 ] Kyle Weaver commented on BEAM-6690: --- `translateAssignWindows` was removed from the Flink portable runner [1], so it's probably safe to say we won't need this for Spark. [1] [https://github.com/apache/beam/pull/8058] > Spark Translator - ASSIGN_WINDOWS_TRANSFORM_URN > --- > > Key: BEAM-6690 > URL: https://issues.apache.org/jira/browse/BEAM-6690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Ankur Goenka >Assignee: Kyle Weaver >Priority: Major >
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243643 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 21:18 Start Date: 16/May/19 21:18 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493235414 run xvr_flink postcommit Issue Time Tracking --- Worklog Id: (was: 243643) Time Spent: 10h 10m (was: 10h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 10m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). >
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243642 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 21:18 Start Date: 16/May/19 21:18 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493235414 run xvr_flink postcommit Issue Time Tracking --- Worklog Id: (was: 243642) Time Spent: 10h (was: 9h 50m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h > Remaining Estimate: 0h > > We should add an integration test suite that covers the following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). >
[jira] [Commented] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes
[ https://issues.apache.org/jira/browse/BEAM-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841716#comment-16841716 ] niklas Hansson commented on BEAM-6877: -- Sadly not :(. Should I release it? I plan to work on it on Sunday. > TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode > changes > > > Key: BEAM-6877 > URL: https://issues.apache.org/jira/browse/BEAM-6877 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > > Type inference doesn't work on Python 3.6 due to [bytecode to wordcode > changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes]. > Type inference always returns Any on Python 3.6, so this is not critical. > Affected tests are: > *transforms.ptransform_test*: > - test_combine_properly_pipeline_type_checks_using_decorator > - test_mean_globally_pipeline_checking_satisfied > - test_mean_globally_runtime_checking_satisfied > - test_count_globally_pipeline_type_checking_satisfied > - test_count_globally_runtime_type_checking_satisfied > - test_pardo_type_inference > - test_pipeline_inference > - test_inferred_bad_kv_type > *typehints.trivial_inference_test*: > - all tests in TrivialInferenceTest > *io.gcp.pubsub_test.TestReadFromPubSubOverride*: > * test_expand_with_other_options > * test_expand_with_subscription > * test_expand_with_topic
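The linked "bytecode to wordcode" change means that since CPython 3.6 every code unit is two bytes (an opcode plus a one-byte argument), whereas earlier versions used one byte for instructions without an argument and three bytes for those with one. Code that walks `co_code` by hand, as a bytecode-based type inferencer does, must account for this. The stdlib `dis` module shows the effect. `add_one` and `instruction_offsets` are illustrative names, and the exact offsets printed vary by Python version.

```python
import dis


def add_one(x):
    return x + 1


def instruction_offsets(fn):
    """Byte offsets of the instructions that dis reports for fn."""
    return [ins.offset for ins in dis.get_instructions(fn)]


offsets = instruction_offsets(add_one)
print(offsets)
# With 2-byte wordcode units, every instruction starts at an even offset.
print(all(off % 2 == 0 for off in offsets))  # True on CPython 3.6+
```

Arguments wider than one byte are encoded with `EXTENDED_ARG` prefix units, so a hand-written decoder also has to fold those into the following instruction's argument.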
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243624 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 20:52 Start Date: 16/May/19 20:52 Worklog Time Spent: 10m Work Description: NikeNano commented on issue #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#issuecomment-493224388 > looks like test_convert_to_beam_type[s] test are currently failing on Py3.5. How do you see this? I have checked the logs, and as far as I can see, they only say "sdks:python:test-suites:direct:py35:postCommitIT" for Python SDK PostCommit Tests on Python 3 in the console output. I'm trying to debug why it fails. Issue Time Tracking --- Worklog Id: (was: 243624) Time Spent: 5h 40m (was: 5.5h) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py",
> line 510, in __ror__ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat}
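The traceback fails while mapping the arguments of `beam.Map(str.strip, 'x')` onto the parameters of `str.strip`. On Python 3, one stdlib way to do that mapping is `inspect.signature(fn).bind(...)`, which handles positional-only parameters and defaults, including those of C builtins like `str.strip` on Python 3.7+. This is a sketch of the general technique only, not the implementation in PR #8590. `bind_call_args` is a hypothetical name.

```python
import inspect


def bind_call_args(fn, *args, **kwargs):
    """Map call arguments onto fn's parameter names, filling in defaults."""
    bound = inspect.signature(fn).bind(*args, **kwargs)
    bound.apply_defaults()  # parameters the caller omitted get their defaults
    return dict(bound.arguments)


# beam.Map(str.strip, 'x') calls str.strip(element, 'x') for each element.
print(bind_call_args(str.strip, 'xa', 'x')['chars'])  # x
print(bind_call_args(str.strip, 'xa')['chars'])       # None
```

`bind` raises `TypeError` when arguments do not fit the signature. A type-hint checker could catch that exception and re-raise it as its own error type, much as the traceback shows `getcallargs_forhints` doing with `TypeCheckError`.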
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243623 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 20:51 Start Date: 16/May/19 20:51 Worklog Time Spent: 10m Work Description: NikeNano commented on issue #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#issuecomment-493224388 > looks like test_convert_to_beam_type[s] test are currently failing on Py3.5. How do you see this? I have checked the logs, and as far as I can see, they only say "sdks:python:test-suites:direct:py35:postCommitIT" for Python SDK PostCommit Tests on Python 3 in the console output. Issue Time Tracking --- Worklog Id: (was: 243623) Time Spent: 5.5h (was: 5h 20m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in __ror__ > result =
p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat}
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243616 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 20:41 Start Date: 16/May/19 20:41 Worklog Time Spent: 10m Work Description: NikeNano commented on issue #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#issuecomment-493224388 > looks like test_convert_to_beam_type[s] test are currently failing on Py3.5. How do you see this? I have checked the logs, and as far as I can see, they only say "sdks:python:test-suites:direct:py35:postCommitIT" for Python SDK PostCommit Tests on Python 3 in the console output. Issue Time Tracking --- Worklog Id: (was: 243616) Time Spent: 5h 20m (was: 5h 10m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in __ror__ >
result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat}
[jira] [Commented] (BEAM-7116) Remove KV from Schema transforms
[ https://issues.apache.org/jira/browse/BEAM-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841702#comment-16841702 ] Brian Hulette commented on BEAM-7116: - [~reuvenlax] - Could we just use the existing SchemaRegistry/SchemaProvider architecture to add support for KVs to Convert? I'm still wrapping my head around all the schema inference code, but it seems like if we modify FieldValueTypeSupplier to accept a TypeDescriptor rather than just a Class, and plumb that through FieldValueTypeInformation and StaticSchemaInference, we could add support for generic classes, including KV. > Remove KV from Schema transforms > > > Key: BEAM-7116 > URL: https://issues.apache.org/jira/browse/BEAM-7116 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Priority: Major > > Instead of returning KV objects, we should return a Schema with two fields. > The Convert transform should be able to convert these to KV objects.
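The proposal above replaces KV outputs with a two-field schema and leaves conversion back to KV to a Convert step. Here is a rough Python analogy using `typing.NamedTuple` as the "schema". It is not Beam's Java Schema, Row or Convert API. `KeyValueRow`, `row_to_kv` and `kv_to_row` are hypothetical names chosen to show conversion by field name.

```python
from typing import Any, NamedTuple, Tuple


class KeyValueRow(NamedTuple):
    """Hypothetical two-field schema standing in for a KV."""
    key: Any
    value: Any


def row_to_kv(row: KeyValueRow) -> Tuple[Any, Any]:
    """Convert-style step: recover a plain (key, value) pair by field name."""
    return (row.key, row.value)


def kv_to_row(kv: Tuple[Any, Any]) -> KeyValueRow:
    key, value = kv
    return KeyValueRow(key=key, value=value)


row = kv_to_row(('user-1', 42))
print(row._fields)     # ('key', 'value')
print(row_to_kv(row))  # ('user-1', 42)
```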
[jira] [Issue Comment Deleted] (BEAM-7338) Deprecate PoolableDataSourceProvider from JdbcIO
[ https://issues.apache.org/jira/browse/BEAM-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7338: --- Comment: was deleted (was: Another reason to do so is that it leaks DBCP into the JDBCIO user classpath, which prevents users from using older or future versions of the library without conflicts.) > Deprecate PoolableDataSourceProvider from JdbcIO > > > Key: BEAM-7338 > URL: https://issues.apache.org/jira/browse/BEAM-7338 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > `PoolableDataSourceProvider` was introduced as a facility to create a > `PoolableDataSource` from a `ConnectionConfiguration` in JdbcIO. > However, the current implementation's default parameters cannot cover all cases, > and tweaking the right parameters of the pool is not trivial without exposing > too many knobs in the API, so given that we have a generic way to do this via > `withDataSourceProviderFn`, we could deprecate and remove this in the future, > and probably add its use as an example.
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243611 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 16/May/19 20:08 Start Date: 16/May/19 20:08 Worklog Time Spent: 10m Work Description: yifanzou commented on issue #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596#issuecomment-493213893 Sorry, I didn't read the description carefully. That answers my motivation question. Issue Time Tracking --- Worklog Id: (was: 243611) Time Spent: 40m (was: 0.5h) > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243609 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 16/May/19 20:05 Start Date: 16/May/19 20:05 Worklog Time Spent: 10m Work Description: yifanzou commented on pull request #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596#discussion_r284871200 ## File path: sdks/python/apache_beam/examples/wordcount_it_test.py ## @@ -39,7 +39,8 @@ class WordCountIT(unittest.TestCase): _multiprocess_can_split_ = True # The default checksum is a SHA-1 hash generated from a sorted list of - # lines read from expected output. + # lines read from expected output. This value coresponds to the default Review comment: Typo - corresponds Issue Time Tracking --- Worklog Id: (was: 243609) Time Spent: 0.5h (was: 20m) > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks
[jira] [Commented] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841684#comment-16841684 ] Ahmet Altay commented on BEAM-7339: --- Related to the quota issue, we need a solution to this; otherwise we cannot build the large benchmarks that are needed. Either we can increase the quota, or agree that the apache-beam-testing project is not good for benchmarks and find an alternative solution. The output verification problem could be simplified, IMO. Perhaps we can use the gcloud tool itself to calculate output hashes. Thanks for summarizing the issues. > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks
[jira] [Commented] (BEAM-7338) Deprecate PoolableDataSourceProvider from JdbcIO
[ https://issues.apache.org/jira/browse/BEAM-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841680#comment-16841680 ] Ismaël Mejía commented on BEAM-7338: Another reason to do so is that it leaks DBCP into the JdbcIO user's classpath, which prevents users from using older or newer versions of the library without conflicts. > Deprecate PoolableDataSourceProvider from JdbcIO > > > Key: BEAM-7338 > URL: https://issues.apache.org/jira/browse/BEAM-7338 > Project: Beam > Issue Type: Improvement > Components: io-java-jdbc >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > `PoolableDataSourceProvider` was introduced as a facility to create a > `PoolableDataSource` from a `ConnectionConfiguration` in JdbcIO. > However, the default parameters of the current implementation cannot cover all cases, > and tuning the pool's parameters is not trivial without exposing > too many knobs in the API. Since we have a generic way to do this via > `withDataSourceProviderFn`, we could deprecate this and remove it in the future, > and probably add its use as an example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243604 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:33 Start Date: 16/May/19 19:33 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284864864 ## File path: sdks/python/apache_beam/transforms/combiners.py ## @@ -129,10 +129,13 @@ class PerElement(ptransform.PTransform): def expand(self, pcoll): paired_with_void_type = KV[pcoll.element_type, Any] - return (pcoll - | ('%s:PairWithVoid' % self.label >> core.Map(lambda x: (x, None)) - .with_output_types(paired_with_void_type)) - | core.CombinePerKey(CountCombineFn())) + output_type = KV[pcoll.element_type, int] Review comment: The output type needs to be as narrow as possible in order to avoid the Python pickled coder. The test failed because of it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243604) Time Spent: 9h 50m (was: 9h 40m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 9h 50m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. 
> (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
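To illustrate why a narrow output type matters, here is a toy model of type-based coder selection. The mapping and coder names are hypothetical and are not Beam's actual coder registry. The idea is only that a concrete type such as `int` can map to a portable, language-neutral coder, while `Any` falls back to a Python-only pickling coder that the Java side of a cross-language pipeline could not decode.

```python
from typing import Any

# Hypothetical type -> coder table; anything unknown (including Any)
# falls back to a Python-specific pickle coder.
_PORTABLE_CODERS = {int: 'varint', bytes: 'bytes', str: 'utf8'}


def coder_for(element_type):
    """Pick a coder name for a single element type."""
    return _PORTABLE_CODERS.get(element_type, 'pickle')


def kv_coder(key_type, value_type):
    """Describe the coder for a key/value pair with the given types."""
    return ('kv', coder_for(key_type), coder_for(value_type))


print(kv_coder(str, Any))  # ('kv', 'utf8', 'pickle'): not portable
print(kv_coder(str, int))  # ('kv', 'utf8', 'varint'): portable
```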
[jira] [Commented] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841674#comment-16841674 ] Mark Liu commented on BEAM-7339: Two concerns for >100Gb input: 1. The resources we have in the apache-beam-testing project. We have seen postcommit jobs exceed quotas such as CPU and disk, so we should limit the number of workers in these performance tests. On the other hand, I don't know how long it takes to process 100Gb with a given number of workers. 2. Output verification could be hard. Large output may not fit on the Jenkins machine, so we may need a special way to verify output correctness. > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
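One way to avoid loading all output onto a single machine for verification is an order-independent checksum that can be computed per shard and then combined. The sketch below swaps in a simple additive multiset hash (the sum of per-line SHA-256 values modulo 2**256). It is not the sorted SHA-1 used by the existing test, and the function names are hypothetical.

```python
import hashlib

_MODULUS = 2 ** 256


def _line_value(line):
    """Map one line to an integer via SHA-256."""
    digest = hashlib.sha256(line.encode('utf-8')).digest()
    return int.from_bytes(digest, 'big')


def shard_checksum(lines):
    """Order-independent checksum of one shard; streams lines in constant memory."""
    total = 0
    for line in lines:
        total = (total + _line_value(line)) % _MODULUS
    return total


def combine(*checksums):
    """Combine per-shard checksums into a checksum of the whole output."""
    return sum(checksums) % _MODULUS
```

Because modular addition is commutative, neither line order nor shard boundaries affect the result, while duplicate lines still change it. Each output file could therefore be hashed where it lives and only the small integers collected for comparison.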
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243599 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 19:21 Start Date: 16/May/19 19:21 Worklog Time Spent: 10m Work Description: NikeNano commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284860176 ## File path: sdks/python/apache_beam/typehints/native_type_compatibility.py ## @@ -144,11 +154,17 @@ def convert_to_beam_type(typ): match=_match_issubclass(typing.Tuple), arity=-1, beam_type=typehints.Tuple), - _TypeMapEntry( - match=_match_same_type(typing.Union), - arity=-1, - beam_type=typehints.Union) ] + if sys.version_info.major >= 3: +type_map.append( +_TypeMapEntry( +match=_match_is_union_py3, arity=-1, beam_type=typehints.Union)) Review comment: , there are some updates to the same functions, functionality in [BEAM-6985]. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243599) Time Spent: 5h 10m (was: 5h) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in _ror_ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was sent by 
Atlassian JIRA (v7.6.3#76005)
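The review above concerns matching `typing.Union` on Python 3. A minimal sketch of an origin-based check, assuming that on Python 3.7+ a subscripted union such as `typing.Union[int, str]` reports `typing.Union` as its origin, and that `Optional[X]` is represented as `Union[X, None]`. The helper names `_origin` and `is_union` are hypothetical, not the `_match_is_union_py3` helper from the PR.

```python
import typing


def _origin(typ):
    """Return the unsubscripted origin of a typing construct, or None."""
    # typing.get_origin exists on Python 3.8+; fall back to __origin__ on 3.7.
    get_origin = getattr(typing, 'get_origin', None)
    if get_origin is not None:
        return get_origin(typ)
    return getattr(typ, '__origin__', None)


def is_union(typ):
    """Return True for subscripted unions such as Union[int, str] or Optional[int]."""
    return _origin(typ) is typing.Union


print(is_union(typing.Union[int, str]))  # True
print(is_union(typing.Optional[bytes]))  # True
print(is_union(typing.List[int]))        # False
print(is_union(int))                     # False
```

Matching on the origin rather than on the object's type avoids depending on which internal class the `typing` module uses to represent a parametrized union, which has changed between Python versions.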
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243598 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 19:21 Start Date: 16/May/19 19:21 Worklog Time Spent: 10m Work Description: NikeNano commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284859030 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -314,10 +325,40 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): return callargs +def getcallargs_forhints_impl_py3(func, packed_typeargs, typekwargs): + try: +# TODO(udim): Function signature returned by getfullargspec (in +# packed_typeargs) might differ from the one below. Migrate to use +# inspect.signature in getfullargspec (for Py3). +signature = inspect.signature(func) + except ValueError as e: +logger.warning('Could not get signature for function: %s: %s', func, e) +return {} + try: +bindings = signature.bind(*packed_typeargs, **typekwargs) + except TypeError as e: +# Might be raised due to too few or too many arguments. +raise TypeCheckError(e) + bound_args = bindings.arguments + missing = [] + for param in signature.parameters.values(): +if param.kind == inspect.Parameter.VAR_POSITIONAL: + bound_args[param.name] = typehints.Tuple[typehints.Any, ...] +elif param.kind == inspect.Parameter.VAR_KEYWORD: + bound_args[param.name] = typehints.Dict[typehints.Any, typehints.Any] +elif param.name not in bound_args and param.default is not param.empty: + # Declare unbound parameters with defaults to be Any. + bound_args[param.name] = typehints.Any + + if missing: Review comment: Will this ever be conditioned to false with the current code? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243598) Time Spent: 5h (was: 4h 50m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in _ror_ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was sent by 
Atlassian JIRA (v7.6.3#76005)
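The diff under review binds type hints to a function's parameters with `inspect.signature`. Below is a standalone sketch of the same idea, using `typing` types in place of Beam's `typehints` module. `bind_type_hints` and the local `TypeCheckError` are hypothetical stand-ins, not Beam's code. `Signature.bind` already raises `TypeError` when a required argument is missing or an extra one is supplied, so any required parameter that was not bound never reaches the loop. That appears to be why a `missing` list built afterwards would always stay empty.

```python
import inspect
from typing import Any, Dict, Tuple


class TypeCheckError(Exception):
    """Local stand-in for Beam's TypeCheckError."""


def bind_type_hints(func, *type_args, **type_kwargs):
    """Map each parameter name of ``func`` to a type hint."""
    try:
        signature = inspect.signature(func)
    except ValueError:
        # Some builtins expose no signature; skip type checking for them.
        return {}
    try:
        # Raises TypeError on too few or too many arguments.
        bound = signature.bind(*type_args, **type_kwargs).arguments
    except TypeError as e:
        raise TypeCheckError(e)
    hints = dict(bound)
    for param in signature.parameters.values():
        if param.kind == inspect.Parameter.VAR_POSITIONAL:
            hints[param.name] = Tuple[Any, ...]
        elif param.kind == inspect.Parameter.VAR_KEYWORD:
            hints[param.name] = Dict[Any, Any]
        elif param.name not in hints and param.default is not param.empty:
            # Unbound parameters that have defaults are declared as Any.
            hints[param.name] = Any
    return hints


def example(a, b=1, *args, **kwargs):
    pass


print(bind_type_hints(example, int))
```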
[jira] [Work logged] (BEAM-6985) TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6985?focusedWorklogId=243601=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243601 ] ASF GitHub Bot logged work on BEAM-6985: Author: ASF GitHub Bot Created on: 16/May/19 19:23 Start Date: 16/May/19 19:23 Worklog Time Spent: 10m Work Description: NikeNano commented on issue #8453: [BEAM-6985] TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ Updates URL: https://github.com/apache/beam/pull/8453#issuecomment-493199476 > Thanks, @NikeNano. #8590 expands on this change. Would you be ok to keep the discussion on what needs to happen in that PR? We probably don't need two changes. Sure! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243601) Time Spent: 5.5h (was: 5h 20m) > TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ > > > Key: BEAM-6985 > URL: https://issues.apache.org/jira/browse/BEAM-6985 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > The following tests are failing: > * test_convert_nested_to_beam_type > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > > * test_convert_to_beam_type > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > > * test_convert_to_beam_types > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > With similar errors, where `typing.<type> != <type>`. 
eg: > {noformat} > FAIL: test_convert_to_beam_type > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > -- > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/native_type_compatibility_test.py", > line 79, in test_convert_to_beam_type > beam_type, description) > AssertionError: typing.Dict[bytes, int] != Dict[bytes, int] : simple dict > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243597 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:20 Start Date: 16/May/19 19:20 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284860203 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1625,6 +1645,110 @@ class BeamModulePlugin implements Plugin { /** ***/ +// Method to create the crossLanguageValidatesRunnerTask. +// The method takes crossLanguageValidatesRunnerConfiguration as parameter. +project.ext.createCrossLanguageValidatesRunnerTask = { Review comment: added the postcommit test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243597) Time Spent: 9h 40m (was: 9.5h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 9h 40m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=243594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243594 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 16/May/19 19:20 Start Date: 16/May/19 19:20 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8592: [BEAM-7305] Improve and extend Hazelcast Jet based Java Runner URL: https://github.com/apache/beam/pull/8592#discussion_r284856263 ## File path: runners/jet-experimental/src/main/java/org/apache/beam/runners/jet/JetTransformTranslators.java ## @@ -79,7 +76,6 @@ TRANSLATORS.put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenTranslator()); TRANSLATORS.put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowTranslator()); TRANSLATORS.put(PTransformTranslation.IMPULSE_TRANSFORM_URN, new ImpulseTranslator()); -TRANSLATORS.put(PTransformTranslation.TEST_STREAM_TRANSFORM_URN, new TestStreamTranslator()); Review comment: Curious, why did you remove this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243594) Time Spent: 2h 20m (was: 2h 10m) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=243596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243596 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 16/May/19 19:20 Start Date: 16/May/19 19:20 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8592: [BEAM-7305] Improve and extend Hazelcast Jet based Java Runner URL: https://github.com/apache/beam/pull/8592#discussion_r284856811 ## File path: runners/jet-experimental/src/main/java/org/apache/beam/runners/jet/Utils.java ## @@ -246,4 +259,34 @@ static boolean usesStateOrTimers(AppliedPTransform appliedTransform) { return WindowedValue.FullWindowedValueCoder.of( ListCoder.of(elementCoder.getValueCoder()), elementCoder.getWindowCoder()); } + + /** A wrapper of {@code byte[]} that can be used as a hash-map key. */ + public static class ByteArrayKey { +private final byte[] value; +private int hash; + +public ByteArrayKey(@Nonnull byte[] value) { + this.value = value; +} + +@Override +public boolean equals(Object o) { + if (this == o) { +return true; + } + if (o == null || getClass() != o.getClass()) { +return false; + } + ByteArrayKey that = (ByteArrayKey) o; + return Arrays.equals(value, that.value); +} + +@Override +public int hashCode() { + if (hash == 0) { Review comment: Make `hash` an `Integer` and check for null here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243596) Time Spent: 2.5h (was: 2h 20m) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7305) Add first version of Hazelcast Jet Runner
[ https://issues.apache.org/jira/browse/BEAM-7305?focusedWorklogId=243595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243595 ] ASF GitHub Bot logged work on BEAM-7305: Author: ASF GitHub Bot Created on: 16/May/19 19:20 Start Date: 16/May/19 19:20 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8592: [BEAM-7305] Improve and extend Hazelcast Jet based Java Runner URL: https://github.com/apache/beam/pull/8592#discussion_r284858529 ## File path: runners/jet-experimental/src/main/java/org/apache/beam/runners/jet/JetTransformTranslators.java ## @@ -79,7 +76,6 @@ TRANSLATORS.put(PTransformTranslation.FLATTEN_TRANSFORM_URN, new FlattenTranslator()); TRANSLATORS.put(PTransformTranslation.ASSIGN_WINDOWS_TRANSFORM_URN, new WindowTranslator()); TRANSLATORS.put(PTransformTranslation.IMPULSE_TRANSFORM_URN, new ImpulseTranslator()); -TRANSLATORS.put(PTransformTranslation.TEST_STREAM_TRANSFORM_URN, new TestStreamTranslator()); Review comment: Ah, see that you moved it to the test runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243595) Time Spent: 2.5h (was: 2h 20m) > Add first version of Hazelcast Jet Runner > - > > Key: BEAM-7305 > URL: https://issues.apache.org/jira/browse/BEAM-7305 > Project: Beam > Issue Type: New Feature > Components: runner-jet >Reporter: Maximilian Michels >Assignee: Jozsef Bartok >Priority: Major > Fix For: 2.14.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-562) DoFn Reuse: Add new methods to DoFn
[ https://issues.apache.org/jira/browse/BEAM-562?focusedWorklogId=243592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243592 ] ASF GitHub Bot logged work on BEAM-562: --- Author: ASF GitHub Bot Created on: 16/May/19 19:12 Start Date: 16/May/19 19:12 Worklog Time Spent: 10m Work Description: yifanmai commented on issue #7994: [BEAM-562] Add DoFn.setup and DoFn.teardown to Python SDK URL: https://github.com/apache/beam/pull/7994#issuecomment-493195732 Thanks @aaltay, @kennknowles and @charlesccychen for your help! I added https://issues.apache.org/jira/browse/BEAM-7340 to track the issue related to metrics in DoFn.teardown, as discussed earlier. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243592) Time Spent: 10h 50m (was: 10h 40m) > DoFn Reuse: Add new methods to DoFn > --- > > Key: BEAM-562 > URL: https://issues.apache.org/jira/browse/BEAM-562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Yifan Mai >Priority: Major > Labels: sdk-consistency > Fix For: Not applicable > > Time Spent: 10h 50m > Remaining Estimate: 0h > > The Java SDK added setup and teardown methods to DoFns. This makes DoFns > reusable and provides performance improvements. The Python SDK should add support > for these new DoFn methods: > Proposal doc: > https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f# -- This message was sent by Atlassian JIRA (v7.6.3#76005)
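The reuse benefit comes from separating once-per-instance work from per-bundle work. The toy runner below is plain Python with no Beam imports, and `RecordingDoFn` and `run_bundles` are hypothetical. It sketches the intended call order: `setup` once, `start_bundle`/`process`/`finish_bundle` for each bundle, and `teardown` once at the end. Real runners decide when (and whether) teardown is invoked, so treat this as an illustration only.

```python
class RecordingDoFn:
    """Toy DoFn whose method names mirror Beam's Python DoFn lifecycle."""

    def __init__(self):
        self.calls = []

    def setup(self):
        self.calls.append('setup')  # e.g. open a client once per instance

    def start_bundle(self):
        self.calls.append('start_bundle')

    def process(self, element):
        yield element * 2

    def finish_bundle(self):
        self.calls.append('finish_bundle')

    def teardown(self):
        self.calls.append('teardown')  # e.g. close the client


def run_bundles(dofn, bundles):
    """Hypothetical runner: reuse one DoFn instance across several bundles."""
    dofn.setup()
    outputs = []
    try:
        for bundle in bundles:
            dofn.start_bundle()
            for element in bundle:
                outputs.extend(dofn.process(element))
            dofn.finish_bundle()
    finally:
        dofn.teardown()
    return outputs
```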
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243590 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 19:10 Start Date: 16/May/19 19:10 Worklog Time Spent: 10m Work Description: NikeNano commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284856682 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -269,6 +272,14 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): for (arg, hint) in zip(argspec.args, typeargs)] packed_typeargs += list(typeargs[len(packed_typeargs):]) + if sys.version_info.major < 3: +return getcallargs_forhints_impl_py2(func, argspec, packed_typeargs, Review comment: Is it accepted behaviour to check the python version within the code? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243590) Time Spent: 4.5h (was: 4h 20m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in _ror_ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was sent by 
Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243591 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 19:11 Start Date: 16/May/19 19:11 Worklog Time Spent: 10m Work Description: NikeNano commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284856682 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -269,6 +272,14 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): for (arg, hint) in zip(argspec.args, typeargs)] packed_typeargs += list(typeargs[len(packed_typeargs):]) + if sys.version_info.major < 3: +return getcallargs_forhints_impl_py2(func, argspec, packed_typeargs, Review comment: Is it accepted behaviour to check the python version within the code? Don't see a problem with it but asking to learn :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243591) Time Spent: 4h 40m (was: 4.5h) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in _ror_ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was sent 
by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7340) DoFn.teardown metrics are lost in Python SDK
Yifan Mai created BEAM-7340: --- Summary: DoFn.teardown metrics are lost in Python SDK Key: BEAM-7340 URL: https://issues.apache.org/jira/browse/BEAM-7340 Project: Beam Issue Type: Bug Components: sdk-py-harness Reporter: Yifan Mai If user code in DoFn.teardown updates custom user metrics, those updates are not registered; e.g., counter increments are lost. Context: In [FnApiRunner.run_stages|https://github.com/apache/beam/blob/4629e82512ef1606c78cf28a2d66402c3533e23f/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L342-L364], DoFn.teardown is called in worker_handler_manager.close_all, but that call happens outside of the FnApiRunner.run_stage calls, so no metrics or monitoring info is retrieved for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
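The mechanism described above can be reproduced with a toy model. Metrics are snapshotted when a stage finishes, but teardown runs later, so anything it records is never reported. All classes and functions below are hypothetical stand-ins, not `FnApiRunner` code.

```python
class Counter:
    """Minimal stand-in for a user metric counter."""

    def __init__(self):
        self.value = 0

    def inc(self, n=1):
        self.value += n


class TeardownCountingDoFn:
    """Toy DoFn that updates a counter in process() and in teardown()."""

    def __init__(self, counter):
        self.counter = counter

    def process(self, element):
        self.counter.inc()
        yield element

    def teardown(self):
        self.counter.inc(100)


def run_stage(dofn, elements, counter):
    """Process a stage, then snapshot its metrics (the reported value)."""
    outputs = [out for element in elements for out in dofn.process(element)]
    return outputs, counter.value


def run_pipeline(elements):
    counter = Counter()
    dofn = TeardownCountingDoFn(counter)
    _, reported = run_stage(dofn, elements, counter)
    # Teardown runs afterwards, analogous to close_all(), after metrics were read.
    dofn.teardown()
    return reported, counter.value


print(run_pipeline([1, 2, 3]))  # (3, 103): the teardown increment is never reported
```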
[jira] [Work logged] (BEAM-6988) TypeHints Py3 Error: test_non_function (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6988?focusedWorklogId=243593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243593 ] ASF GitHub Bot logged work on BEAM-6988: Author: ASF GitHub Bot Created on: 16/May/19 19:16 Start Date: 16/May/19 19:16 Worklog Time Spent: 10m Work Description: NikeNano commented on pull request #8590: [BEAM-6988] Implement a Python 3 version of getcallargs_forhints URL: https://github.com/apache/beam/pull/8590#discussion_r284858804 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -314,10 +325,40 @@ def getcallargs_forhints(func, *typeargs, **typekwargs): return callargs +def getcallargs_forhints_impl_py3(func, packed_typeargs, typekwargs): + try: +# TODO(udim): Function signature returned by getfullargspec (in +# packed_typeargs) might differ from the one below. Migrate to use +# inspect.signature in getfullargspec (for Py3). +signature = inspect.signature(func) + except ValueError as e: +logger.warning('Could not get signature for function: %s: %s', func, e) +return {} + try: +bindings = signature.bind(*packed_typeargs, **typekwargs) + except TypeError as e: +# Might be raised due to too few or too many arguments. +raise TypeCheckError(e) + bound_args = bindings.arguments + missing = [] Review comment: Is missing ever used except for in the if statement? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243593) Time Spent: 4h 50m (was: 4h 40m) > TypeHints Py3 Error: test_non_function > (apache_beam.typehints.typed_pipeline_test.MainInputTest) Fails on Python 3.7+ > - > > Key: BEAM-6988 > URL: https://issues.apache.org/jira/browse/BEAM-6988 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > {noformat} > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/typed_pipeline_test.py", > line 53, in test_non_function > result = ['xa', 'bbx', 'xcx'] | beam.Map(str.strip, 'x') > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 510, in _ror_ > result = p.apply(self, pvalueish, label) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py", > line 514, in apply > transform.type_check_inputs(pvalueish) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/transforms/ptransform.py", > line 753, in type_check_inputs > hints = getcallargs_forhints(argspec_fn, *type_hints[0], **type_hints[1]) > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/decorators.py", > line 283, in getcallargs_forhints > raise TypeCheckError(e) > apache_beam.typehints.decorators.TypeCheckError: strip() missing 1 required > positional argument: 'chars'{noformat} -- This message was 
sent by Atlassian JIRA (v7.6.3#76005)
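The core idea of the Py3 helper under review, binding type hints to a function's parameters with `inspect.signature(...).bind` and turning the resulting `TypeError` into a type-check error, can be sketched with the standard library alone. `TypeCheckError` below is a local stand-in, not the class from `apache_beam.typehints.decorators`, and the sketch ignores the `getfullargspec` caveat noted in the TODO.

```python
import inspect


class TypeCheckError(Exception):
    """Hypothetical local stand-in for apache_beam's TypeCheckError."""


def bind_type_hints(func, *type_args, **type_kwargs):
    """Map type hints onto func's parameters using inspect.signature.

    Returns {} when no signature is available (as some builtins allow),
    and raises TypeCheckError when the hints do not fit the signature
    (too few or too many arguments).
    """
    try:
        signature = inspect.signature(func)
    except ValueError:
        return {}
    try:
        bound = signature.bind(*type_args, **type_kwargs)
    except TypeError as e:
        raise TypeCheckError(e) from e
    # bound.arguments holds only explicitly bound parameters.
    return dict(bound.arguments)
```

With a two-argument function, passing two hints binds both; passing only one raises `TypeCheckError`, the same kind of "missing argument" failure seen in the traceback above.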
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243587 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:07 Start Date: 16/May/19 19:07 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493194159 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243587) Time Spent: 9h 20m (was: 9h 10m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 9h 20m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243588=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243588 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:07 Start Date: 16/May/19 19:07 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493194159 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243588) Time Spent: 9.5h (was: 9h 20m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 9.5h > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6985) TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+
[ https://issues.apache.org/jira/browse/BEAM-6985?focusedWorklogId=243586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243586 ] ASF GitHub Bot logged work on BEAM-6985: Author: ASF GitHub Bot Created on: 16/May/19 19:06 Start Date: 16/May/19 19:06 Worklog Time Spent: 10m Work Description: NikeNano commented on issue #8453: [BEAM-6985] TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ Updates URL: https://github.com/apache/beam/pull/8453#issuecomment-493194050 PTAL @tvalentyn. I have added a test for the ordering to make sure the behaviour is the same for Python 2 and Python 3. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243586) Time Spent: 5h 20m (was: 5h 10m) > TypeHints Py3 Error: Native type compatibility tests fail on Python 3.7+ > > > Key: BEAM-6985 > URL: https://issues.apache.org/jira/browse/BEAM-6985 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: niklas Hansson >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > The following tests are failing: > * test_convert_nested_to_beam_type > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > > * test_convert_to_beam_type > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > > * test_convert_to_beam_types > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > All fail with similar errors, where `typing.<type> != <type>`.
eg: > {noformat} > FAIL: test_convert_to_beam_type > (apache_beam.typehints.native_type_compatibility_test.NativeTypeCompatibilityTest) > -- > Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/typehints/native_type_compatibility_test.py", > line 79, in test_convert_to_beam_type > beam_type, description) > AssertionError: typing.Dict[bytes, int] != Dict[bytes, int] : simple dict > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
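One plausible contributor to mismatches like `typing.Dict[bytes, int] != Dict[bytes, int]` is that Python 3.7 changed what typing generics expose: `typing.Dict[bytes, int].__origin__` is the builtin `dict` on 3.7+, whereas on 3.6 it is `typing.Dict`. The sketch below uses hypothetical helper names and is not Beam's conversion code; it only shows one way to read `__origin__`/`__args__` that tolerates both.

```python
import typing


def describe_generic(hint):
    """Split a typing generic into (origin, args).

    On Python 3.7+ typing.Dict[bytes, int].__origin__ is the builtin dict,
    while on 3.6 it was typing.Dict, so comparing __origin__ against
    typing.Dict behaves differently across versions.
    Non-generic hints are returned as (hint, ()).
    """
    origin = getattr(hint, '__origin__', None)
    if origin is None:
        return hint, ()
    return origin, tuple(getattr(hint, '__args__', ()))


def is_dict_hint(hint):
    """Version-tolerant check that accepts either origin spelling."""
    origin, _ = describe_generic(hint)
    return origin in (dict, typing.Dict)
```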
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243585 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:05 Start Date: 16/May/19 19:05 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284854976 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExternalTest.java ## @@ -86,26 +84,27 @@ public static void tearDown() { @Test @Category({ValidatesRunner.class, UsesCrossLanguageTransforms.class}) public void expandSingleTest() { -PCollection col = +PCollection col = testPipeline -.apply(Create.of(1, 2, 3)) +.apply(Create.of("1", "2", "3")) Review comment: The test was modified to be as close as possible to the Python external_test. It used to just add 1, but now it concatenates `Simple(%s)`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243585) Time Spent: 9h 10m (was: 9h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 9h 10m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner.
> (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-563) DoFn Reuse: Update DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai closed BEAM-563. -- Resolution: Done Fix Version/s: Not applicable This was also done in https://github.com/apache/beam/pull/7994 > DoFn Reuse: Update DirectRunner > --- > > Key: BEAM-563 > URL: https://issues.apache.org/jira/browse/BEAM-563 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Major > Fix For: Not applicable > > > https://issues.apache.org/jira/browse/BEAM-562 will add setup and teardown > methods to DoFns. Update DirectRunner to add support for these new methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
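For readers unfamiliar with the setup/teardown methods referenced in BEAM-562, the sketch below models the reuse lifecycle with a plain Python class: one `setup` per instance, `start_bundle`/`finish_bundle` around each bundle, and one `teardown` at the end. The method names follow the Python SDK's `DoFn`; `RecordingDoFn` and `run_bundles_with_reuse` are hypothetical and much simpler than what the DirectRunner actually does.

```python
class RecordingDoFn:
    """Hypothetical DoFn-like class that records its lifecycle calls."""

    def __init__(self):
        self.calls = []

    def setup(self):
        self.calls.append('setup')

    def start_bundle(self):
        self.calls.append('start_bundle')

    def process(self, element):
        self.calls.append('process')
        return [element]

    def finish_bundle(self):
        self.calls.append('finish_bundle')

    def teardown(self):
        self.calls.append('teardown')


def run_bundles_with_reuse(dofn, bundles):
    """Toy runner: one setup, per-bundle start/finish, one teardown."""
    dofn.setup()
    outputs = []
    try:
        for bundle in bundles:
            dofn.start_bundle()
            for element in bundle:
                outputs.extend(dofn.process(element))
            dofn.finish_bundle()
    finally:
        dofn.teardown()  # attempted even if a bundle raises
    return outputs
```

Reusing one instance across bundles is what makes `setup` a sensible place for expensive initialization such as opening clients.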
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243584 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:03 Start Date: 16/May/19 19:03 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284854091 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -197,6 +198,19 @@ def _get_worker_count(pipeline_options): return 12 +def _load_avro_coder(pipeline_options): + experiments = pipeline_options.view_as(DebugOptions).experiments + + experiments = experiments if experiments else [] + + for experiment in experiments: +# There should only be 1 match so returning from the loop +if re.match(r'xlang_test', experiment): Review comment: same here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243584) Time Spent: 9h (was: 8h 50m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 9h > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. 
> (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
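The review comment does not spell out the requested change, so the following only illustrates the semantics involved: `re.match` anchors at the start of the string but not at the end, so a loop like the one in `_load_avro_coder` also matches experiments that merely begin with `xlang_test`. Both helper names are hypothetical.

```python
import re


def has_experiment_prefix(experiments, name):
    """What a re.match loop checks: any experiment that *starts with* name."""
    return any(re.match(re.escape(name), e) for e in experiments or [])


def has_experiment_exact(experiments, name):
    """Exact membership check, tolerating experiments=None."""
    return name in (experiments or [])
```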
[jira] [Commented] (BEAM-7230) Using JdbcIO creates huge amount of connections
[ https://issues.apache.org/jira/browse/BEAM-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841646#comment-16841646 ] Brachi Packter commented on BEAM-7230: -- Hi. I looked into the code, and it seems that when using the API https://github.com/apache/beam/blob/adb6d0c9f790c9cda363dd5d14f03fb11362f4d1/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L298 you are setting the data source via: https://github.com/apache/beam/blob/adb6d0c9f790c9cda363dd5d14f03fb11362f4d1/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L361 so it is not static, and it is created per function... > Using JdbcIO creates huge amount of connections > --- > > Key: BEAM-7230 > URL: https://issues.apache.org/jira/browse/BEAM-7230 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.11.0 >Reporter: Brachi Packter >Assignee: Ismaël Mejía >Priority: Major > Fix For: 2.13.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I want to write from Dataflow to GCP Cloud SQL. I'm using a connection pool, > and still I see a huge number of connections in GCP SQL (4k, while I set the > connection pool to 300), and most of them are sleeping. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
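The comment points out that the data source is created per function instance rather than held statically, so each instance can end up with its own pool. A common way to avoid this is a process-wide, lazily created pool keyed by its configuration. JdbcIO itself is Java; the sketch below shows the pattern in Python with a hypothetical `FakePool` and `WriteFn`, so treat it as an illustration of the idea rather than the eventual fix.

```python
import threading

# Process-wide cache: one pool per distinct configuration.
_POOLS = {}
_POOLS_LOCK = threading.Lock()


class FakePool:
    """Hypothetical stand-in for a JDBC-style connection pool."""

    def __init__(self, url, max_size):
        self.url = url
        self.max_size = max_size


def get_shared_pool(url, max_size):
    """Return the single pool for (url, max_size), creating it once."""
    key = (url, max_size)
    with _POOLS_LOCK:  # avoid two threads creating duplicate pools
        pool = _POOLS.get(key)
        if pool is None:
            pool = FakePool(url, max_size)
            _POOLS[key] = pool
        return pool


class WriteFn:
    """Hypothetical DoFn-like writer that reuses the shared pool."""

    def __init__(self, url, max_size=300):
        self.url = url
        self.max_size = max_size
        self.pool = None

    def setup(self):
        # Look up the shared pool instead of building a new one per instance.
        self.pool = get_shared_pool(self.url, self.max_size)
```

Keying by configuration keeps instances that target different databases from sharing a pool, while instances with identical settings share one.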
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243582=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243582 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:02 Start Date: 16/May/19 19:02 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284853783 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -136,6 +136,7 @@ def main(unused_argv): service_descriptor = endpoints_pb2.ApiServiceDescriptor() text_format.Merge(os.environ['CONTROL_API_SERVICE_DESCRIPTOR'], service_descriptor) +_load_avro_coder(sdk_pipeline_options) Review comment: Thanks for pointing this out. I will fix this (it is left over from an old design, when I thought that the Avro coder was only for testing xlang). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243582) Time Spent: 8h 40m (was: 8.5h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 8h 40m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner.
> (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243583=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243583 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 19:02 Start Date: 16/May/19 19:02 Worklog Time Spent: 10m Work Description: ihji commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284853870 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1625,6 +1645,110 @@ class BeamModulePlugin implements Plugin { /** ***/ +// Method to create the crossLanguageValidatesRunnerTask. +// The method takes crossLanguageValidatesRunnerConfiguration as parameter. +project.ext.createCrossLanguageValidatesRunnerTask = { + def config = it ? it as CrossLanguageValidatesRunnerConfiguration : new CrossLanguageValidatesRunnerConfiguration() + + project.evaluationDependsOn(":sdks:python") + project.evaluationDependsOn(":sdks:java:testing:expansion-service") + project.evaluationDependsOn(":runners:core-construction-java") + + // Task for launching expansion services + def envDir = project.project(":sdks:python").envdir + def pythonDir = project.project(":sdks:python").projectDir + def javaPort = startingExpansionPortNumber.getAndDecrement() + def pythonPort = startingExpansionPortNumber.getAndDecrement() + def expansionJar = project.project(':sdks:java:testing:expansion-service').buildTestExpansionServiceJar.archivePath + def expansionServiceOpts = [ +"group_id": project.name, +"java_expansion_service_jar": expansionJar, +"java_port": javaPort, +"python_virtualenv_dir": envDir, +"python_expansion_service_module": "apache_beam.runners.portability.expansion_service_test", +"python_port": pythonPort + ] + def serviceArgs = project.project(':sdks:python').mapToArgString(expansionServiceOpts) + def setupTask = project.tasks.create(name: config.name+"Setup", type: Exec) { 
+dependsOn ':sdks:java:container:docker' +dependsOn ':sdks:python:container:docker' +dependsOn ':sdks:java:testing:expansion-service:buildTestExpansionServiceJar' +dependsOn ":sdks:python:installGcpTest" +// setup test env +executable 'sh' +args '-c', "$pythonDir/scripts/run_expansion_services.sh stop --group_id ${project.name} && $pythonDir/scripts/run_expansion_services.sh start $serviceArgs" + } + + def mainTask = project.tasks.create(name: config.name) { +group = "Verification" +description = "Validates cross-language capability of runner" + } + + def cleanupTask = project.tasks.create(name: config.name+'Cleanup', type: Exec) { +// teardown test env +executable 'sh' +args '-c', "$pythonDir/scripts/run_expansion_services.sh stop --group_id ${project.name}" + } + setupTask.finalizedBy cleanupTask + + // Task for running testcases in Java SDK + def beamJavaTestPipelineOptions = [ + "--runner=org.apache.beam.runners.reference.testing.TestPortableRunner", +"--jobServerDriver=${config.jobServerDriver}", +"--environmentCacheMillis=1" + ] + beamJavaTestPipelineOptions.addAll(config.pipelineOpts) + if (config.jobServerConfig) { + beamJavaTestPipelineOptions.add("--jobServerConfig=${config.jobServerConfig}") + } + ['Java': javaPort, 'Python': pythonPort].each { sdk, port -> +def javaTask = project.tasks.create(name: config.name+"JavaUsing"+sdk, type: Test) { + group = "Verification" + description = "Validates runner for cross-language capability of using ${sdk} transforms from Java SDK" + systemProperty "beamTestPipelineOptions", JsonOutput.toJson(beamJavaTestPipelineOptions) + systemProperty "expansionPort", port + classpath = config.testClasspathConfiguration + testClassesDirs = project.files(project.project(":runners:core-construction-java").sourceSets.test.output.classesDirs) + maxParallelForks config.numParallelTests + useJUnit(config.testCategories) + // increase maxHeapSize as this is directly correlated to direct memory, + // see 
https://issues.apache.org/jira/browse/BEAM-6698 + maxHeapSize = '4g' + dependsOn setupTask +} +mainTask.dependsOn javaTask +cleanupTask.mustRunAfter javaTask + +// Task for running testcases in Python SDK +def testOpts = [ + "--attr=UsesCrossLanguageTransforms" +] +def
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243577 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 18:57 Start Date: 16/May/19 18:57 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493190945 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243577) Time Spent: 8h 20m (was: 8h 10m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 8h 20m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243578=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243578 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 18:57 Start Date: 16/May/19 18:57 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493190945 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243578) Time Spent: 8.5h (was: 8h 20m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 8.5h > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243576=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243576 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 18:57 Start Date: 16/May/19 18:57 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493185048 run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243576) Time Spent: 8h 10m (was: 8h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243575 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 16/May/19 18:55 Start Date: 16/May/19 18:55 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596#issuecomment-493190447 +R: @yifanzou This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 243575) Time Spent: 20m (was: 10m) > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Requirement: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit worker number > - Disable autoscaling > - Enable both py2 and py3 benchmarks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5510) Records including datetime to be saved as DATETIME or TIMESTAMP in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841636#comment-16841636 ] Ahmet Altay commented on BEAM-5510: --- cc: [~chamikara] [~pabloem] > Records including datetime to be saved as DATETIME or TIMESTAMP in BigQuery > --- > > Key: BEAM-5510 > URL: https://issues.apache.org/jira/browse/BEAM-5510 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.6.0 >Reporter: Pascal Gula >Priority: Major > > When trying to write a row to BigQuery that includes a Python datetime > object, marshaling the row for BigQuery fails: > {code:java} > File > "/home/pascal/Wks/GitHub/PEAT-AI/Albatros/venv/local/lib/python2.7/site-packages/apache_beam/internal/gcp/json_value.py", > line 124, in to_json_value > raise TypeError('Cannot convert %s to a JSON value.' % repr(obj)) > TypeError: Cannot convert datetime.datetime(2018, 9, 25, 18, 57, 18, 108579) > to a JSON value. [while running 'save/WriteToBigQuery'] > {code} > However, this should be perfectly feasible, as `google-cloud-python` > has supported it since this issue was resolved: > [https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2957] > thanks to this pull request: > [https://github.com/GoogleCloudPlatform/google-cloud-python/pull/3426/files] > A similar approach could be taken for the `json_value.py` helper. > Is there any workaround that can be applied to solve this issue? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
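As a possible workaround until `json_value.py` handles these types, rows can be converted before they reach `WriteToBigQuery` by rendering `datetime`/`date` values as strings. The sketch assumes naive datetimes and that the target TIMESTAMP or DATETIME column accepts the `YYYY-MM-DD HH:MM:SS[.ffffff]` layout; `to_json_safe` is a hypothetical helper.

```python
import datetime
import json


def to_json_safe(row):
    """Return a copy of row with datetime/date values rendered as strings.

    datetime is checked before date because datetime subclasses date.
    Assumes naive datetimes; aware ones would also carry a UTC offset.
    """
    safe = {}
    for key, value in row.items():
        if isinstance(value, datetime.datetime):
            safe[key] = value.isoformat(sep=' ')  # 'YYYY-MM-DD HH:MM:SS[.ffffff]'
        elif isinstance(value, datetime.date):
            safe[key] = value.isoformat()  # 'YYYY-MM-DD'
        else:
            safe[key] = value
    return safe
```

In a pipeline this would typically be applied with a `beam.Map(to_json_safe)` step just before the BigQuery sink; the converted row then serializes cleanly with `json.dumps`.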
[jira] [Work logged] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?focusedWorklogId=243565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243565 ] ASF GitHub Bot logged work on BEAM-7339: Author: ASF GitHub Bot Created on: 16/May/19 18:43 Start Date: 16/May/19 18:43 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #8596: [BEAM-7339] Make input and checksum configurable for Python WordCountIT URL: https://github.com/apache/beam/pull/8596 This is step 1 to support large input for WordCountIT benchmark. Make `input` and `expect_checksum` configurable from command line for WordCountIT. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243560 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 18:40 Start Date: 16/May/19 18:40 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-493185048 run seed job Issue Time Tracking --- Worklog Id: (was: 243560) Time Spent: 8h (was: 7h 50m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > We should add an integration test suite that covers the following: > (1) Currently available Java IO connectors that do not use UDFs work for > the Python SDK on the Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > the Java SDK on the Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python).
[jira] [Commented] (BEAM-7339) Enable 1Gb input for Python wordcount benchmark
[ https://issues.apache.org/jira/browse/BEAM-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841629#comment-16841629 ] Ahmet Altay commented on BEAM-7339: --- If this is for a benchmark, should we target a larger input (> 100 GB)? Is there a reason, such as a limitation, for us to use a 1 GB value? > Enable 1Gb input for Python wordcount benchmark > --- > > Key: BEAM-7339 > URL: https://issues.apache.org/jira/browse/BEAM-7339 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > > Requirements: > - Use input from: gs://apache-beam-samples/input_small_files/* > - Use TestDataflowRunner > - Limit the number of workers > - Disable autoscaling > - Enable both py2 and py3 benchmarks
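The requirements listed above map fairly directly onto pipeline flags. The sketch below assumes the standard Dataflow worker options `--num_workers` and `--autoscaling_algorithm=NONE` and the wordcount example's `--input` flag; the helper name is hypothetical, the worker count is left to the caller, and required options such as project and temp location are deliberately omitted.

```python
from typing import List

# Input pattern taken from the BEAM-7339 requirements.
INPUT_PATTERN = "gs://apache-beam-samples/input_small_files/*"


def benchmark_args(num_workers: int) -> List[str]:
    """Translate the BEAM-7339 requirements into pipeline flags (a sketch).

    Hypothetical helper: project, region and temp/staging locations, which
    a real Dataflow run also needs, are intentionally left out.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be positive")
    return [
        "--runner=TestDataflowRunner",
        "--input={}".format(INPUT_PATTERN),
        # A fixed worker count with autoscaling disabled keeps runs comparable.
        "--num_workers={}".format(num_workers),
        "--autoscaling_algorithm=NONE",
    ]
```

Running the same flag list under both a Python 2 and a Python 3 environment would cover the last requirement; pinning the worker count is what makes the benchmark numbers comparable across runs.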
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=243556=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-243556 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 16/May/19 18:36 Start Date: 16/May/19 18:36 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#discussion_r284839573 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py ## @@ -136,6 +136,7 @@ def main(unused_argv): service_descriptor = endpoints_pb2.ApiServiceDescriptor() text_format.Merge(os.environ['CONTROL_API_SERVICE_DESCRIPTOR'], service_descriptor) +_load_avro_coder(sdk_pipeline_options) Review comment: Can you explain why we need to import the AvroCoder here (but not the other coders)? Can we load coders in a uniform way? Issue Time Tracking --- Worklog Id: (was: 243556) Time Spent: 7h 40m (was: 7.5h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > We should add an integration test suite that covers the following: > (1) Currently available Java IO connectors that do not use UDFs work for > the Python SDK on the Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > the Java SDK on the Flink runner.
> (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python).
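The review question above asks whether coders could be loaded uniformly instead of special-casing `_load_avro_coder`. One generic pattern, sketched here purely as a hypothetical illustration and not as what the PR did, is to import a configurable list of modules at worker startup, assuming each coder module registers its coders as a side effect of being imported.

```python
import importlib
from types import ModuleType
from typing import Iterable, List


def import_plugins(module_names: Iterable[str]) -> List[ModuleType]:
    """Import every listed module so its import-time registrations run.

    Hypothetical sketch: rather than a one-off call such as
    _load_avro_coder(...), the worker imports each configured module and
    relies on the module to register its own coders when imported.
    """
    loaded = []
    for name in module_names:
        # import_module returns the cached module if it was already imported.
        loaded.append(importlib.import_module(name))
    return loaded
```

With this shape, supporting a new coder means adding a module name to a list (for example, one taken from the pipeline options) instead of adding another special-case call to `main`.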