[jira] [Commented] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075858#comment-17075858 ] Chad Dombrova commented on BEAM-8979: - I thought this issue was resolved. Can you provide the error you are getting and how to reproduce it? What version of beam are you using? > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Fix For: Not applicable > > Time Spent: 12h 10m > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. 
> 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including 
_--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call in gen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce this error locally. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
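For context on the failure above: protoc locates a generator plugin by searching the PATH for an executable named `protoc-gen-<name>`, unless an explicit `--plugin=protoc-gen-<name>=<path>` override is supplied. A rough sketch of that lookup in Python (the function name and override dict are illustrative, not Beam's or protoc's actual code):

```python
import os
import shutil

def find_protoc_plugin(name, plugin_overrides=None):
    """Emulate protoc's plugin lookup: an explicit --plugin=NAME=PATH
    override wins; otherwise an executable named NAME must be on PATH."""
    plugin_overrides = plugin_overrides or {}
    if name in plugin_overrides:
        path = plugin_overrides[name]
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
        # Mirrors protoc's "program not found or is not executable" case.
        return None
    # shutil.which returns None when nothing matching is on PATH.
    return shutil.which(name)
```

Under this model, both workarounds tried in the issue (an absolute `--plugin` path, or appending the plugin's directory to PATH) should in principle have worked, which is why the failure was puzzling.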
[jira] [Created] (BEAM-9626) pymongo should be an optional requirement
Chad Dombrova created BEAM-9626: --- Summary: pymongo should be an optional requirement Key: BEAM-9626 URL: https://issues.apache.org/jira/browse/BEAM-9626 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova The pymongo driver is installed by default, but as the number of IO connectors in the python sdk grows, I don't think this is the precedent we want to set. We already have "extra" packages for gcp, aws, and interactive; we should also add one for mongo.
[jira] [Created] (BEAM-9479) Provide option to run pylint in local git pre-commit
Chad Dombrova created BEAM-9479: --- Summary: Provide option to run pylint in local git pre-commit Key: BEAM-9479 URL: https://issues.apache.org/jira/browse/BEAM-9479 Project: Beam Issue Type: Improvement Components: testing Reporter: Chad Dombrova Assignee: Chad Dombrova Now that we have support for running yapf in pre-commit, it would be nice to add pylint.
[jira] [Commented] (BEAM-9427) Python precommits flaky with ImportError: cannot import name 'ContextManager'
[ https://issues.apache.org/jira/browse/BEAM-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049827#comment-17049827 ] Chad Dombrova commented on BEAM-9427: - hmmm it would be nice to prevent these transitive deps from floating between versions. Tools like poetry and pipenv provide a lockfile mechanism to help with that. But we could run {{pip freeze}} and use that as a constraints file when installing. > Python precommits flaky with ImportError: cannot import name 'ContextManager' > - > > Key: BEAM-9427 > URL: https://issues.apache.org/jira/browse/BEAM-9427 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: Not applicable > > > https://builds.apache.org/job/beam_PreCommit_Python_Commit/11469/ > {code} > 11:51:34 > Task :sdks:python:test-suites:tox:py35:testPy35Gcp FAILED > 11:51:34 py35-gcp-pytest create: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-gcp-pytest/py35-gcp-pytest > 11:51:34 ERROR: invocation failed (exit code 1), logfile: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-gcp-pytest/py35-gcp-pytest/log/py35-gcp-pytest-0.log > 11:51:34 == log start > === > 11:51:34 ImportError: cannot import name 'ContextManager' > 11:51:34 > 11:51:34 === log end > > 11:51:34 ERROR: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-gcp-pytest (exited with code 1) > 11:51:34 ___ summary > > 11:51:34 ERROR: py35-gcp-pytest: InvocationError for command > 
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-gcp-pytest (exited with code 1) > {code} > {code} > 11:51:35 > Task :sdks:python:test-suites:tox:py35:testPy35Cython FAILED > 11:51:35 py35-cython-pytest create: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython-pytest/py35-cython-pytest > 11:51:35 ERROR: invocation failed (exit code 1), logfile: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython-pytest/py35-cython-pytest/log/py35-cython-pytest-0.log > 11:51:35 == log start > === > 11:51:35 ImportError: cannot import name 'ContextManager' > 11:51:35 > 11:51:35 === log end > > 11:51:35 ERROR: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-cython-pytest (exited with code 1) > 11:51:35 ___ summary > > 11:51:35 ERROR: py35-cython-pytest: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-cython-pytest (exited with code 1) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
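The `pip freeze` idea from the comment above can be sketched as follows. This is a hypothetical helper (the file name and function are illustrative; Beam's build does not necessarily do this): snapshot the environment once, then pass the file to later installs with `pip install -c constraints.txt ...` so transitive dependencies cannot float between builds.

```python
import subprocess
import sys

def write_constraints(path="constraints.txt"):
    """Snapshot the current environment with `pip freeze` so later installs
    can be pinned via `pip install -c <path> ...`, keeping transitive
    dependencies from silently changing versions between CI runs."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True, capture_output=True, text=True,
    ).stdout
    with open(path, "w") as f:
        f.write(frozen)
    return path
```

A constraints file only pins versions of packages that actually get installed; it never adds packages, which makes it a lighter-weight alternative to a full lockfile tool like poetry or pipenv.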
[jira] [Commented] (BEAM-9427) Python precommits flaky with ImportError: cannot import name 'ContextManager'
[ https://issues.apache.org/jira/browse/BEAM-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049714#comment-17049714 ] Chad Dombrova commented on BEAM-9427: - Is there a traceback anywhere that can tell us what's importing {{ContextManager}}? Relates to: - testing on specific minor/micro releases: BEAM-8152 - dropping support for 3.5.2: BEAM-9372 - stop installing typing for python versions that include it: https://github.com/apache/beam/pull/10821 {quote}Looks like typing.ContextManager is new in 3.5.4{quote} Could it be new in 3.5.3? I don't see a tag for 3.5.4 in the typing releases (though I'm not 100% certain how strongly the typing releases relate to python releases) https://github.com/python/typing/releases > Python precommits flaky with ImportError: cannot import name 'ContextManager' > - > > Key: BEAM-9427 > URL: https://issues.apache.org/jira/browse/BEAM-9427 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Udi Meiri >Priority: Major > > https://builds.apache.org/job/beam_PreCommit_Python_Commit/11469/ > {code} > 11:51:34 > Task :sdks:python:test-suites:tox:py35:testPy35Gcp FAILED > 11:51:34 py35-gcp-pytest create: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-gcp-pytest/py35-gcp-pytest > 11:51:34 ERROR: invocation failed (exit code 1), logfile: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-gcp-pytest/py35-gcp-pytest/log/py35-gcp-pytest-0.log > 11:51:34 == log start > === > 11:51:34 ImportError: cannot import name 'ContextManager' > 11:51:34 > 11:51:34 === log end > > 11:51:34 ERROR: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > 
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-gcp-pytest (exited with code 1) > 11:51:34 ___ summary > > 11:51:34 ERROR: py35-gcp-pytest: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-gcp-pytest (exited with code 1) > {code} > {code} > 11:51:35 > Task :sdks:python:test-suites:tox:py35:testPy35Cython FAILED > 11:51:35 py35-cython-pytest create: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython-pytest/py35-cython-pytest > 11:51:35 ERROR: invocation failed (exit code 1), logfile: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython-pytest/py35-cython-pytest/log/py35-cython-pytest-0.log > 11:51:35 == log start > === > 11:51:35 ImportError: cannot import name 'ContextManager' > 11:51:35 > 11:51:35 === log end > > 11:51:35 ERROR: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-cython-pytest (exited with code 1) > 11:51:35 ___ summary > > 11:51:35 ERROR: py35-cython-pytest: InvocationError for command > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > -m virtualenv --no-download --python > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-1227304285/bin/python3.5 > py35-cython-pytest (exited 
with code 1) > {code}
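One defensive way to tolerate a `typing` name that is missing on early 3.5.x micro releases, as discussed in the comment above, is a guarded import with a degraded fallback. This is a generic sketch, not what Beam ended up doing; the `Any` fallback loses type precision and is labeled as such:

```python
try:
    # typing.ContextManager is absent from the typing module shipped with
    # some early CPython 3.5.x micro releases.
    from typing import ContextManager
except ImportError:
    # Fallback assumption: degrade to Any so modules still import on old
    # interpreters, at the cost of a less precise annotation.
    from typing import Any as ContextManager
```

The more robust fix, also discussed in the linked issues, is to pin minimum supported micro versions or install the `typing` backport only where needed.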
[jira] [Commented] (BEAM-9058) Line-too-long Python lint checks are no longer working.
[ https://issues.apache.org/jira/browse/BEAM-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048437#comment-17048437 ] Chad Dombrova commented on BEAM-9058: - This is resolved. > Line-too-long Python lint checks are no longer working. > --- > > Key: BEAM-9058 > URL: https://issues.apache.org/jira/browse/BEAM-9058 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Without getting into a question whether 80 is a reasonable limit, what are > the reasons to treat type annotations differently than other parts of Python > code (including comments), which have a 80 character limit? > > Places that need to be fixed (PR coming): > ``` > {noformat} > > Task :sdks:python:test-suites:tox:py37:lintPy37 > > * Module apache_beam.testing.test_stream_it_test > apache_beam/testing/test_stream_it_test.py:50:0: C0301: Line too long > (106/80) (line-too-long) > * Module apache_beam.io.filesystems > apache_beam/io/filesystems.py:99:0: C0301: Line too long (101/80) > (line-too-long) > apache_beam/io/filesystems.py:100:0: C0301: Line too long (111/80) > (line-too-long) > * Module apache_beam.runners.portability.fn_api_runner > apache_beam/runners/portability/fn_api_runner.py:115:0: C0301: Line too long > (114/80) (line-too-long) > * Module apache_beam.transforms.core > apache_beam/transforms/core.py:1307:0: C0301: Line too long (116/80) > (line-too-long) > apache_beam/transforms/core.py:2271:0: C0301: Line too long (95/80) > (line-too-long) > apache_beam/transforms/core.py:2272:0: C0301: Line too long (90/80) > (line-too-long) > * Module setup > setup.py:141:0: C0301: Line too long (81/80) (line-too-long) > > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048436#comment-17048436 ] Chad Dombrova commented on BEAM-8979: - This is resolved. > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 11h > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. 
> 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including 
_--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call in gen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce this error locally. >
[jira] [Commented] (BEAM-9414) [beam_PostCommit_Python2] build failed
[ https://issues.apache.org/jira/browse/BEAM-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048423#comment-17048423 ] Chad Dombrova commented on BEAM-9414: - This should have been fixed in BEAM-9405, in this PR: https://github.com/apache/beam/pull/11001 [~chamikara] the link you posted is the _fix_ to the problem, rather than the cause. Well, that's the intent anyway. {{PortableRunner.create_job_service}} was renamed, which caused the break in that test, and in the PR linked above it was renamed back to {{PortableRunner.create_job_service}}, so everything should be good again. Let me know if it is not. > [beam_PostCommit_Python2] build failed > -- > > Key: BEAM-9414 > URL: https://issues.apache.org/jira/browse/BEAM-9414 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Yueyang Qiu >Assignee: Chad Dombrova >Priority: Major > Labels: currently-failing > > See [https://builds.apache.org/job/beam_PostCommit_Python2/1844/] > > Error: > > *16:07:07* FAILURE: Build failed with an exception.*16:07:07* *16:07:07* * > Where:*16:07:07* Build file > '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2/src/sdks/python/test-suites/portable/py2/build.gradle' > line: 143*16:07:07* *16:07:07* * What went wrong:*16:07:07* Execution failed > for task > ':sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount'.*16:07:07* > > Process 'command 'sh'' finished with non-zero exit value 1 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
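The breakage pattern described in this comment, a public method renamed out from under external callers, can be avoided by keeping the old spelling as an alias for a release cycle. A generic sketch (class and new method name are illustrative only, not Beam's actual code):

```python
class Runner:
    def default_job_server(self, options):
        # The implementation now lives under the new name.
        return {"options": options}

    # Backward-compatible alias under the old public name, so existing
    # callers like `p.runner.create_job_service(opts)` don't hit
    # AttributeError mid-release.
    create_job_service = default_job_server
```

Because the alias binds the same function object, both names stay in sync automatically; the alias can be deprecated and removed once callers have migrated.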
[jira] [Commented] (BEAM-9405) Python PostCommit is flaky: 'PortableRunner' object has no attribute 'create_job_service'
[ https://issues.apache.org/jira/browse/BEAM-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047888#comment-17047888 ] Chad Dombrova commented on BEAM-9405: - PR is up: [https://github.com/apache/beam/pull/11001] > Python PostCommit is flaky: 'PortableRunner' object has no attribute > 'create_job_service' > - > > Key: BEAM-9405 > URL: https://issues.apache.org/jira/browse/BEAM-9405 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > See: [https://builds.apache.org/job/beam_PostCommit_Python2/] > It seems that it is caused by [this > |https://github.com/apache/beam/commit/1856d8533c879ab236d0593be1f9c7fff41edd7f]commit. > An example log: > {code:java} > :sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount FAILED > DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. > Please upgrade your Python as Python 2.7 is no longer maintained. A future > version of pip will drop support for Python 2.7. More details about Python 2 > support in pip, can be found at > https://pip.pypa.io/en/latest/development/release-process/#python-2-support > apache_beam/__init__.py:82: UserWarning: You are using Apache Beam with > Python 2. New releases of Apache Beam will soon support Python 3 only. > 'You are using Apache Beam with Python 2. ' > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > ERROR StatusLogger Log4j2 could not find a logging implementation. Please add > log4j-core to the classpath. Using SimpleLogger to log to the console... 
> Traceback (most recent call last): > File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/usr/lib/python2.7/runpy.py", line 72, in _run_code > exec code in run_globals > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 137, in > main() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 128, in main > p.runner.create_job_service(pipeline_options) > AttributeError: 'PortableRunner' object has no attribute 'create_job_service' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9405) Python PostCommit is flaky: 'PortableRunner' object has no attribute 'create_job_service'
[ https://issues.apache.org/jira/browse/BEAM-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047731#comment-17047731 ] Chad Dombrova commented on BEAM-9405: - I will have a look at this first thing in the morning. > Python PostCommit is flaky: 'PortableRunner' object has no attribute > 'create_job_service' > - > > Key: BEAM-9405 > URL: https://issues.apache.org/jira/browse/BEAM-9405 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > > See: [https://builds.apache.org/job/beam_PostCommit_Python2/] > It seems that it is caused by [this|https://github.com/apache/beam/commit/1856d8533c879ab236d0593be1f9c7fff41edd7f] commit. > An example log: > {code:java} > :sdks:python:test-suites:portable:py2:crossLanguagePortableWordCount FAILED > DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. > Please upgrade your Python as Python 2.7 is no longer maintained. A future > version of pip will drop support for Python 2.7. More details about Python 2 > support in pip, can be found at > https://pip.pypa.io/en/latest/development/release-process/#python-2-support > apache_beam/__init__.py:82: UserWarning: You are using Apache Beam with > Python 2. New releases of Apache Beam will soon support Python 3 only. > 'You are using Apache Beam with Python 2. ' > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > ERROR StatusLogger Log4j2 could not find a logging implementation. Please add > log4j-core to the classpath. Using SimpleLogger to log to the console... 
> Traceback (most recent call last): > File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/usr/lib/python2.7/runpy.py", line 72, in _run_code > exec code in run_globals > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 137, in > main() > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/sdks/python/apache_beam/examples/wordcount_xlang.py", > line 128, in main > p.runner.create_job_service(pipeline_options) > AttributeError: 'PortableRunner' object has no attribute 'create_job_service' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040558#comment-17040558 ] Chad Dombrova commented on BEAM-3788: - Unless something has changed recently, https://issues.apache.org/jira/browse/BEAM-7870 is still a blocker for using KafkaIO in python out of the box. As the title suggests, it's also blocking PubSubIO in python and conceptually any external transform with a non-trivial coder. [~mxm], [~bhulette] has anything changed on that issue lately? > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8402) Create a class hierarchy to represent environments
[ https://issues.apache.org/jira/browse/BEAM-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037961#comment-17037961 ] Chad Dombrova commented on BEAM-8402: - This is resolved in 2.18 > Create a class hierarchy to represent environments > -- > > Key: BEAM-8402 > URL: https://issues.apache.org/jira/browse/BEAM-8402 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > As a first step towards making it possible to assign different environments > to sections of a pipeline, we first need to expose environment classes to the > pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, > environments exists solely in the portability framework as protobuf objects. > By creating a hierarchy of "native" classes that represent the various > environment types -- external, docker, process, etc -- users will be able to > instantiate these and assign them to parts of the pipeline. The assignment > portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
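The class hierarchy described in BEAM-8402 might look roughly like the following. This is an illustrative sketch, not Beam's actual implementation; the URN strings are modeled on the portability framework's environment protos, and the constructor parameters are assumptions:

```python
class Environment:
    """Base class for pipeline execution environments; each subclass
    mirrors one environment type from the portability protos via a URN."""
    urn = None

class DockerEnvironment(Environment):
    urn = "beam:env:docker:v1"
    def __init__(self, container_image):
        self.container_image = container_image

class ProcessEnvironment(Environment):
    urn = "beam:env:process:v1"
    def __init__(self, command, env=None):
        self.command = command
        self.env = env or {}

class ExternalEnvironment(Environment):
    urn = "beam:env:external:v1"
    def __init__(self, service_address):
        self.service_address = service_address
```

With "native" classes like these, users could instantiate an environment and attach it to part of a pipeline, with translation to and from the protobuf representation handled by the framework.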
[jira] [Created] (BEAM-9276) python: create a class to encapsulate the work required to submit a pipeline to a job service
Chad Dombrova created BEAM-9276: --- Summary: python: create a class to encapsulate the work required to submit a pipeline to a job service Key: BEAM-9276 URL: https://issues.apache.org/jira/browse/BEAM-9276 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova {{PortableRunner.run_pipeline}} is somewhat of a monolithic method for submitting a pipeline. It would be useful to factor out the code responsible for interacting with the job and artifact services (prepare, stage, run) to make it easier to modify this behavior in portable runner subclasses, as well as in tests.
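The refactor proposed above could be sketched as a small class whose methods map to the submission phases (prepare, stage, run). All names here are illustrative assumptions, not Beam's actual API; the point is that each phase becomes overridable in subclasses and replaceable with fakes in tests:

```python
class JobServiceHandle:
    """Encapsulates one pipeline submission against a job service,
    splitting the monolithic run_pipeline body into overridable steps."""

    def __init__(self, job_service, options):
        self.job_service = job_service
        self.options = options

    def submit(self, proto_pipeline):
        preparation = self.prepare(proto_pipeline)
        token = self.stage(preparation)
        return self.run(preparation, token)

    def prepare(self, proto_pipeline):
        return self.job_service.prepare(proto_pipeline, self.options)

    def stage(self, preparation):
        return self.job_service.stage_artifacts(preparation)

    def run(self, preparation, staging_token):
        return self.job_service.run(preparation, staging_token)
```

In tests, a stub job service can be injected so `submit` exercises the full prepare/stage/run sequence without any network traffic.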
[jira] [Created] (BEAM-9274) Support running yapf in a git pre-commit hook
Chad Dombrova created BEAM-9274: --- Summary: Support running yapf in a git pre-commit hook Key: BEAM-9274 URL: https://issues.apache.org/jira/browse/BEAM-9274 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova As a developer I want to be able to automatically run yapf before I make a commit so that I don't waste time with failures on Jenkins.
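A minimal local hook along these lines, saved as `.git/hooks/pre-commit` and made executable, might look like the following. This is a sketch under stated assumptions (file selection and flags are illustrative), not the hook Beam adopted:

```python
#!/usr/bin/env python
"""Sketch of a git pre-commit hook: run yapf over staged Python files
and block the commit if any of them would be reformatted."""
import subprocess

def python_files(paths):
    # Only format-check Python sources among the staged paths.
    return [p for p in paths if p.endswith(".py")]

def staged_paths():
    # Added/copied/modified files staged for this commit.
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()

def check(files):
    if not files:
        return 0
    # `yapf --diff` exits non-zero when a file is not format-clean.
    return subprocess.run(["yapf", "--diff"] + files).returncode

# As an installed hook, the script would end with:
#   raise SystemExit(check(python_files(staged_paths())))
```

Returning the yapf exit code from the hook is what makes git abort the commit, so the formatting failure surfaces locally instead of on Jenkins.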
[jira] [Commented] (BEAM-9202) lintPy37 precommit broken
[ https://issues.apache.org/jira/browse/BEAM-9202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024800#comment-17024800 ] Chad Dombrova commented on BEAM-9202: - Looks like this issue was already fixed by Ankur. > lintPy37 precommit broken > - > > Key: BEAM-9202 > URL: https://issues.apache.org/jira/browse/BEAM-9202 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Chad Dombrova >Priority: Major > > Culprit: https://github.com/apache/beam/pull/10683 > Jenkins tests are not started automatically. > {code} > 09:47:37 > Task :sdks:python:test-suites:tox:py37:lintPy37 > 09:47:37 * Module apache_beam.io.gcp.datastore.v1new.types > 09:47:37 apache_beam/io/gcp/datastore/v1new/types.py:47:0: C0301: Line too > long (87/80) (line-too-long) > {code} > https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2033/console
[jira] [Commented] (BEAM-9130) sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with ImportError: No module named google.protobuf.message
[ https://issues.apache.org/jira/browse/BEAM-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018227#comment-17018227 ] Chad Dombrova commented on BEAM-9130: - {quote} My solution doesn't require the workaround in setup.py, which TBH I still don't fully follow. {quote} It's some seriously esoteric python stuff, but here's a terse summary. The protobuf package in python2 is installed as "protobuf", but it uses a protobuf-3.11.0-py3.7-nspkg.pth file (living next to the installed package) to mount the package under the "google" namespace package. This was an effort by google to allow all of their various python libs to install into one parent "google" namespace. All well and good, except that .pth files don't work unless they are in one of the special directories returned by {{site.getsitepackages()}}, which is by default just the python interpreter's native lib and site-packages directories. So if you ever do a non-standard install of protobuf (i.e. using {{pip install --target}}), then the .pth file is not read and executed by the python interpreter, and the package cannot be imported as {{google.protobuf}}. Got all that? Well, you can forget it all now, because they added official support for namespace packages in python3 that doesn't require this .pth hack :) {quote} WDYT about my fix, which essentially matches what setupVirtualenv does? {quote} Making this match setupVirtualenv is definitely the right fix. Obvious in retrospect! This has gotten me motivated to revisit my pep517 changes again. I would love some help on it if you have time. 
> sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with > ImportError: No module named google.protobuf.message > --- > > Key: BEAM-9130 > URL: https://issues.apache.org/jira/browse/BEAM-9130 > Project: Beam > Issue Type: Improvement > Components: test-failures > Reporter: Valentyn Tymofieiev > Priority: Major > Labels: currently-failing > Time Spent: 50m > Remaining Estimate: 0h > > From logs: > {noformat} > 16:33:50 File "/usr/local/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap > 16:33:50 self.run() > 16:33:50 File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run > 16:33:50 self._target(*self._args, **self._kwargs) > 16:33:50 File "/app/sdks/python/gen_protos.py", line 357, in _install_grpcio_tools_and_generate_proto_files > 16:33:50 generate_proto_files(force=force) > 16:33:50 File "/app/sdks/python/gen_protos.py", line 324, in generate_proto_files > 16:33:50 generate_urn_files(log, out_dir) > 16:33:50 File "/app/sdks/python/gen_protos.py", line 65, in generate_urn_files > 16:33:50 import google.protobuf.message as message > 16:33:50 ImportError: No module named google.protobuf.message > 16:33:50 Traceback (most recent call last): > 16:33:50 File "setup.py", line 305, in <module> > 16:33:50 'mypy': generate_protos_first(mypy), > 16:33:50 File "/usr/local/lib/python2.7/site-packages/setuptools/__init__.py", line 145, in setup > 16:33:50 return distutils.core.setup(**attrs) > 16:33:50 File "/usr/local/lib/python2.7/distutils/core.py", line 151, in setup > 16:33:50 dist.run_commands() > 16:33:50 File "/usr/local/lib/python2.7/distutils/dist.py", line 953, in run_commands > 16:33:50 self.run_command(cmd) > 16:33:50 File "/usr/local/lib/python2.7/distutils/dist.py", line 972, in run_command > 16:33:50 cmd_obj.run() > 16:33:50 File "/usr/local/lib/python2.7/site-packages/setuptools/command/sdist.py", line 44, in run > 16:33:50 self.run_command('egg_info') > 16:33:50 File "/usr/local/lib/python2.7/distutils/cmd.py", line 326, in run_command > 16:33:50 self.distribution.run_command(command) > 16:33:50 File "/usr/local/lib/python2.7/distutils/dist.py", line 972, in run_command > 16:33:50 cmd_obj.run() > 16:33:50 File "setup.py", line 229, in run > 16:33:50 gen_protos.generate_proto_files(log=log) > 16:33:50 File "/app/sdks/python/gen_protos.py", line 291, in generate_proto_files > 16:33:50 raise ValueError("Proto generation failed (see log for details).") > 16:33:50 ValueError: Proto generation failed (see log for details). > {noformat} > {noformat} > import google.protobuf.message as message > ImportError: No module named google.protobuf.message > {noformat}
[jira] [Commented] (BEAM-9130) sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with ImportError: No module named google.protobuf.message
[ https://issues.apache.org/jira/browse/BEAM-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017732#comment-17017732 ] Chad Dombrova commented on BEAM-9130: - I updated my PR. There are still other problems with it, but I solved the google.protobuf import problem. There are a few pieces to the solution: * use a pyproject.toml file to specify build requirements: [https://github.com/apache/beam/pull/10038/files#diff-71ede7e13a310beaedc4466b48096e26R18] * use pep517 module to build source: [https://github.com/apache/beam/pull/10038/files#diff-d3cc66016c37760e131f228abbee6667R34] * add pep517's temp build directory to the site dir to allow the google package's .pth file to work: [https://github.com/apache/beam/pull/10038/commits/ed06bf1b17e5f60d7f6bcf7766b93e4eaf4b86ff] I'm sure there are easier ways to hack this to work without such dramatic changes, but this approach sets us on a course to untangle the mess that we have now.
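For reference, the pyproject.toml piece of a fix along the lines described in the comment above might look roughly like the fragment below; the requirement list is an assumption for illustration, not the contents of the actual PR:

```toml
[build-system]
# PEP 518: packages listed here are installed into an isolated build
# environment *before* setup.py runs, which is exactly when gen_protos
# needs google.protobuf (via grpcio-tools) to be importable.
requires = ["setuptools", "wheel", "grpcio-tools", "mypy-protobuf"]
build-backend = "setuptools.build_meta"
```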
[jira] [Commented] (BEAM-9130) sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with ImportError: No module named google.protobuf.message
[ https://issues.apache.org/jira/browse/BEAM-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017699#comment-17017699 ] Chad Dombrova commented on BEAM-9130: - Just in case it's not clear, I think you'd want to edit the Dockerfile to use tox, here: {code:java} CMD holdup -t 45 http://namenode:50070 http://datanode:50075 && \ echo "Waiting for safe mode to end." && \ sleep 45 && \ hdfscli -v -v -v upload -f kinglear.txt / && \ python -m apache_beam.examples.wordcount \ --input hdfs://kinglear* \ --output hdfs://py-wordcount-integration \ --hdfs_host namenode --hdfs_port 50070 --hdfs_user root {code}
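Concretely, an edit like the one suggested in the comment above might look something like this sketch; the tox environment name is hypothetical and the rest of the Dockerfile is assumed unchanged:

```dockerfile
# Hypothetical replacement for the CMD quoted above: delegate the wordcount
# run to a tox environment (the env name "hdfs_integration" is illustrative)
# so that build requirements get installed the same way as in the
# lint/precommit suites, instead of invoking setup.py in the bare image.
CMD holdup -t 45 http://namenode:50070 http://datanode:50075 && \
    echo "Waiting for safe mode to end." && \
    sleep 45 && \
    hdfscli -v -v -v upload -f kinglear.txt / && \
    tox -e hdfs_integration
```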
[jira] [Commented] (BEAM-9130) sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with ImportError: No module named google.protobuf.message
[ https://issues.apache.org/jira/browse/BEAM-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017691#comment-17017691 ] Chad Dombrova commented on BEAM-9130: - The protobuf package is a dependency of grpcio-tools, which is part of build_requirements.txt. Did you include build_requirements.txt in the deps for your tox env? That could help, but it also might not: I can't remember if the deps are installed in time to affect setup.py (which calls gen_protos). I think they may not be, which is where pyproject.toml and pep517 come into play: they specify the deps that need to be installed before setup.py runs. I'm currently rebasing my old testing refactor branch to see if it helps. Can you push your current WIP solution for this so that I can try to help out? I'm on vacation tomorrow and Monday, but I may be able to chip away at some things on the plane.
[jira] [Comment Edited] (BEAM-9130) sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with ImportError: No module named google.protobuf.message
[ https://issues.apache.org/jira/browse/BEAM-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017633#comment-17017633 ] Chad Dombrova edited comment on BEAM-9130 at 1/17/20 1:37 AM: -- IIRC, the reason it only happens for python2 is that the `google` package in python2 uses .pth files to find sub-packages, and .pth files must be in a "site" directory, so if it's installed in a non-standard way (into a directory that isn't one of the python interpreter's official site-packages directories), it won't be able to find the sub-packages. It is possible to mark a directory as a "site" directory using site.addsitedir(). For more info see [https://docs.python.org/2/library/site.html] In python3 namespace packages were redesigned in order to avoid this brittle workflow: [https://www.python.org/dev/peps/pep-0420/] If this test went through tox, I'm pretty sure this error would not occur. What are the chances that we can work on getting all tests running through tox in the near term?
[jira] [Commented] (BEAM-9130) sdks:python:test-suites:direct:py2:hdfsIntegrationTest is failing with ImportError: No module named google.protobuf.message
[ https://issues.apache.org/jira/browse/BEAM-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017600#comment-17017600 ] Chad Dombrova commented on BEAM-9130: - It works for pre-commit, so I'm assuming that this goes back, once again, to the inconsistent way that the beam python sdk is built. Here's my old PR attempting to get everything using tox and pep517: [https://github.com/apache/beam/pull/10038]
[jira] [Commented] (BEAM-9064) Add pytype to lint checks
[ https://issues.apache.org/jira/browse/BEAM-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010307#comment-17010307 ] Chad Dombrova commented on BEAM-9064: - Thoughts (which I will repeat in the mailing list discussion): I've gotten mypy completely passing on my own branch. It was a lot of work, and the longer it lingers the more work it becomes. So let's focus our efforts on getting that merged and integrated into the lint jobs _first._ From there we will learn a few things: 1) how difficult it is for end users to keep the tests passing (i.e. the time and frustration levels with mypy alone), and 2) how much more work it will be to also get pytype passing (i.e. how divergent the two tools are). Based on that feedback we can decide whether we want a pytype failure to _prevent_ a PR from merging, or merely be informative (failures could be something that the google team tracks on the side through metrics/analytics). tl;dr If both tools are actually quite similar, then it shouldn't be much more of a burden for users to maintain both. But if they're not, or users are already struggling with mypy on its own, then we should hold off introducing pytype as a requirement. > Add pytype to lint checks > --- > > Key: BEAM-9064 > URL: https://issues.apache.org/jira/browse/BEAM-9064 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, testing > Reporter: Udi Meiri > Assignee: Udi Meiri > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > [~chadrik]
[jira] [Commented] (BEAM-9064) Add pytype to lint checks
[ https://issues.apache.org/jira/browse/BEAM-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010258#comment-17010258 ] Chad Dombrova commented on BEAM-9064: - I am _very_ reluctant to implement this. I haven't used pytype, but my experience working with mypy over the past few years, and following various issues and peps related to it and typing in general, has taught me there's still a lot of room for interpretation, and thus variation between type checkers. As a user, it can be quite challenging to solve certain typing issues, and I would not be the least bit surprised to find that certain problems cannot be solved in a way that satisfies both linters, at least not without some seriously ugly workarounds. We're already asking our contributors to gain a whole new area of expertise in order to get their PRs merged – one with a fairly steep learning curve – and I wouldn't want to burden them with an additional linter with its own idiosyncrasies.
[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001295#comment-17001295 ] Chad Dombrova commented on BEAM-9012: - I imagine there are going to be _lots_ of little differences between mypy and pytype. I'm curious about your motivation for using pytype. Do you think we should aim to support both? I'd be a bit wary of doing so, since getting mypy to pass can be challenging enough on its own. I can imagine scenarios where there is no solution that appeases both mypy and pytype (thinking particularly of overloads, whose semantics seem to vary between tools). > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > --- > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core > Reporter: Brian Hulette > Assignee: Brian Hulette > Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit {{\-> None}} return type annotations, but pytype has no > such feature. I think we should include {{\-> None}} annotations for pytype > compatibility. > cc: [~chadrik]
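The convention under discussion can be sketched as follows. These classes are simplified stand-ins, not Beam's actual Pipeline/PipelineOptions; the point is only the explicit `-> None` on `__init__`, shown here in the Python-2-compatible comment-annotation style the SDK used at the time:

```python
from typing import List, Optional

class Pipeline(object):
    # mypy treats a missing return annotation on __init__ as implicitly
    # `-> None`, but pytype requires it to be spelled out, so the
    # annotation below includes the explicit `-> None`.
    def __init__(self, argv=None):
        # type: (Optional[List[str]]) -> None
        self.argv = argv or []

class PipelineOptions(object):
    def __init__(self, flags=None):
        # type: (Optional[List[str]]) -> None
        self.flags = flags or []

p = Pipeline(["--runner=DirectRunner"])
```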
[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001279#comment-17001279 ] Chad Dombrova commented on BEAM-9012: - Fine by me. Brian, if you're into the static typing thing, you may want to poke in over at my second PR, which is waiting on some feedback: [https://github.com/apache/beam/pull/10367] There will probably be a third (and hopefully final) PR after that one to get the project to a point where mypy is fully passing. We can take care of this issue in that final PR.
[jira] [Comment Edited] (BEAM-8613) Add environment variable support to Docker environment
[ https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992035#comment-16992035 ] Chad Dombrova edited comment on BEAM-8613 at 12/17/19 12:34 AM:
{quote}What kind of environment variables are you trying to pass here?{quote}
We're primarily interested in configuring various libraries and applications used by our UDFs. Each of these has its own set of environment variables, which typically need to be set before its modules are imported. Another use case we intend to explore soon is passing env vars to control the behavior of pip in {{boot}}, for example to point it at our internal PyPI mirror. Do you think this falls into the category of "building too much into these (unstructured) string fields"?
{quote}Is there not another way to pass this data to the operations being performed in this container?{quote}
Let's frame this as a user story: "As a developer, I want to set library- and application-specific env variables (usually third-party) in the SDK process before any affected modules are imported, so that I can bind a particular configuration to a job." Let's evaluate a few options:
- Custom PipelineOptions: this runs too late. By the time we can read the pipeline options, our UDF and its PCollection element types have been unpickled, thereby importing many dependent modules.
- Custom config file uploaded to the artifact service: same problem as above.
- Custom Docker container: this is too slow. We don't want to create a new Docker container for every permutation we might need; we want this to be user-controlled at job submission time.
- Custom Docker args: this is needlessly complicated. Theoretically, if we had a custom Docker container with a custom entrypoint script and the ability to configure Docker args via the DOCKER environment, we could get this to work, but that's a lot of work just to set some env vars.
Besides, we already have the ability to set env vars for the PROCESS environment type, so doing the same for DOCKER seems natural. I'm not sure what other good options there are. Environment variables seem like the most direct and generally useful approach.
> Add environment variable support to Docker environment > -- > > Key: BEAM-8613 > URL: https://issues.apache.org/jira/browse/BEAM-8613 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution, runner-core, runner-direct > Reporter: Nathan Rusch > Assignee: Nathan Rusch > Priority: Trivial > Time Spent: 1h > Remaining Estimate: 0h > > The Process environment allows specifying environment variables via a map > field on its payload message. The Docker environment should support this same > pattern, and forward the contents of the map through to the container runtime. -- This message was sent
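The DOCKER analogue of the PROCESS payload's env map could simply be forwarded to the container runtime as `docker run -e` flags. A minimal Python sketch of that forwarding, assuming a hypothetical helper (the function name and image tag below are illustrative, not Beam's actual implementation):

```python
# Hypothetical sketch: turn an env-var map (as proposed for the Docker
# environment payload) into `docker run` arguments using `-e` flags.
def docker_run_args(image, env):
    # type: (str, dict) -> list
    args = ["docker", "run"]
    for key in sorted(env):
        # Each map entry becomes `-e KEY=VALUE`, mirroring how the PROCESS
        # environment type already exposes env vars to the SDK harness.
        args += ["-e", "%s=%s" % (key, env[key])]
    return args + [image]

print(docker_run_args(
    "apache/beam_python3.7_sdk",
    {"PIP_INDEX_URL": "https://pypi.example.com/simple"}))
```

This keeps the configuration user-controlled at job submission time, with no custom container or entrypoint script required.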
[jira] [Commented] (BEAM-8613) Add environment variable support to Docker environment
[ https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997743#comment-16997743 ] Chad Dombrova commented on BEAM-8613: - [~robertwb] any additional thoughts on this? > Add environment variable support to Docker environment > -- > > Key: BEAM-8613 > URL: https://issues.apache.org/jira/browse/BEAM-8613 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution, runner-core, runner-direct >Reporter: Nathan Rusch >Assignee: Nathan Rusch >Priority: Trivial > Time Spent: 1h > Remaining Estimate: 0h > > The Process environment allows specifying environment variables via a map > field on its payload message. The Docker environment should support this same > pattern, and forward the contents of the map through to the container runtime. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
[ https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997506#comment-16997506 ] Chad Dombrova commented on BEAM-8966: - Ok, I'm working on trying to reproduce this, but running into errors. This test copies the entirety of the sdks/python directory, including all of my intermediate build, target, and tox env directories, and that's tripping something up. Seems like another example of a highly inefficient approach to creating build isolation. Is there a reason the ITs don't go through tox? > failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest > --- > > Key: BEAM-8966 > URL: https://issues.apache.org/jira/browse/BEAM-8966 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Chad Dombrova >Priority: Major > > I believe this is due to https://github.com/apache/beam/pull/9915 > {code} > [0mCollecting mypy-protobuf==1.12 > Using cached > https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl > Installing collected packages: mypy-protobuf > Successfully installed mypy-protobuf-1.12 > [91mbeam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but > not used. > beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not > used. > [0m[91mTraceback (most recent call last): > File "/usr/local/bin/protoc-gen-mypy", line 13, in > import google.protobuf.descriptor_pb2 as d > ModuleNotFoundError: No module named 'google' > [0m[91m--mypy_out: protoc-gen-mypy: Plugin failed with status code 1. 
> [0m[91mProcess Process-1: > Traceback (most recent call last): > File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files > from grpc_tools import protoc > ModuleNotFoundError: No module named 'grpc_tools' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in > _bootstrap > self.run() > File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run > self._target(*self._args, **self._kwargs) > File "/app/sdks/python/gen_protos.py", line 189, in > _install_grpcio_tools_and_generate_proto_files > generate_proto_files() > File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files > '%s' % ret_code) > RuntimeError: Protoc returned non-zero status (see logs for details): 1 > [0m[91mTraceback (most recent call last): > File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files > from grpc_tools import protoc > ModuleNotFoundError: No module named 'grpc_tools' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "setup.py", line 295, in > 'mypy': generate_protos_first(mypy), > File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line > 145, in setup > return distutils.core.setup(**attrs) > File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup > dist.run_commands() > File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands > self.run_command(cmd) > File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command > cmd_obj.run() > File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > self.run_command('egg_info') > File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command > self.distribution.run_command(command) > File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command > cmd_obj.run() > File 
"setup.py", line 220, in run > gen_protos.generate_proto_files(log=log) > File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files > raise ValueError("Proto generation failed (see log for details).") > ValueError: Proto generation failed (see log for details). > [0mService 'test' failed to build: The command '/bin/sh -c cd sdks/python && > python setup.py sdist && pip install --no-cache-dir $(ls > dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1 > {code} > https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
[ https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997438#comment-16997438 ] Chad Dombrova commented on BEAM-8966: - hmmm... the errors show two different problems. In Udi's, mypy-protobuf is newly installed, protoc-gen-mypy is found by protoc, but there's a failure to import google.protobuf. In Kamil's mypy-protobuf is already installed, but protoc-gen-mypy is not found by protoc. Does anyone have a way to reproduce this reliably? What about locally? I have a hunch this is the kind of problem that would be solved by the pep517 efforts I've been working on. In the meantime, the mypy tests are not required – they don't actually pass yet, they're simply there for reference – so we could remove the mypy-protobuf change and disable the tests until after the pep517 changes. > failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest > --- > > Key: BEAM-8966 > URL: https://issues.apache.org/jira/browse/BEAM-8966 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Udi Meiri >Assignee: Chad Dombrova >Priority: Major > > I believe this is due to https://github.com/apache/beam/pull/9915 > {code} > [0mCollecting mypy-protobuf==1.12 > Using cached > https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl > Installing collected packages: mypy-protobuf > Successfully installed mypy-protobuf-1.12 > [91mbeam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but > not used. > beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not > used. > [0m[91mTraceback (most recent call last): > File "/usr/local/bin/protoc-gen-mypy", line 13, in > import google.protobuf.descriptor_pb2 as d > ModuleNotFoundError: No module named 'google' > [0m[91m--mypy_out: protoc-gen-mypy: Plugin failed with status code 1. 
> [0m[91mProcess Process-1: > Traceback (most recent call last): > File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files > from grpc_tools import protoc > ModuleNotFoundError: No module named 'grpc_tools' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in > _bootstrap > self.run() > File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run > self._target(*self._args, **self._kwargs) > File "/app/sdks/python/gen_protos.py", line 189, in > _install_grpcio_tools_and_generate_proto_files > generate_proto_files() > File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files > '%s' % ret_code) > RuntimeError: Protoc returned non-zero status (see logs for details): 1 > [0m[91mTraceback (most recent call last): > File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files > from grpc_tools import protoc > ModuleNotFoundError: No module named 'grpc_tools' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "setup.py", line 295, in > 'mypy': generate_protos_first(mypy), > File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line > 145, in setup > return distutils.core.setup(**attrs) > File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup > dist.run_commands() > File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands > self.run_command(cmd) > File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command > cmd_obj.run() > File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > self.run_command('egg_info') > File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command > self.distribution.run_command(command) > File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command > cmd_obj.run() > File 
"setup.py", line 220, in run > gen_protos.generate_proto_files(log=log) > File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files > raise ValueError("Proto generation failed (see log for details).") > ValueError: Proto generation failed (see log for details). > [0mService 'test' failed to build: The command '/bin/sh -c cd sdks/python && > python setup.py sdist && pip install --no-cache-dir $(ls > dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1 > {code} > https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8613) Add environment variable support to Docker environment
[ https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992035#comment-16992035 ] Chad Dombrova commented on BEAM-8613: - {quote}What kind of environment variables are you trying to pass here? {quote} We're primarily interested in configuring various libraries and applications used by our UDFs. These each have their own set of environment variables which typically need to be configured before modules are imported. Another use case which we intend to explore soon is passing env vars to control the behavior of pip in {{boot}}. For example, to point it at our internal pypi mirror. Do you think this falls into the category of "building too much into these (unstructured) string fields"? {quote}Is there not another way to pass this data to the operations being performed in this container? {quote} Let's frame this as a user story: "As a developer, I want to set library- and application-specific env variables (usually third-party) in the SDK process before any affected modules are imported, so that I can bind a particular configuration to a job." Let's evaluate a few options: - custom PipelineOptions: by the time we can read the pipeline options, our UDF and its pcollection element types have been unpickled, thereby importing many dependent modules. - custom config file uploaded to artifact service: same problem as above. - custom docker container: we don't want to create a new docker container for every permutation that we might need. we want this to be user controlled at job submission time - custom docker ARGS: theoretically if we had a custom docker container with a custom entrypoint script and the ability to configure docker args via the DOCKER environment we could get this to work. this just seems needlessly complicated. we already have the ability to set env vars for PROCESS environment type, so doing the same for DOCKER seems natural. I'm not sure what other good options there are. 
Environment variables seem like the most direct and generally useful approach. > Add environment variable support to Docker environment > -- > > Key: BEAM-8613 > URL: https://issues.apache.org/jira/browse/BEAM-8613 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution, runner-core, runner-direct >Reporter: Nathan Rusch >Assignee: Nathan Rusch >Priority: Trivial > Time Spent: 1h > Remaining Estimate: 0h > > The Process environment allows specifying environment variables via a map > field on its payload message. The Docker environment should support this same > pattern, and forward the contents of the map through to the container runtime. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8930) External workers should receive artifact endpoint when started from python
Chad Dombrova created BEAM-8930: --- Summary: External workers should receive artifact endpoint when started from python Key: BEAM-8930 URL: https://issues.apache.org/jira/browse/BEAM-8930 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova {{ExternalWorkerHandler}} does not pass the artifact and provision endpoints, making it impossible to provision artifacts when the external worker is started from python. The Java code is properly sending this information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8746) Allow the local job service to work from inside docker
Chad Dombrova created BEAM-8746: --- Summary: Allow the local job service to work from inside docker Key: BEAM-8746 URL: https://issues.apache.org/jira/browse/BEAM-8746 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova Currently the connection is refused. It's a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976165#comment-16976165 ] Chad Dombrova commented on BEAM-7886: - Is there a Jira for adding support for logical types? Is the idea with logical types that we would defer to the stock set of python coders for types outside of the native atomic types? e.g. timestamp, pickle, etc. > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8732) Add support for additional structured types to Schemas/RowCoders
Chad Dombrova created BEAM-8732: --- Summary: Add support for additional structured types to Schemas/RowCoders Key: BEAM-8732 URL: https://issues.apache.org/jira/browse/BEAM-8732 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: Chad Dombrova
Currently we can convert between a {{NamedTuple}} type and its {{Schema}} protos using {{named_tuple_from_schema}} and {{named_tuple_to_schema}}. I'd like to introduce a system to support additional types, starting with structured types like {{attrs}}, {{dataclasses}}, and {{TypedDict}}. I've only just started digesting the code, but this task seems pretty straightforward. For example, I think the type-to-schema code would look roughly like this:
{code:python}
def typing_to_runner_api(type_):
  # type: (Type) -> schema_pb2.FieldType
  structured_handler = _get_structured_handler(type_)
  if structured_handler:
    schema = None
    if hasattr(type_, 'id'):
      schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
    if schema is None:
      fields = structured_handler.get_fields()
      type_id = str(uuid4())
      schema = schema_pb2.Schema(fields=fields, id=type_id)
      SCHEMA_REGISTRY.add(type_, schema)
    return schema_pb2.FieldType(
        row_type=schema_pb2.RowType(
            schema=schema))
{code}
The rest of the work would be in implementing a class hierarchy for working with structured types, such as getting a list of fields from an instance, and instantiating from a list of fields. Eventually we can extend this behavior to arbitrary, unstructured types. Going in the schema-to-type direction, we have the problem of choosing which type to use for a given schema. I believe that as long as {{typing_to_runner_api()}} has been called on our structured type in the current Python session, it should be added to the registry and thus round-trip OK, so I think we just need a public function for registering schemas for structured types. [~bhulette] Did you want to tackle this, or are you OK with me going after it? -- This message was sent by Atlassian Jira (v8.3.4#803005)
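The class hierarchy mentioned above (getting a list of fields from an instance, and instantiating from field values) might be sketched as follows; all class and method names here are hypothetical, not existing Beam APIs:

```python
import typing

# Hypothetical sketch (not Beam API): each handler knows how to list a
# structured type's fields and rebuild an instance from field values.
class StructuredHandler:
    def __init__(self, type_):
        self.type_ = type_

    def get_fields(self):
        """Return [(field_name, field_type), ...] for the wrapped type."""
        raise NotImplementedError

    def from_fields(self, values):
        """Instantiate the wrapped type from a {field_name: value} dict."""
        raise NotImplementedError

class NamedTupleHandler(StructuredHandler):
    """Handler for typing.NamedTuple subclasses."""

    def get_fields(self):
        # NamedTuple classes expose field names/types via __annotations__,
        # in declaration order.
        return list(self.type_.__annotations__.items())

    def from_fields(self, values):
        return self.type_(**values)

class Point(typing.NamedTuple):
    x: int
    label: str

handler = NamedTupleHandler(Point)
print(handler.get_fields())                       # [('x', <class 'int'>), ('label', <class 'str'>)]
print(handler.from_fields({'x': 1, 'label': 'a'}))
```

Analogous handlers for {{attrs}}, {{dataclasses}}, and {{TypedDict}} would introspect their respective field metadata the same way.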
[jira] [Commented] (BEAM-8646) PR #9814 appears to cause failures in fnapi_runner tests on Windows
[ https://issues.apache.org/jira/browse/BEAM-8646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973881#comment-16973881 ] Chad Dombrova commented on BEAM-8646: - Hi all. The simplest solution would be to make this new behavior optional (via a pipeline option?), though it's a bit annoying to do so without a better understanding of why this is happening. Unfortunately, we don't have a windows machine to test on. Open to suggestions. > PR #9814 appears to cause failures in fnapi_runner tests on Windows > --- > > Key: BEAM-8646 > URL: https://issues.apache.org/jira/browse/BEAM-8646 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Wanqi Lyu >Priority: Major > > It appears that changes in > > https://github.com/apache/beam/commit/d6bcb03f586b5430c30f6ca4a1af9e42711e529c > cause test failures in Beam test suite on Windows, for example: > python setup.py nosetests --tests > apache_beam/runners/portability/portable_runner_test.py:PortableRunnerTestWithExternalEnv.test_callbacks_with_exception > > does not finish on a Windows VM machine within at least 60 seconds but passes > within a second if we change host_from_worker to return 'localhost' in [1]. > [~violalyu] , do you think you could take a look? Thanks! > cc: [~chadrik] [~thw] > [1] > https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1377. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-4441) Incorrect coder inference for List and Tuple typehints.
[ https://issues.apache.org/jira/browse/BEAM-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969430#comment-16969430 ] Chad Dombrova commented on BEAM-4441: - Quick thought on this: the {{Tuple[int, ...]}} case will be harder to solve than {{List[int]}} because if we use the standard {{IterableCoder}} (as we did for list) the value will be round-tripped as a list. So we either accept that limitation or we need to use non-standard coders. I think it would also be interesting to support set and frozenset. > Incorrect coder inference for List and Tuple typehints. > --- > > Key: BEAM-4441 > URL: https://issues.apache.org/jira/browse/BEAM-4441 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > > We seem to use a FastPrimitivesCoder for List and Tuple typehints with > homogenous element types, and fail to do the type checking: > {code:python} > inputs = (1, "a string") > coder = typecoders.registry.get_coder(typehints.Tuple[int, str]) > print(type(coder)) # > encoded = coder.encode(inputs) > # Fails: TypeError: an integer is required - correct behaviour > coder = typecoders.registry.get_coder(typehints.Tuple[int, ...]) # A tuple of > integers. > print(type(coder)) # > - wrong coder? > encoded = coder.encode(inputs) > # No errors - incorrect behavior. > coder = typecoders.registry.get_coder(typehints.List[int]) # A list of > integers. > print(type(coder)) # > - wrong coder? > encoded = coder.encode(inputs) > # No errors - incorrect behavior.{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
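The round-trip limitation described in the comment can be demonstrated without Beam at all: any coder that serializes an iterable as a list (as a list-based coder like {{IterableCoder}} would) loses the tuple type on decode. A minimal stand-in sketch, not Beam's actual coder:

```python
import json

# Stand-in for a list-based iterable coder: encode any iterable as a JSON
# list, then decode. The concrete element container (tuple) degrades to list.
def encode(values):
    return json.dumps(list(values)).encode("utf-8")

def decode(data):
    return json.loads(data.decode("utf-8"))

original = (1, 2, 3)                  # conceptually Tuple[int, ...]
roundtripped = decode(encode(original))
print(type(roundtripped).__name__)    # list, not tuple
print(roundtripped)                   # [1, 2, 3]
```

A faithful {{Tuple[int, ...]}} coder would have to re-wrap the decoded sequence in a tuple, which is why a non-standard coder (or accepting the limitation) is required.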
[jira] [Commented] (BEAM-4441) Incorrect coder inference for List and Tuple typehints.
[ https://issues.apache.org/jira/browse/BEAM-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969422#comment-16969422 ] Chad Dombrova commented on BEAM-4441: - I fixed the list case, but not the homogenous tuple case. > Incorrect coder inference for List and Tuple typehints. > --- > > Key: BEAM-4441 > URL: https://issues.apache.org/jira/browse/BEAM-4441 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Minor > > We seem to use a FastPrimitivesCoder for List and Tuple typehints with > homogenous element types, and fail to do the type checking: > {code:python} > inputs = (1, "a string") > coder = typecoders.registry.get_coder(typehints.Tuple[int, str]) > print(type(coder)) # > encoded = coder.encode(inputs) > # Fails: TypeError: an integer is required - correct behaviour > coder = typecoders.registry.get_coder(typehints.Tuple[int, ...]) # A tuple of > integers. > print(type(coder)) # > - wrong coder? > encoded = coder.encode(inputs) > # No errors - incorrect behavior. > coder = typecoders.registry.get_coder(typehints.List[int]) # A list of > integers. > print(type(coder)) # > - wrong coder? > encoded = coder.encode(inputs) > # No errors - incorrect behavior.{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8552) Spark runner should not use STOPPED state as a terminal state
Chad Dombrova created BEAM-8552: --- Summary: Spark runner should not use STOPPED state as a terminal state Key: BEAM-8552 URL: https://issues.apache.org/jira/browse/BEAM-8552 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Chad Dombrova Over at BEAM-8539 we're trying to iron out the definitions of the job state enums. One thing that is clear is that STOPPED is not supposed to be a terminal state, though, since it was poorly documented, it slipped into a few places. There are two places in {{SparkPipelineResult}} where STOPPED seems to be used as a terminal event. These should be DONE, FAILED, or CANCELLED instead. Here's one example: [https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java#L133] -- This message was sent by Atlassian Jira (v8.3.4#803005)
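The distinction at stake can be captured in a few lines; the state names follow the job-state enum discussed in the issue, while the helper itself is a hypothetical sketch rather than runner code:

```python
# Hypothetical sketch of the fix discussed above: STOPPED is not terminal
# (a stopped job may still resume), so a wait loop should only exit on
# DONE, FAILED, or CANCELLED.
TERMINAL_STATES = frozenset({"DONE", "FAILED", "CANCELLED"})

def is_terminal(state):
    return state in TERMINAL_STATES

print(is_terminal("STOPPED"))   # False: waiting should continue
print(is_terminal("FAILED"))    # True: the job has finished unsuccessfully
```

Treating STOPPED as terminal, as the linked {{SparkPipelineResult}} code does, makes callers conclude a merely paused job has finished.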
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966373#comment-16966373 ] Chad Dombrova commented on BEAM-7870: - Note that there is not yet support in python for registering a native type to replace the row, as there is in Java. In the example above we'd end up with the {{TypedDict}} subclass instead of {{gcp.pubsub.PubsubMessage}}. This could be a follow up task to row coder support. [~bhulette], do we have a Jira for registering native types for row converters yet? > Externally configured KafkaIO / PubsubIO consumer causes coder problems > --- > > Key: BEAM-7870 > URL: https://issues.apache.org/jira/browse/BEAM-7870 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > > There are limitations for the consumer to work correctly. The biggest issue > is the structure of KafkaIO itself, which uses a combination of the source > interface and DoFns to generate the desired output. The problem is that the > source interface is natively translated by the Flink Runner to support > unbounded sources in portability, while the DoFn runs in a Java environment. > To transfer data between the two a coder needs to be involved. It happens to > be that the initial read does not immediately drop the KafakRecord structure > which does not work together well with our current assumption of only > supporting "standard coders" present in all SDKs. Only the subsequent DoFn > converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn > won't have the coder available in its environment. > There are several possible solutions: > 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in > the Flink Runner > 2. Modify KafkaIO to immediately drop the KafkaRecord structure > 3. Add the KafkaRecordCoder to all SDKs > 4. Add a generic coder, e.g. 
AvroCoder to all SDKs > For a workaround which uses (3), please see this patch which is not a proper > fix but adds KafkaRecordCoder to the SDK such that it can be used > encode/decode records: > [https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed] > > See also > [https://github.com/apache/beam/pull/8251|https://github.com/apache/beam/pull/8251:] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-6857) Support dynamic timers
[ https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-6857: Description: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: {code:java} DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); .. set timers in processElement @OnTimer("timer1") public void onTimer1() { .} @OnTimer("timer2") public void onTimer2() {} } {code} However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something as follows; {code:java} DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1'", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .} } {code} was: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: {code:java} DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); .. set timers in processElement @OnTimer("timer1") public void onTimer1() \{ .} @OnTimer("timer2") public void onTimer2() {} } {code} However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. 
It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something as follows; {code:java} DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1'", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .} } {code} > Support dynamic timers > -- > > Key: BEAM-6857 > URL: https://issues.apache.org/jira/browse/BEAM-6857 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > > The Beam timers API currently requires each timer to be statically specified > in the DoFn. The user must provide a separate callback method per timer. For > example: > > {code:java} > DoFn() > { > @TimerId("timer1") > private final TimerSpec timer1 = TimerSpecs.timer(...); > @TimerId("timer2") > private final TimerSpec timer2 = TimerSpecs.timer(...); > .. set timers in processElement > @OnTimer("timer1") > public void onTimer1() { .} > @OnTimer("timer2") > public void onTimer2() {} > } > {code} > > However there are many cases where the user does not know the set of timers > statically when writing their code. This happens when the timer tag should be > based on the data. It also happens when writing a DSL on top of Beam, where > the DSL author has to create DoFns but does not know statically which timers > their users will want to set (e.g. Scio). > > The goal is to support dynamic timers. 
Something as follows: > > {code:java} > DoFn() > { > @TimerId("timer") > private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); > @ProcessElement process(@TimerId("timer") DynamicTimer timer) > { > timer.set("tag1", ts); > timer.set("tag2", ts); > } > @OnTimer("timer") > public void onTimer1(@TimerTag String tag) { ... } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-6857) Support dynamic timers
[ https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-6857: Description: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: {code:java} DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); // ... set timers in processElement @OnTimer("timer1") public void onTimer1() { ... } @OnTimer("timer2") public void onTimer2() {} } {code} However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something like the following: {code:java} DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { ... } } {code} was: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); // ... set timers in processElement @OnTimer("timer1") public void onTimer1() { ... } @OnTimer("timer2") public void onTimer2() {} } However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data.
It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something like the following: DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { ... } } > Support dynamic timers > -- > > Key: BEAM-6857 > URL: https://issues.apache.org/jira/browse/BEAM-6857 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > > The Beam timers API currently requires each timer to be statically specified > in the DoFn. The user must provide a separate callback method per timer. For > example: > > {code:java} > DoFn() > { @TimerId("timer1") private final TimerSpec timer1 = > TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = > TimerSpecs.timer(...); // ... set timers in processElement > @OnTimer("timer1") public void onTimer1() { ... } > @OnTimer("timer2") public void onTimer2() {} > } > {code} > > However there are many cases where the user does not know the set of timers > statically when writing their code. This happens when the timer tag should be > based on the data. It also happens when writing a DSL on top of Beam, where > the DSL author has to create DoFns but does not know statically which timers > their users will want to set (e.g. Scio). > > The goal is to support dynamic timers.
Something like the following: > > {code:java} > DoFn() { > @TimerId("timer") private final TimerSpec timer1 = > TimerSpecs.dynamicTimer(...); > @ProcessElement process(@TimerId("timer") DynamicTimer timer) > { timer.set("tag1", ts); timer.set("tag2", ts); } > @OnTimer("timer") public void onTimer1(@TimerTag String tag) { ... } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964981#comment-16964981 ] Chad Dombrova commented on BEAM-8539: - I searched the java code for STOPPED and it seems to be used correctly everywhere except for possibly one spot in the spark runner: [https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java#L133] It's hard to tell from the [spark docs|https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html] if {{JavaSparkContext.stop()}} is considered terminal (i.e. can it be resumed?). Who would know for sure? {quote}We should add documentation to the JobState enum in the Job API (e.g. state machine diagram). {quote} I'm happy to do this, but I'm not that familiar with the documentation generation, so I have a few questions: * Where should this go? I don't see the generated code for it in the python or java docs. Also, since Java and Python use different documentation generators, I'm not sure if the diagram can be universally rendered. If not there, then where? JobInvocation? * Can you give me an example of somewhere else in the code that is currently generating a diagram, so that I can see how it's done? > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this.
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
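The "utility function in each language" suggested above could look like the sketch below (Python; the enum values are hypothetical mirrors of the state names discussed in this issue — the real definitions would live in beam_job_api.proto and its generated code):

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical mirror of the JobState names discussed in this issue;
    # the authoritative enum would be generated from beam_job_api.proto.
    STOPPED = 1
    STARTING = 2
    RUNNING = 3
    DONE = 4
    FAILED = 5
    CANCELLED = 6
    DRAINED = 7


# Single source of truth for terminal states, so each runner does not
# have to reimplement (and possibly disagree on) this logic.
TERMINAL_STATES = frozenset(
    {JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED}
)


def is_terminal(state: JobState) -> bool:
    """Return True if no further state transitions are possible."""
    return state in TERMINAL_STATES
```

Centralizing the set this way also makes the Java/Python disagreement over STOPPED visible in one place instead of being scattered across runners.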
[jira] [Updated] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8539: Description: The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals: Java InMemoryJobService: * start state: *STOPPED* * run: about to submit to executor: STARTING * run: actually running on executor: RUNNING * terminal states: DONE, FAILED, CANCELLED, DRAINED Python AbstractJobServiceServicer / LocalJobServicer: * start state: STARTING * terminal states: DONE, FAILED, CANCELLED, *STOPPED* I think it would be good to make python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run. It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic. was: The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this.
The code reveals that the SDKs differ on the fundamentals: Java InMemoryJobService: * start state (prepared): STOPPED * about to submit to executor: STARTING * actually running on executor: RUNNING * terminal states: DONE, FAILED, CANCELLED, DRAINED Python AbstractJobServiceServicer / LocalJobServicer: * start state: STARTING * terminal states: DONE, FAILED, CANCELLED, STOPPED I think it would be good to make python work like Java, so that there is a difference in state between a job that has been prepared and one that has been run. It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic. > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run: about to submit to executor: STARTING > * run: actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8539: Description: The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals: Java InMemoryJobService: * start state: *STOPPED* * run - about to submit to executor: STARTING * run - actually running on executor: RUNNING * terminal states: DONE, FAILED, CANCELLED, DRAINED Python AbstractJobServiceServicer / LocalJobServicer: * start state: STARTING * terminal states: DONE, FAILED, CANCELLED, *STOPPED* I think it would be good to make python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run. It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic. was: The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this.
The code reveals that the SDKs differ on the fundamentals: Java InMemoryJobService: * start state: *STOPPED* * run: about to submit to executor: STARTING * run: actually running on executor: RUNNING * terminal states: DONE, FAILED, CANCELLED, DRAINED Python AbstractJobServiceServicer / LocalJobServicer: * start state: STARTING * terminal states: DONE, FAILED, CANCELLED, *STOPPED* I think it would be good to make python work like Java, so that there is a difference in state between a job that has been prepared and one that has additionally been run. It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic. > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964580#comment-16964580 ] Chad Dombrova commented on BEAM-8539: - [~lcwik], [~mxm] you might be interested in this. > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run: about to submit to executor: STARTING > * run: actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8539) Clearly define the valid job state transitions
Chad Dombrova created BEAM-8539: --- Summary: Clearly define the valid job state transitions Key: BEAM-8539 URL: https://issues.apache.org/jira/browse/BEAM-8539 Project: Beam Issue Type: Improvement Components: beam-model, runner-core, sdk-java-core, sdk-py-core Reporter: Chad Dombrova The Beam job state transitions are ill-defined, which is a big problem for anything that relies on the values coming from JobAPI.GetStateStream. I was hoping to find something like a state transition diagram in the docs so that I could determine the start state, the terminal states, and the valid transitions, but I could not find this. The code reveals that the SDKs differ on the fundamentals: Java InMemoryJobService: * start state (prepared): STOPPED * about to submit to executor: STARTING * actually running on executor: RUNNING * terminal states: DONE, FAILED, CANCELLED, DRAINED Python AbstractJobServiceServicer / LocalJobServicer: * start state: STARTING * terminal states: DONE, FAILED, CANCELLED, STOPPED I think it would be good to make python work like Java, so that there is a difference in state between a job that has been prepared and one that has been run. It's hard to tell how far this problem has spread within the various runners. I think a simple thing that can be done to help standardize behavior is to implement the terminal states as an enum in the beam_job_api.proto, or create a utility function in each language for checking if a state is terminal, so that it's not left up to each runner to reimplement this logic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964406#comment-16964406 ] Chad Dombrova commented on BEAM-8523: - WIP: https://github.com/apache/beam/pull/9959 > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
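The three timestamps proposed in this issue could be derived from a recorded list of (state, timestamp) transitions. The sketch below is illustrative only — the state names are assumptions and `derive_timestamps` is not part of any actual JobInfo schema:

```python
def derive_timestamps(transitions):
    """Derive the proposed submitted/started/completed timestamps from a
    list of (state_name, unix_timestamp) pairs, in transition order."""
    terminal = {"DONE", "FAILED", "CANCELLED", "DRAINED"}
    result = {"submitted": None, "started": None, "completed": None}
    for state, ts in transitions:
        if result["submitted"] is None:
            # First recorded transition = job prepared/submitted.
            result["submitted"] = ts
        if state == "RUNNING" and result["started"] is None:
            result["started"] = ts
        if state in terminal and result["completed"] is None:
            result["completed"] = ts
    return result
```

If the full transition list were returned in JobInfo instead (as suggested later in this thread), a client could compute these and any other timestamps itself.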
[jira] [Comment Edited] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964223#comment-16964223 ] Chad Dombrova edited comment on BEAM-8523 at 10/31/19 5:22 PM: --- Looking at this a little more, it seems it would be pretty easy to stream out the past state history from GetMessageStream after the initial request is made. This would make it easier on clients, since they would not have to concern themselves about ordering and race conditions between calls to a hypothetical GetMessageHistory and GetMessageStream (i.e what happens if a new event arrives between calls? If GetMessageHistory is called first we get the new event in both results, in the other order we miss the event). For tracking the state transition events on the JobInvocation, what's the preferred object for 2-tuples? I noticed javafx.Pair is not used anywhere in the Beam code. Should I use beam.sdk.values.KV? was (Author: chadrik): Looking at this a little more, it seems it would be pretty easy to stream out the past state history from GetMessageStream after the connection is made. This would make it easier on clients, since they would not have to concern themselves about ordering and race conditions between calls to a hypothetical GetMessageHistory and GetMessageStream (i.e what happens if a new event arrives between calls? If GetMessageHistory is called first we get the new event in both results, in the other order we miss the event). For tracking the state transition events on the JobInvocation, what's the preferred object for 2-tuples? I noticed javafx.Pair is not used anywhere in the Beam code. Should I use beam.sdk.values.KV? 
> Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964223#comment-16964223 ] Chad Dombrova commented on BEAM-8523: - Looking at this a little more, it seems it would be pretty easy to stream out the past state history from GetMessageStream after the connection is made. This would make it easier on clients, since they would not have to concern themselves with ordering and race conditions between calls to a hypothetical GetMessageHistory and GetMessageStream (i.e. what happens if a new event arrives between calls? If GetMessageHistory is called first we get the new event in both results, in the other order we miss the event). For tracking the state transition events on the JobInvocation, what's the preferred object for 2-tuples? I noticed javafx.Pair is not used anywhere in the Beam code. Should I use beam.sdk.values.KV? > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
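The replay-then-stream behavior described in this comment — backfilling history on the same stream so clients never race a separate history call against a live stream — could be sketched in-memory as follows. This is illustrative only (the real GetMessageStream is a gRPC streaming RPC); `StateTracker` is a hypothetical stand-in, and plain (state, timestamp) tuples play the role of KV pairs:

```python
import queue
import threading
import time


class StateTracker:
    """Records (state, timestamp) transitions and replays the full history
    to late subscribers before streaming new events."""

    def __init__(self):
        self._lock = threading.Lock()
        self._history = []      # list of (state, timestamp) pairs
        self._subscribers = []  # one queue per live subscriber

    def transition(self, state):
        with self._lock:
            event = (state, time.time())
            self._history.append(event)
            for q in self._subscribers:
                q.put(event)

    def subscribe(self):
        """Yield all past transitions, then block for new ones."""
        q = queue.Queue()
        with self._lock:
            # Snapshot and register under the same lock: events that arrive
            # after this point land in q, so nothing is missed or duplicated.
            backlog = list(self._history)
            self._subscribers.append(q)
        for event in backlog:
            yield event
        while True:
            yield q.get()
```

Because the snapshot and subscription happen atomically, the "new event arrives between calls" race described above cannot occur.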
[jira] [Commented] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963645#comment-16963645 ] Chad Dombrova commented on BEAM-8523: - {quote} Seems to make sense but we might as well just have all the job states have timestamps and just have a list of state transitions returned as part of the job info. The last element within the list would be the "latest" state. {quote} That's an interesting idea. In that case, I think it might make sense to have a GetStateHistory method. This could be a pattern that we extend to GetMessageStream as well. We're currently working on creating a beam log service, e.g. backed by StackDriver, so GetMessageHistory could fit in really well there. That makes me wonder, is there an established gRPC pattern for getting stream history? For example, is there a way for a server to automatically backfill the missed messages when a late stream connection is established? Or is it better to keep these as two separate endpoints? > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963522#comment-16963522 ] Chad Dombrova commented on BEAM-8523: - [~lcwik] does this change sound reasonable to you? > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8523: Description: As a user querying jobs with JobService.GetJobs, it would be useful if the JobInfo result contained timestamps indicating various state changes that may have been missed by a client. Useful timestamps include: * submitted (prepared to the job service) * started (executor enters the RUNNING state) * completed (executor enters a terminal state) was: As a user querying jobs with JobService.GetJobs, it would be useful if the JobInfo result contained timestamps indicating various state changes that may have been missed by a client. Useful even timestamps include: * submitted (prepared to the job service) * started (executor enters the RUNNING state) * completed (executor enters a terminal state) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8523) Add useful timestamp to job servicer GetJobs
Chad Dombrova created BEAM-8523: --- Summary: Add useful timestamp to job servicer GetJobs Key: BEAM-8523 URL: https://issues.apache.org/jira/browse/BEAM-8523 Project: Beam Issue Type: New Feature Components: beam-model Reporter: Chad Dombrova Assignee: Chad Dombrova As a user querying jobs with JobService.GetJobs, it would be useful if the JobInfo result contained timestamps indicating various state changes that may have been missed by a client. Useful timestamps include: * submitted (prepared to the job service) * started (executor enters the RUNNING state) * completed (executor enters a terminal state) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8411) Re-enable pylint checks disabled due to pylint bug
[ https://issues.apache.org/jira/browse/BEAM-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963296#comment-16963296 ] Chad Dombrova commented on BEAM-8411: - No, this was introduced with 2.4.2. We need to switch to 2.4.3+. > Re-enable pylint checks disabled due to pylint bug > -- > > Key: BEAM-8411 > URL: https://issues.apache.org/jira/browse/BEAM-8411 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Chad Dombrova >Priority: Minor > Labels: beginner > > The pylint github issue is here: https://github.com/PyCQA/pylint/issues/3152 > Once that's fixed, remove the exclusions in setup.py and > apache_beam/examples/complete/juliaset/setup.py -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8486) Reference to ParallelBundleManager attr "_skip_registration" should be "_registered"
Chad Dombrova created BEAM-8486: --- Summary: Reference to ParallelBundleManager attr "_skip_registration" should be "_registered" Key: BEAM-8486 URL: https://issues.apache.org/jira/browse/BEAM-8486 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Chad Dombrova It appears there was a mistake in the multiprocess refactor of the BundleManager. I came across this issue while getting mypy type analysis set up (score one for static type checking!). The offending line is here: https://github.com/apache/beam/blob/8d2997f8d7ad84649b8ecb2f7e2ca2eceb91b6d0/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L933 I think it should be using "_registered", as seen here: https://github.com/apache/beam/blob/8d2997f8d7ad84649b8ecb2f7e2ca2eceb91b6d0/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1909 I can easily change the attribute name, but presumably this will change the behavior. Hopefully for the better, but I don't know enough to say. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959135#comment-16959135 ] Chad Dombrova commented on BEAM-7870: - The plan forming is to use DefaultSchema with either JavaBeanSchema or JavaFieldSchema, whichever works with fewer changes to PubsubMessage and KafkaRecord. Then on the python side, we do something like this: {code:python} # this is py3 syntax for clarity, but we'd probably # need to use the TypedDict('PubsubMessage', ...) version class PubsubMessage(TypedDict): message: ByteString attributes: Mapping[unicode, unicode] messageId: unicode coders.registry.register_coder(PubsubMessage, coders.RowCoder) pcoll | 'make some messages' >> beam.Map(makeMessage).with_output_types(PubsubMessage) | 'write to pubsub' >> beam.io.WriteToPubsub(project, topic) # or something {code} > Externally configured KafkaIO / PubsubIO consumer causes coder problems > --- > > Key: BEAM-7870 > URL: https://issues.apache.org/jira/browse/BEAM-7870 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > > There are limitations for the consumer to work correctly. The biggest issue > is the structure of KafkaIO itself, which uses a combination of the source > interface and DoFns to generate the desired output. The problem is that the > source interface is natively translated by the Flink Runner to support > unbounded sources in portability, while the DoFn runs in a Java environment. > To transfer data between the two a coder needs to be involved. It happens to > be that the initial read does not immediately drop the KafkaRecord structure > which does not work together well with our current assumption of only > supporting "standard coders" present in all SDKs. Only the subsequent DoFn > converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn > won't have the coder available in its environment.
> There are several possible solutions: > 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in > the Flink Runner > 2. Modify KafkaIO to immediately drop the KafkaRecord structure > 3. Add the KafkaRecordCoder to all SDKs > 4. Add a generic coder, e.g. AvroCoder to all SDKs > For a workaround which uses (3), please see this patch which is not a proper > fix but adds KafkaRecordCoder to the SDK such that it can be used to > encode/decode records: > [https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed] > > See also > [https://github.com/apache/beam/pull/8251|https://github.com/apache/beam/pull/8251:] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7870) Externally configured KafkaIO / PubsubIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-7870: Summary: Externally configured KafkaIO / PubsubIO consumer causes coder problems (was: Externally configured KafkaIO consumer causes coder problems) > Externally configured KafkaIO / PubsubIO consumer causes coder problems > --- > > Key: BEAM-7870 > URL: https://issues.apache.org/jira/browse/BEAM-7870 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > > There are limitations for the consumer to work correctly. The biggest issue > is the structure of KafkaIO itself, which uses a combination of the source > interface and DoFns to generate the desired output. The problem is that the > source interface is natively translated by the Flink Runner to support > unbounded sources in portability, while the DoFn runs in a Java environment. > To transfer data between the two a coder needs to be involved. It happens to > be that the initial read does not immediately drop the KafkaRecord structure > which does not work well together with our current assumption of only > supporting "standard coders" present in all SDKs. Only the subsequent DoFn > converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn > won't have the coder available in its environment. > There are several possible solutions: > 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in > the Flink Runner > 2. Modify KafkaIO to immediately drop the KafkaRecord structure > 3. Add the KafkaRecordCoder to all SDKs > 4. Add a generic coder, e.g. AvroCoder to all SDKs > For a workaround which uses (3), please see this patch which is not a proper > fix but adds KafkaRecordCoder to the SDK such that it can be used to > encode/decode records: > [https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed] > > See also > [https://github.com/apache/beam/pull/8251|https://github.com/apache/beam/pull/8251:] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8350) Upgrade to pylint 2.4
[ https://issues.apache.org/jira/browse/BEAM-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958027#comment-16958027 ] Chad Dombrova commented on BEAM-8350: - done! > Upgrade to pylint 2.4 > - > > Key: BEAM-8350 > URL: https://issues.apache.org/jira/browse/BEAM-8350 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 8h 50m > Remaining Estimate: 0h > > pylint 2.4 provides a number of new features and fixes, but the most > important/pressing one for me is that 2.4 adds support for understanding > python type annotations, which fixes a bunch of spurious unused import errors > in the PR I'm working on for BEAM-7746. > As of 2.0, pylint dropped support for running tests in python2, so to make > the upgrade we have to move our lint jobs to python3. Doing so will put > pylint into "python3-mode" and there is not an option to run in > python2-compatible mode. That said, the beam code is intended to be python3 > compatible, so in practice, performing a python3 lint on the Beam code-base > is perfectly safe. The primary risk of doing this is that someone introduces > a python-3 only change that breaks python2, but these would largely be syntax > errors that would be immediately caught by the unit and integration tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8411) Re-enable pylint checks disabled due to pylint bug
Chad Dombrova created BEAM-8411: --- Summary: Re-enable pylint checks disabled due to pylint bug Key: BEAM-8411 URL: https://issues.apache.org/jira/browse/BEAM-8411 Project: Beam Issue Type: Task Components: sdk-py-core Reporter: Chad Dombrova The pylint github issue is here: https://github.com/PyCQA/pylint/issues/3152 Once that's fixed, remove the exclusions in setup.py and apache_beam/examples/complete/juliaset/setup.py -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8172) Add a label field to FunctionSpec proto
[ https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952312#comment-16952312 ] Chad Dombrova commented on BEAM-8172: - The goal of clarifying intent is a fine one, but I'm concerned that "debug_string" could result in developers using this field for large text blocks like stack traces, when the intent is a brief descriptor, like a human-readable ID. other ideas: hint, descriptor, debug_label, debug_identifier > Add a label field to FunctionSpec proto > --- > > Key: BEAM-8172 > URL: https://issues.apache.org/jira/browse/BEAM-8172 > Project: Beam > Issue Type: Improvement > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > > The FunctionSpec payload is opaque outside of the native environment, which > can make debugging pipeline protos difficult. It would be very useful if the > FunctionSpec had an optional human readable "label", for debugging and > presenting in UIs and error messages. > For example, in python, if the payload is an instance of a class, we could > attempt to provide a string that represents the dotted path to that class, > "mymodule.MyClass". In the case of coders, we could use the label to hold > the type hint: "Optional[Iterable[mymodule.MyClass]]". -- This message was sent by Atlassian Jira (v8.3.4#803005)
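To make the intent concrete, a brief sketch of how python could derive such a label for a class payload, per the examples in the issue description (illustrative only: `dotted_path` is a hypothetical helper, and "label" is a proposed field, not an existing proto field):

```python
# Hypothetical helper: build a human-readable "mymodule.MyClass" label for a
# FunctionSpec whose payload is an instance of a class. Not Beam's actual API.
def dotted_path(obj_type):
    return '%s.%s' % (obj_type.__module__, obj_type.__qualname__)


class MyClass(object):
    pass
```

The brevity of the output is the point: a dotted path or a type-hint string stays readable in UIs and error messages, unlike a stack trace.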
[jira] [Created] (BEAM-8402) Create a class hierarchy to represent environments
Chad Dombrova created BEAM-8402: --- Summary: Create a class hierarchy to represent environments Key: BEAM-8402 URL: https://issues.apache.org/jira/browse/BEAM-8402 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova As a first step towards making it possible to assign different environments to sections of a pipeline, we need to expose environment classes to the pipeline API. Unlike PTransforms, PCollections, Coders, and Windowings, environments exist solely in the portability framework as protobuf objects. By creating a hierarchy of "native" classes that represent the various environment types -- external, docker, process, etc -- users will be able to instantiate these and assign them to parts of the pipeline. The assignment portion will be covered in a follow-up issue/PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
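A rough sketch of what such a hierarchy could look like (class and method names here are hypothetical, not Beam's eventual API; real payloads would be serialized protos rather than plain strings):

```python
import abc


class Environment(abc.ABC):
    """Base class for the docker/process/external environment types."""

    @abc.abstractmethod
    def payload(self):
        """Bytes for the corresponding Environment proto payload."""


class DockerEnvironment(Environment):
    def __init__(self, container_image):
        self.container_image = container_image

    def payload(self):
        # A real implementation would serialize a DockerPayload proto.
        return self.container_image.encode('utf-8')


class ProcessEnvironment(Environment):
    def __init__(self, command):
        self.command = command

    def payload(self):
        return self.command.encode('utf-8')


class ExternalEnvironment(Environment):
    def __init__(self, endpoint):
        self.endpoint = endpoint

    def payload(self):
        return self.endpoint.encode('utf-8')
```

With "native" classes like these, users could instantiate an environment and hand it to the pipeline API instead of constructing protobuf objects directly.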
[jira] [Created] (BEAM-8355) Make BooleanCoder a standard coder
Chad Dombrova created BEAM-8355: --- Summary: Make BooleanCoder a standard coder Key: BEAM-8355 URL: https://issues.apache.org/jira/browse/BEAM-8355 Project: Beam Issue Type: New Feature Components: beam-model, sdk-java-core, sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova This involves making the current java BooleanCoder a standard coder, and implementing an equivalent coder in python -- This message was sent by Atlassian Jira (v8.3.4#803005)
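For illustration, a single-byte encoding of the kind a standard BooleanCoder implies might look like this (a sketch, not Beam's actual implementation):

```python
# Encode a boolean as one byte and reject anything else on decode.
def encode_bool(value):
    return b'\x01' if value else b'\x00'


def decode_bool(data):
    if data == b'\x01':
        return True
    if data == b'\x00':
        return False
    raise ValueError('invalid encoded boolean: %r' % (data,))
```

Making this a standard coder means the same one-byte wire format is understood by every SDK, so booleans can cross the java/python portability boundary without a fallback coder.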
[jira] [Created] (BEAM-8350) Upgrade to pylint 2.4
Chad Dombrova created BEAM-8350: --- Summary: Upgrade to pylint 2.4 Key: BEAM-8350 URL: https://issues.apache.org/jira/browse/BEAM-8350 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova pylint 2.4 provides a number of new features and fixes, but the most important/pressing one for me is that 2.4 adds support for understanding python type annotations, which fixes a bunch of spurious unused import errors in the PR I'm working on for BEAM-7746. As of 2.0, pylint dropped support for running tests in python2, so to make the upgrade we have to move our lint jobs to python3. Doing so will put pylint into "python3-mode" and there is not an option to run in python2-compatible mode. That said, the beam code is intended to be python3 compatible, so in practice, performing a python3 lint on the Beam code-base is perfectly safe. The primary risk of doing this is that someone introduces a python-3 only change that breaks python2, but these would largely be syntax errors that would be immediately caught by the unit and integration tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7768) Formalize Python external transform API
[ https://issues.apache.org/jira/browse/BEAM-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932021#comment-16932021 ] Chad Dombrova commented on BEAM-7768: - This is resolved with https://github.com/apache/beam/pull/9098 > Formalize Python external transform API > --- > > Key: BEAM-7768 > URL: https://issues.apache.org/jira/browse/BEAM-7768 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Priority: Major > > As a python developer I'd like to have a more opinionated API for external > transforms to A) guide me toward more standardized solutions and B) reduce > the amount of boilerplate code that needs to be written. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932020#comment-16932020 ] Chad Dombrova commented on BEAM-7984: - this is resolved > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.4#803005)
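To illustrate why a List typehint maps naturally onto an iterable-style coder, here is a minimal length-prefixed encoding that round-trips lists (a standalone sketch; Beam's IterableCoder is more involved, and `encode_list`/`decode_list` are names invented for this example):

```python
import struct

# Length-prefix the element count, then each element's payload.
def encode_list(values, encode_elem):
    parts = [struct.pack('>i', len(values))]
    for value in values:
        payload = encode_elem(value)
        parts.append(struct.pack('>i', len(payload)))
        parts.append(payload)
    return b''.join(parts)


def decode_list(data, decode_elem):
    (count,) = struct.unpack_from('>i', data, 0)
    pos, result = 4, []
    for _ in range(count):
        (size,) = struct.unpack_from('>i', data, pos)
        pos += 4
        result.append(decode_elem(data[pos:pos + size]))
        pos += size
    return result
```

The encode/decode pair is list-in, list-out, which is exactly the behavior the typehint promises.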
[jira] [Commented] (BEAM-8238) Improve python gen_protos script
[ https://issues.apache.org/jira/browse/BEAM-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932018#comment-16932018 ] Chad Dombrova commented on BEAM-8238: - this is resolved > Improve python gen_protos script > > > Key: BEAM-8238 > URL: https://issues.apache.org/jira/browse/BEAM-8238 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Issues that need to be solved: > - doesn't print output to shell when run during `python setup.py` > - runs futurize twice when grpcio-tools is not present and needs to be > installed > - does not account for the case where there are more output files than .proto > files (just keeps triggering out-of-date rebuilds due to timestamp of > orphaned output file) > - plus it would be nice for debugging if the script printed why it is > regenerating pb2 files, because there are a number of possible reasons -- This message was sent by Atlassian Jira (v8.3.4#803005)
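A sketch of a staleness check that addresses the orphaned-output and "print why it regenerates" points above (the `regen_reason` helper and the `_pb2.py` naming convention are illustrative assumptions, not the actual gen_protos code):

```python
import os

# Return a human-readable reason to regenerate, or None if up to date.
def regen_reason(proto_files, out_dir):
    expected = set(
        os.path.splitext(os.path.basename(p))[0] + '_pb2.py'
        for p in proto_files)
    actual = set(f for f in os.listdir(out_dir) if f.endswith('_pb2.py'))
    if actual - expected:
        # Orphaned outputs would otherwise skew timestamp checks forever.
        return 'orphaned outputs: %s' % sorted(actual - expected)
    if expected - actual:
        return 'missing outputs: %s' % sorted(expected - actual)
    newest_proto = max(os.path.getmtime(p) for p in proto_files)
    oldest_output = min(
        os.path.getmtime(os.path.join(out_dir, f)) for f in actual)
    if oldest_output < newest_proto:
        return 'outputs older than protos'
    return None  # up to date, nothing to regenerate
```

Logging the returned reason before regenerating would cover the debugging wish in the last bullet.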
[jira] [Commented] (BEAM-8148) [python] allow tests to be specified when using tox
[ https://issues.apache.org/jira/browse/BEAM-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932019#comment-16932019 ] Chad Dombrova commented on BEAM-8148: - this is resolved > [python] allow tests to be specified when using tox > --- > > Key: BEAM-8148 > URL: https://issues.apache.org/jira/browse/BEAM-8148 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > I don't know how to specify individual tests using gradle, and as a python > developer, it's way too opaque for me to figure out on my own. > The developer wiki suggests calling the tests directly: > {noformat} > python setup.py nosetests --tests :. > {noformat} > But this defeats the purpose of using tox, which is there to help users > manage the myriad virtual envs required to run the different tests. > Luckily there is an easier way! You just add {posargs} to the tox commands. > PR is coming shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
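For context, the {posargs} mechanism looks roughly like this in a tox.ini (an illustrative fragment, not Beam's exact configuration):

```ini
[testenv:py37]
; {posargs} expands to everything after "--" on the tox command line, e.g.
;   tox -e py37 -- apache_beam/pipeline_test.py
commands = python setup.py nosetests {posargs}
```

With no extra arguments, {posargs} expands to nothing and the full suite runs as before.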
[jira] [Created] (BEAM-8271) StateGetRequest/Response continuation_token should be string
Chad Dombrova created BEAM-8271: --- Summary: StateGetRequest/Response continuation_token should be string Key: BEAM-8271 URL: https://issues.apache.org/jira/browse/BEAM-8271 Project: Beam Issue Type: Improvement Components: beam-model Reporter: Chad Dombrova Assignee: Chad Dombrova I've been working on adding typing to the python code and I came across a discrepancy regarding the type of the continuation token. The .proto defines it as bytes, but the code treats it as a string (i.e. unicode):
{code:java}
// A request to get state.
message StateGetRequest {
  // (Optional) If specified, signals to the runner that the response
  // should resume from the following continuation token.
  //
  // If unspecified, signals to the runner that the response should start
  // from the beginning of the logical continuable stream.
  bytes continuation_token = 1;
}

// A response to get state representing a logical byte stream which can be
// continued using the state API.
message StateGetResponse {
  // (Optional) If specified, represents a token which can be used with the
  // state API to get the next chunk of this logical byte stream. The end of
  // the logical byte stream is signalled by this field being unset.
  bytes continuation_token = 1;

  // Represents a part of a logical byte stream. Elements within
  // the logical byte stream are encoded in the nested context and
  // concatenated together.
  bytes data = 2;
}
{code}
From FnApiRunner.StateServicer:
{code:python}
def blocking_get(self, state_key, continuation_token=None):
    with self._lock:
        full_state = self._state[self._to_key(state_key)]
        if self._use_continuation_tokens:
            # The token is "nonce:index".
            if not continuation_token:
                token_base = 'token_%x' % len(self._continuations)
                self._continuations[token_base] = tuple(full_state)
                return b'', '%s:0' % token_base
            else:
                token_base, index = continuation_token.split(':')
                ix = int(index)
                full_state = self._continuations[token_base]
                if ix == len(full_state):
                    return b'', None
                else:
                    return full_state[ix], '%s:%d' % (token_base, ix + 1)
        else:
            assert not continuation_token
            return b''.join(full_state), None
{code}
This could be a problem in python3. All other id values are string, whereas bytes is reserved for data, so I think that the proto should be changed to string. -- This message was sent by Atlassian Jira (v8.3.4#803005)
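The py3 pitfall described above can be shown in a few lines (an illustrative sketch, not Beam code):

```python
# The proto declares continuation_token as bytes, but the runner code builds
# str tokens of the form "nonce:index". In python3 the two types don't mix.
token = 'token_%x' % 0 + ':0'       # str token, as the runner code produces
wire = token.encode('utf-8')        # what a bytes proto field would carry

try:
    wire.split(':')                 # bytes.split() rejects a str separator
    raise AssertionError('unreachable in python3')
except TypeError:
    pass

base, index = wire.split(b':')      # works only once the types agree
```

Python 2 hid this mismatch because str and bytes were the same type; python3 surfaces it as a TypeError at the first split.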
[jira] [Commented] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930767#comment-16930767 ] Chad Dombrova commented on BEAM-8213: - I should also mention I'm fine with lumping the IT jobs in with their respective preCommitPy* jobs, if we want to reduce the total number of jenkins jobs. So the final list of jobs/tasks would be: {code} pythonPreCommit() dependsOn ":sdks:python:test-suites:tox:py2:preCommitPy2" dependsOn "testPy2Cython" dependsOn "testPython2" dependsOn "testPy2Gcp" dependsOn ":sdks:python:test-suites:dataflow:py2:preCommitIT" dependsOn ":sdks:python:test-suites:tox:py35:preCommitPy35" dependsOn "testPython35" dependsOn "testPy35Gcp" dependsOn "testPy35Cython" dependsOn ":sdks:python:test-suites:tox:py36:preCommitPy36" dependsOn "testPython36" dependsOn "testPy36Gcp" dependsOn "testPy36Cython" dependsOn ":sdks:python:test-suites:tox:py37:preCommitPy37" dependsOn "testPython37" dependsOn "testPy37Gcp" dependsOn "testPy37Cython" dependsOn ":sdks:python:test-suites:dataflow:py37:preCommitIT" dependsOn ":sdks:python:test-suites:tox:preCommitLint" dependsOn ":sdks:python:test-suites:tox:py2:lint" dependsOn ":sdks:python:test-suites:tox:py2:docs" dependsOn ":sdks:python:test-suites:tox:py35:lint" {code} > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. 
> Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930762#comment-16930762 ] Chad Dombrova commented on BEAM-8213: - Here's a hierarchy of the relevant pre-commit tasks: {code} pythonPreCommit() dependsOn ":sdks:python:test-suites:tox:py2:preCommitPy2" dependsOn "docs" dependsOn "testPy2Cython" dependsOn "testPython2" dependsOn "testPy2Gcp" dependsOn "lint" dependsOn ":sdks:python:test-suites:tox:py35:preCommitPy35" dependsOn "testPython35" dependsOn "testPy35Gcp" dependsOn "testPy35Cython" dependsOn "lint" dependsOn ":sdks:python:test-suites:tox:py36:preCommitPy36" dependsOn "testPython36" dependsOn "testPy36Gcp" dependsOn "testPy36Cython" dependsOn ":sdks:python:test-suites:tox:py37:preCommitPy37" dependsOn "testPython37" dependsOn "testPy37Gcp" dependsOn "testPy37Cython" dependsOn ":sdks:python:test-suites:dataflow:py2:preCommitIT" dependsOn ":sdks:python:test-suites:dataflow:py37:preCommitIT" {code} I think an easy place to start is to split up pythonPreCommit into 6 separate jobs - preCommitPy2 - preCommitPy35 - preCommitPy36 - preCommitPy37 - preCommitITPy2 - preCommitITPy37 Then we pull out all the lint and doc tasks into another task: - preCommitPyLint It's a lot of jobs, but each one will finish faster, and the output for each will be much more clear. > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. 
> Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single error pertains to, then I need to > repeat for each failure. This makes it very difficult to discover problems, > and deduce that they may have something to do with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job into sub-jobs (possibly using a pipeline with steps). This > would give us the following benefits: > - sub job results should become available as they finish, so for example, > lint results should be available very early on > - sub job results will be reported separately, and there will be a job for > each py2, py35, py36 and so on, so it will be clear when an error is related > to a particular python version > - sub jobs without reports, like docs and lint, will have their own failure > status and logs, so when they fail it will be more obvious what went wrong. > I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8235) Python Post Commit 3.6 issue
[ https://issues.apache.org/jira/browse/BEAM-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930747#comment-16930747 ] Chad Dombrova commented on BEAM-8235: - this is resolved > Python Post Commit 3.6 issue > > > Key: BEAM-8235 > URL: https://issues.apache.org/jira/browse/BEAM-8235 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Chad Dombrova >Priority: Major > > https://builds.apache.org/job/beam_PostCommit_Python36/469/console > 12:04:37 > == > 12:04:37 ERROR: Failure: ModuleNotFoundError (No module named 'dataclasses') > 12:04:37 > -- > 12:04:37 Traceback (most recent call last): > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/failure.py", > line 39, in runTest > 12:04:37 raise self.exc_val.with_traceback(self.tb) > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/loader.py", > line 418, in loadTestsFromName > 12:04:37 addr.filename, addr.module) > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py", > line 47, in importFromPath > 12:04:37 return self.importFromDir(dir_path, fqname) > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/build/gradleenv/-1734967053/lib/python3.6/site-packages/nose/importer.py", > line 94, in importFromDir > 12:04:37 mod = load_module(part_fqname, fh, filename, desc) > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/build/gradleenv/-1734967053/lib/python3.6/imp.py", > line 235, in load_module > 12:04:37 return load_source(name, filename, file) > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/build/gradleenv/-1734967053/lib/python3.6/imp.py", > line 172, 
in load_source > 12:04:37 module = _load(spec) > 12:04:37 File "", line 684, in _load > 12:04:37 File "", line 665, in _load_unlocked > 12:04:37 File "", line 678, in > exec_module > 12:04:37 File "", line 219, in > _call_with_frames_removed > 12:04:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python36/src/sdks/python/apache_beam/transforms/external_test_py37.py", > line 22, in > 12:04:37 import dataclasses > I am guessing that https://github.com/apache/beam/pull/9570 will suppress > this. However there might be additional changes needed after > https://github.com/apache/beam/pull/9098 for python 3.6. > /cc [~tvalentyn] -- This message was sent by Atlassian Jira (v8.3.2#803003)
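The usual fix for this class of failure is to guard version-specific imports; a sketch of what such a guard might look like (illustrative only, not necessarily what PR 9570 does):

```python
import sys

# Only import 3.7+-only stdlib modules when the interpreter supports them;
# on older versions the dependent tests should be skipped entirely.
if sys.version_info >= (3, 7):
    import dataclasses
    HAVE_DATACLASSES = True
else:
    HAVE_DATACLASSES = False
```

Test loaders like nose import modules eagerly, so the guard has to live at module level, before any 3.7-only syntax or imports are reached.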
[jira] [Commented] (BEAM-8229) Python postcommit 2.7 tests are failing since we try to run a test with Python 3.6+ syntax on Python 2.
[ https://issues.apache.org/jira/browse/BEAM-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930749#comment-16930749 ] Chad Dombrova commented on BEAM-8229: - this is resolved > Python postcommit 2.7 tests are failing since we try to run a test with > Python 3.6+ syntax on Python 2. > --- > > Key: BEAM-8229 > URL: https://issues.apache.org/jira/browse/BEAM-8229 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > Time Spent: 1h 50m > Remaining Estimate: 0h > > Root cause: https://github.com/apache/beam/pull/9098 > 20:12:14 Traceback (most recent call last): > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/loader.py", > line 418, in loadTestsFromName > 20:12:14 addr.filename, addr.module) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 47, in importFromPath > 20:12:14 return self.importFromDir(dir_path, fqname) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 94, in importFromDir > 20:12:14 mod = load_module(part_fqname, fh, filename, desc) > 20:12:14 SyntaxError: invalid syntax (external_test_py37.py, line 46) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8238) Improve python gen_protos script
Chad Dombrova created BEAM-8238: --- Summary: Improve python gen_protos script Key: BEAM-8238 URL: https://issues.apache.org/jira/browse/BEAM-8238 Project: Beam Issue Type: Improvement Components: build-system, sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova Issues that need to be solved: - doesn't print output to shell when run during `python setup.py` - runs futurize twice when grpcio-tools is not present and needs to be installed - does not account for the case where there are more output files than .proto files (just keeps triggering out-of-date rebuilds due to timestamp of orphaned output file) - plus it would be nice for debugging if the script printed why it is regenerating pb2 files, because there are a number of possible reasons -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8213: Description: As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. Here are some of the problems - when a lint job fails, it's not reported in the test results summary, so even though the job is marked as failed, I see "Test Result (no failures)" which is quite confusing - I have to wait for over an hour to discover the lint failed, which takes about a minute to run on its own - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of which version of python they use. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. It takes 5 clicks and then reading the log output to know which version of python a single error pertains to, then I need to repeat for each failure. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub job results should become available as they finish, so for example, lint results should be available very early on - sub job results will be reported separately, and there will be a job for each py2, py35, py36 and so on, so it will be clear when an error is related to a particular python version - sub jobs without reports, like docs and lint, will have their own failure status and logs, so when they fail it will be more obvious what went wrong. I'm happy to help out once I get some feedback on the desired way forward. 
was: As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. Here are some of the problems - when a lint job fails, it's not reported in the test results summary, so even though the job is marked as failed, I see "Test Result (no failures)" which is quite confusing - I have to wait for over an hour to discover the lint failed, which takes about a minute to run on its own - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of which version of python they use. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. It takes 5 clicks and then reading the log output to know which version of python a single error pertains to. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub job results should become available as they finish, so for example, lint results should be available very early on - sub job results will be reported separately, and there will be a job for each py2, py35, py36 and so on, so it will be clear when an error is related to a particular python version - sub jobs without reports, like docs and lint, will have their own failure status and logs, so when they fail it will be more obvious what went wrong. I'm happy to help out once I get some feedback on the desired way forward. 
> Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. > Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log
[jira] [Updated] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8213: Description: As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. Here are some of the problems - when a lint job fails, it's not reported in the test results summary, so even though the job is marked as failed, I see "Test Result (no failures)" which is quite confusing - I have to wait for over an hour to discover the lint failed, which takes about a minute to run on its own - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of which version of python they use. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. It takes 5 clicks and then reading the log output to know which version of python a single error pertains to. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub job results should become available as they finish, so for example, lint results should be available very early on - sub job results will be reported separately, and there will be a job for each py2, py35, py36 and so on, so it will be clear when an error is related to a particular python version - sub jobs without reports, like docs and lint, will have their own failure status and logs, so when they fail it will be more obvious what went wrong. I'm happy to help out once I get some feedback on the desired way forward. was: As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. 
Here are some of the problems - when a lint job fails, it's not reported in the test results summary, so even though the job is marked as failed, I see "Test Result (no failures)" which is quite confusing - I have to wait for over an hour to discover the lint failed, which takes about a minute to run on its own - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of which version of python they use. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. It takes 5 clicks and then reading the log output to know which version a single error pertains to. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub job results should become available as they finish, so for example, lint results should be available very early on - sub job results will be reported separately, and there will be a job for each py2, py35, py36 and so on, so it will be clear when an error is related to a particular python version - sub jobs without reports, like docs and lint, will have their own failure status and logs, so when they fail it will be more obvious what went wrong. I'm happy to help out once I get some feedback on the desired way forward. > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. 
> Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version of python a single
[jira] [Updated] (BEAM-8213) Run and report python tox tasks separately within Jenkins
[ https://issues.apache.org/jira/browse/BEAM-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8213: Description: As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. Here are some of the problems - when a lint job fails, it's not reported in the test results summary, so even though the job is marked as failed, I see "Test Result (no failures)" which is quite confusing - I have to wait for over an hour to discover the lint failed, which takes about a minute to run on its own - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of which version of python they use. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. It takes 5 clicks and then reading the log output to know which version a single error pertains to. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub job results should become available as they finish, so for example, lint results should be available very early on - sub job results will be reported separately, and there will be a job for each py2, py35, py36 and so on, so it will be clear when an error is related to a particular python version - sub jobs without reports, like docs and lint, will have their own failure status and logs, so when they fail it will be more obvious what went wrong. I'm happy to help out once I get some feedback on the desired way forward. was: As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. 
Here are some of the problems - when a lint job fails, it's not reported in the Test results, so even though the job failed, I see "Test Result (no failures)" - I have to wait for over an hour to discover the lint failed, which takes about a minute to run - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of whether they come from python 27, 35, etc. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub-job results should become available as they finish, so lint results should be available very early on - sub-job results will be reported separately, so it will be clear when an error is related to a particular python version - sub-jobs without reports, like docs and lint, will have their own failure status and logs, so it will be more obvious when they fail what went wrong. I'm happy to help out once I get some feedback on the desired way forward. > Run and report python tox tasks separately within Jenkins > - > > Key: BEAM-8213 > URL: https://issues.apache.org/jira/browse/BEAM-8213 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Chad Dombrova >Priority: Major > > As a python developer, the speed and comprehensibility of the jenkins > PreCommit job could be greatly improved. 
> Here are some of the problems > - when a lint job fails, it's not reported in the test results summary, so > even though the job is marked as failed, I see "Test Result (no failures)" > which is quite confusing > - I have to wait for over an hour to discover the lint failed, which takes > about a minute to run on its own > - The logs are a jumbled mess of all the different tasks running on top of > each other > - The test results give no indication of which version of python they use. I > click on Test results, then the test module, then the test class, then I see > 4 tests named the same thing. I assume that the first is python 2.7, the > second is 3.5 and so on. It takes 5 clicks and then reading the log output > to know which version a single error pertains to. This makes it very > difficult to discover problems, and deduce that they may have something to do > with python version mismatches. > I believe the solution to this is to split up the single monolithic python > PreCommit job
[jira] [Commented] (BEAM-8229) Python postcommit 2.7 tests are failing since we try to run a test with Python 3.6+ syntax on Python 2.
[ https://issues.apache.org/jira/browse/BEAM-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929412#comment-16929412 ] Chad Dombrova commented on BEAM-8229: - Based on feedback from Robert, I'm going to split my PR into two: the first which excludes the python3.6-only feature, and the other which includes it and the additional test filtering features. > Python postcommit 2.7 tests are failing since we try to run a test with > Python 3.6+ syntax on Python 2. > --- > > Key: BEAM-8229 > URL: https://issues.apache.org/jira/browse/BEAM-8229 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > > Root cause: https://github.com/apache/beam/pull/9098 > 20:12:14 Traceback (most recent call last): > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/loader.py", > line 418, in loadTestsFromName > 20:12:14 addr.filename, addr.module) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 47, in importFromPath > 20:12:14 return self.importFromDir(dir_path, fqname) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 94, in importFromDir > 20:12:14 mod = load_module(part_fqname, fh, filename, desc) > 20:12:14 SyntaxError: invalid syntax (external_test_py37.py, line 46) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8229) Python postcommit 2.7 tests are failing since we try to run a test with Python 3.6+ syntax on Python 2.
[ https://issues.apache.org/jira/browse/BEAM-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929363#comment-16929363 ] Chad Dombrova commented on BEAM-8229: - {quote} I am not familiar with pytest, but suspected it may have some solutions for this problem. I thought that you might know if this is the case. {quote} I've used pytest quite a bit and I like it better than nose. With pytest you can create a conftest.py file which can setup exclusions programmatically (e.g. based on the current sys.executable), which would help consolidate the exclusions by moving them from the nosetests cli args currently in various shell scripts and tox.ini, into the test code itself. It's possible to do this with nose as well, but IIRC it takes a bit more boilerplate. > Python postcommit 2.7 tests are failing since we try to run a test with > Python 3.6+ syntax on Python 2. > --- > > Key: BEAM-8229 > URL: https://issues.apache.org/jira/browse/BEAM-8229 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > > Root cause: https://github.com/apache/beam/pull/9098 > 20:12:14 Traceback (most recent call last): > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/loader.py", > line 418, in loadTestsFromName > 20:12:14 addr.filename, addr.module) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 47, in importFromPath > 20:12:14 return self.importFromDir(dir_path, fqname) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 94, in importFromDir > 20:12:14 mod = load_module(part_fqname, fh, filename, desc) > 20:12:14 SyntaxError: invalid syntax 
(external_test_py37.py, line 46) -- This message was sent by Atlassian Jira (v8.3.2#803003)
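The conftest.py approach described in the comment above could look roughly like the following sketch. The file names and version cutoffs are illustrative assumptions, not Beam's actual test layout:

```python
# conftest.py -- illustrative sketch of interpreter-based test exclusion.
# The file names below are hypothetical; Beam's real py3-only modules differ.
import sys


def version_ignores(version_info):
    """Return the test files to skip for the given interpreter version."""
    ignores = []
    if version_info < (3, 6):
        ignores.append("external_test_py36.py")
    if version_info < (3, 7):
        ignores.append("external_test_py37.py")
    return ignores


# pytest reads this module-level list at collection time and skips the files,
# consolidating exclusions that would otherwise live in shell scripts.
collect_ignore = version_ignores(sys.version_info)
```

Because the exclusion is computed from sys.version_info inside the test tree itself, every entry point (tox, gradle, shell scripts) picks it up for free.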
[jira] [Commented] (BEAM-8229) Python postcommit 2.7 tests are failing since we try to run a test with Python 3.6+ syntax on Python 2.
[ https://issues.apache.org/jira/browse/BEAM-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929360#comment-16929360 ] Chad Dombrova commented on BEAM-8229: - Back with more insight. The integration tests don't go through tox, which means logic regarding virtualenv setup and test exclusion is spread around the project a bit. Outside of tox.ini, test exclusion is performed in build.gradle, run_integration_tests.sh, and run_validatescontainer.sh. Currently it looks like the integration tests don't run the python3-specific tests at all. Here's run_integration_tests.sh: {code:bash} python setup.py nosetests \ --test-pipeline-options="$PIPELINE_OPTS" \ --with-xunitmp --xunitmp-file=$XUNIT_FILE \ --ignore-files '.*py3.py$' \ $TEST_OPTS {code} Since this is already a bit of a sub-par situation (not running the py3-specific tests), I think the most reasonable solution is to extend this regex to py3[0-9]*, which will exclude my py37 tests as well. (Also, note that since this is a regex and not a glob, the period should have been escaped: as written it means "any character".) I think the ideal long-term solution is to make all tests pass through tox, which will increase their discoverability, runnability, and standardization. > Python postcommit 2.7 tests are failing since we try to run a test with > Python 3.6+ syntax on Python 2. 
> --- > > Key: BEAM-8229 > URL: https://issues.apache.org/jira/browse/BEAM-8229 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > > Root cause: https://github.com/apache/beam/pull/9098 > 20:12:14 Traceback (most recent call last): > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/loader.py", > line 418, in loadTestsFromName > 20:12:14 addr.filename, addr.module) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 47, in importFromPath > 20:12:14 return self.importFromDir(dir_path, fqname) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 94, in importFromDir > 20:12:14 mod = load_module(part_fqname, fh, filename, desc) > 20:12:14 SyntaxError: invalid syntax (external_test_py37.py, line 46) -- This message was sent by Atlassian Jira (v8.3.2#803003)
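The regex point in the comment above can be checked directly. A quick sketch (nose matches --ignore-files patterns against test file basenames):

```python
import re

# Pattern currently in run_integration_tests.sh: the '.' before 'py' is an
# unescaped regex dot (any character), and py36/py37 files are not matched.
old = re.compile(r'.*py3.py$')
# Extension proposed in the comment: allow optional minor-version digits
# and escape the literal dot.
new = re.compile(r'.*py3[0-9]*\.py$')

assert old.match('external_test_py3.py')
assert not old.match('external_test_py37.py')   # py37 tests slip through
assert new.match('external_test_py3.py')
assert new.match('external_test_py37.py')       # now excluded as well
```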
[jira] [Commented] (BEAM-8229) Python postcommit 3.7 tests are failing with syntax error
[ https://issues.apache.org/jira/browse/BEAM-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929346#comment-16929346 ] Chad Dombrova commented on BEAM-8229: - This is getting a bit difficult to communicate as the conversation is spread across 2 JIRA tickets and 2 github PRs. Reposting this from the others: Pre-commit does run the python 3.x tests, but it runs many of the tests on python 3.5. This PR introduced tests that can only run on python 3.6 or higher, and I had to do some extra work to the beam test framework to make that possible: https://github.com/apache/beam/pull/9098/commits/0c31f7c8b1c108f7df4cbedfaf9da0ce30092ce0 It appears that same work to properly exclude/include tests at the major + minor version needs to be done for post-commit tests. I didn't do that simply because I didn't know it could be a problem. I assumed everything went through tox. I'm currently looking for the post-commit entry point to see what else needs to be filtered. Note that the _other_ solution to this is dropping python 3.5 support, but I figured that'd be a much bigger conversation. 
> Python postcommit 3.7 tests are failing with syntax error > - > > Key: BEAM-8229 > URL: https://issues.apache.org/jira/browse/BEAM-8229 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Udi Meiri >Priority: Blocker > > Root cause: https://github.com/apache/beam/pull/9098 > 20:12:14 Traceback (most recent call last): > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/loader.py", > line 418, in loadTestsFromName > 20:12:14 addr.filename, addr.module) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 47, in importFromPath > 20:12:14 return self.importFromDir(dir_path, fqname) > 20:12:14 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python2_PR/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/nose/importer.py", > line 94, in importFromDir > 20:12:14 mod = load_module(part_fqname, fh, filename, desc) > 20:12:14 SyntaxError: invalid syntax (external_test_py37.py, line 46) -- This message was sent by Atlassian Jira (v8.3.2#803003)
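The major + minor version gating described above amounts to a decorator along these lines. This is a generic sketch; the helper name and the surrounding test class are assumptions, not Beam's actual test-framework hooks:

```python
import sys
import unittest


def requires_python(major, minor):
    """Skip the decorated test on interpreters older than (major, minor)."""
    # Hypothetical helper name; Beam's real framework may differ.
    return unittest.skipIf(
        sys.version_info < (major, minor),
        "requires Python %d.%d+" % (major, minor))


class ExternalTransformTest(unittest.TestCase):
    @requires_python(3, 6)
    def test_py36_only_feature(self):
        # Would exercise syntax that only parses on Python 3.6+.
        self.assertTrue(True)
```

The key point is that the gate compares against both components of the version tuple, so a suite running on python 3.5 skips 3.6-only tests instead of failing with a SyntaxError at import time.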
[jira] [Commented] (BEAM-8230) Python pre-commits need to include python 3.x tests
[ https://issues.apache.org/jira/browse/BEAM-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929335#comment-16929335 ] Chad Dombrova commented on BEAM-8230: - Reposting this from the PR: There's a misunderstanding here. Pre-commit _does_ run the python 3.x tests, but it runs many of the tests on python 3.5. This PR introduced tests that can only run on python 3.6 or higher, and I had to do some extra work to the beam test framework to make that possible: https://github.com/apache/beam/pull/9098/commits/0c31f7c8b1c108f7df4cbedfaf9da0ce30092ce0 It appears that same work to properly exclude/include tests at the major + minor version needs to be done for post-commit tests. I didn't do that simply because I didn't know it could be a problem. I assumed everything went through tox. > Python pre-commits need to include python 3.x tests > --- > > Key: BEAM-8230 > URL: https://issues.apache.org/jira/browse/BEAM-8230 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core, testing >Reporter: Ahmet Altay >Assignee: Valentyn Tymofieiev >Priority: Major > > Python pre-commit does not seem to run python 3.x tests. This results in post > commit tests catching simple issues like syntax errors. > Example: https://issues.apache.org/jira/browse/BEAM-8229
[jira] [Created] (BEAM-8213) Run and report python tox tasks separately within Jenkins
Chad Dombrova created BEAM-8213: --- Summary: Run and report python tox tasks separately within Jenkins Key: BEAM-8213 URL: https://issues.apache.org/jira/browse/BEAM-8213 Project: Beam Issue Type: Improvement Components: build-system Reporter: Chad Dombrova As a python developer, the speed and comprehensibility of the jenkins PreCommit job could be greatly improved. Here are some of the problems - when a lint job fails, it's not reported in the Test results, so even though the job failed, I see "Test Result (no failures)" - I have to wait for over an hour to discover the lint failed, which takes about a minute to run - The logs are a jumbled mess of all the different tasks running on top of each other - The test results give no indication of whether they come from python 27, 35, etc. I click on Test results, then the test module, then the test class, then I see 4 tests named the same thing. I assume that the first is python 2.7, the second is 3.5 and so on. This makes it very difficult to discover problems, and deduce that they may have something to do with python version mismatches. I believe the solution to this is to split up the single monolithic python PreCommit job into sub-jobs (possibly using a pipeline with steps). This would give us the following benefits: - sub-job results should become available as they finish, so lint results should be available very early on - sub-job results will be reported separately, so it will be clear when an error is related to a particular python version - sub-jobs without reports, like docs and lint, will have their own failure status and logs, so it will be more obvious when they fail what went wrong. I'm happy to help out once I get some feedback on the desired way forward. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8172) Add a label field to FunctionSpec proto
Chad Dombrova created BEAM-8172: --- Summary: Add a label field to FunctionSpec proto Key: BEAM-8172 URL: https://issues.apache.org/jira/browse/BEAM-8172 Project: Beam Issue Type: Improvement Components: beam-model Reporter: Chad Dombrova Assignee: Chad Dombrova The FunctionSpec payload is opaque outside of the native environment, which can make debugging pipeline protos difficult. It would be very useful if the FunctionSpec had an optional human readable "label", for debugging and presenting in UIs and error messages. For example, in python, if the payload is an instance of a class, we could attempt to provide a string that represents the dotted path to that class, "mymodule.MyClass". In the case of coders, we could use the label to hold the type hint: "Optional[Iterable[mymodule.MyClass]]". -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8088) PCollection boundedness should be tracked and propagated in python sdk
[ https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8088: Summary: PCollection boundedness should be tracked and propagated in python sdk (was: [python] PCollection boundedness should be tracked and propagated) > PCollection boundedness should be tracked and propagated in python sdk > -- > > Key: BEAM-8088 > URL: https://issues.apache.org/jira/browse/BEAM-8088 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > As far as I can tell Python does not care about boundedness of PCollections > even in streaming mode, but external transforms _do_. In my ongoing effort > to get PubsubIO external transforms working I discovered that I could not > generate an unbounded write. > My pipeline looks like this: > {code:python} > ( > pipe > | 'PubSubInflow' >> > external.pubsub.ReadFromPubSub(subscription=subscription, > with_attributes=True) > | 'PubSubOutflow' >> > external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True) > ) > {code} > The PCollections returned from the external Read are Unbounded, as expected, > but python is responsible for creating the intermediate PCollection, which is > always Bounded, and thus external Write is always Bounded. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8088) [python] PCollection boundedness should be tracked and propagated
[ https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-8088: Summary: [python] PCollection boundedness should be tracked and propagated (was: PCollection boundedness should be tracked and propagated) > [python] PCollection boundedness should be tracked and propagated > - > > Key: BEAM-8088 > URL: https://issues.apache.org/jira/browse/BEAM-8088 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > As far as I can tell Python does not care about boundedness of PCollections > even in streaming mode, but external transforms _do_. In my ongoing effort > to get PubsubIO external transforms working I discovered that I could not > generate an unbounded write. > My pipeline looks like this: > {code:python} > ( > pipe > | 'PubSubInflow' >> > external.pubsub.ReadFromPubSub(subscription=subscription, > with_attributes=True) > | 'PubSubOutflow' >> > external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True) > ) > {code} > The PCollections returned from the external Read are Unbounded, as expected, > but python is responsible for creating the intermediate PCollection, which is > always Bounded, and thus external Write is always Bounded. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923739#comment-16923739 ] Chad Dombrova commented on BEAM-7060: - Note that work on type annotations is continuing in BEAM-7746 > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > Time Spent: 15.5h > Remaining Estimate: 0h > > Existing [Typehints implementation in > Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] heavily relies on internal details of the CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate the in-house typehints implementation. > - Continue to support the in-house implementation, which at this point is stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > the Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb].
[jira] [Commented] (BEAM-8148) [python] allow tests to be specified when using tox
[ https://issues.apache.org/jira/browse/BEAM-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923730#comment-16923730 ] Chad Dombrova commented on BEAM-8148: - PR is merged, so this is fixed! > [python] allow tests to be specified when using tox > --- > > Key: BEAM-8148 > URL: https://issues.apache.org/jira/browse/BEAM-8148 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > I don't know how to specify individual tests using gradle, and as a python > developer, it's way too opaque for me to figure out on my own. > The developer wiki suggests calling the tests directly: > {noformat} > python setup.py nosetests --tests :. > {noformat} > But this defeats the purpose of using tox, which is there to help users > manage the myriad virtual envs required to run the different tests. > Luckily there is an easier way! You just add {posargs} to the tox commands. > PR is coming shortly.
[jira] [Created] (BEAM-8148) [python] allow tests to be specified when using tox
Chad Dombrova created BEAM-8148: --- Summary: [python] allow tests to be specified when using tox Key: BEAM-8148 URL: https://issues.apache.org/jira/browse/BEAM-8148 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova I don't know how to specify individual tests using gradle, and as a python developer, it's way too opaque for me to figure out on my own. The developer wiki suggests calling the tests directly: {noformat} python setup.py nosetests --tests :. {noformat} But this defeats the purpose of using tox, which is there to help users manage the myriad virtual envs required to run the different tests. Luckily there is an easier way! You just add {posargs} to the tox commands. PR is coming shortly.
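The {posargs} mechanism mentioned above works roughly as in this minimal tox.ini sketch (not Beam's actual configuration, which defines many more environments and commands):

```ini
# Minimal illustration of {posargs}; Beam's real tox.ini is more involved.
[tox]
envlist = py27,py35,py36,py37

[testenv]
deps = nose
# Anything after "--" on the tox command line is substituted for {posargs};
# with no extra arguments it expands to nothing and the full suite runs.
commands = python setup.py nosetests {posargs}
```

With a config like this, running `tox -e py37 -- --tests apache_beam.pipeline_test` forwards the test selection to nosetests inside the py37 virtualenv, so individual tests can be run without bypassing tox's environment management.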
[jira] [Created] (BEAM-8088) PCollection boundedness should be tracked and propagated
Chad Dombrova created BEAM-8088: --- Summary: PCollection boundedness should be tracked and propagated Key: BEAM-8088 URL: https://issues.apache.org/jira/browse/BEAM-8088 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova As far as I can tell Python does not care about boundedness of PCollections even in streaming mode, but external transforms _do_. In my ongoing effort to get PubsubIO external transforms working I discovered that I could not generate an unbounded write. My pipeline looks like this:
{code:python}
(
    pipe
    | 'PubSubInflow' >> external.pubsub.ReadFromPubSub(subscription=subscription, with_attributes=True)
    | 'PubSubOutflow' >> external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True)
)
{code}
The PCollections returned from the external Read are Unbounded, as expected, but python is responsible for creating the intermediate PCollection, which is always Bounded, and thus the external Write is always Bounded.
[jira] [Created] (BEAM-8085) PubsubMessageWithAttributesCoder should not produce null attributes map
Chad Dombrova created BEAM-8085: --- Summary: PubsubMessageWithAttributesCoder should not produce null attributes map Key: BEAM-8085 URL: https://issues.apache.org/jira/browse/BEAM-8085 Project: Beam Issue Type: Improvement Components: io-java-gcp Reporter: Chad Dombrova Hi, I just got caught by an issue where PubsubMessage.getAttributeMap() returned null, because the message was created by PubsubMessageWithAttributesCoder, which uses a NullableCoder for attributes. Here are the relevant code snippets:
{code:java}
public class PubsubMessageWithAttributesCoder extends CustomCoder<PubsubMessage> {
  // A message's payload can not be null
  private static final Coder<byte[]> PAYLOAD_CODER = ByteArrayCoder.of();
  // A message's attributes can be null.
  private static final Coder<Map<String, String>> ATTRIBUTES_CODER =
      NullableCoder.of(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));

  @Override
  public PubsubMessage decode(InputStream inStream) throws IOException {
    return decode(inStream, Context.NESTED);
  }

  @Override
  public PubsubMessage decode(InputStream inStream, Context context) throws IOException {
    byte[] payload = PAYLOAD_CODER.decode(inStream);
    Map<String, String> attributes = ATTRIBUTES_CODER.decode(inStream, context);
    return new PubsubMessage(payload, attributes);
  }
}
{code}
{code:java}
public class PubsubMessage {
  private byte[] message;
  private Map<String, String> attributes;

  public PubsubMessage(byte[] payload, Map<String, String> attributes) {
    this.message = payload;
    this.attributes = attributes;
  }

  /** Returns the main PubSub message. */
  public byte[] getPayload() {
    return message;
  }

  /** Returns the given attribute value. If no such attribute exists, returns null. */
  @Nullable
  public String getAttribute(String attribute) {
    checkNotNull(attribute, "attribute");
    return attributes.get(attribute);
  }

  /** Returns the full map of attributes. This is an unmodifiable map. */
  public Map<String, String> getAttributeMap() {
    return attributes;
  }
}
{code}
There are a handful of potential solutions:
# Remove the NullableCoder
# In PubsubMessageWithAttributesCoder.decode, check for null and create an empty Map before instantiating PubsubMessage
# Allow attributes to be null for the PubsubMessage constructor, but create an empty Map if it is (similar to above, but handle it in PubsubMessage)
# Allow PubsubMessage.attributes to be nullable, and indicate it as such
-- This message was sent by Atlassian Jira (v8.3.2#803003)
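Solutions 2 and 3 boil down to the same guard. As a sketch using Python stand-ins (not the Java classes quoted above):

```python
def non_null_attributes(raw):
    # Solution 2: a null (None) attributes map produced by the
    # nullable coder becomes an empty map before the message is built.
    return {} if raw is None else raw


class Message:
    """Stand-in for PubsubMessage; guards at construction (solution 3)."""

    def __init__(self, payload, attributes):
        self.payload = payload
        self.attributes = non_null_attributes(attributes)

    def attribute_map(self):
        # Callers never observe None here.
        return self.attributes


assert Message(b"data", None).attribute_map() == {}
```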
[jira] [Commented] (BEAM-7870) Externally configured KafkaIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912720#comment-16912720 ] Chad Dombrova commented on BEAM-7870: - Another possible solution: replace PubsubMessage & KafkaRecord with sdk.values.Row, so that they are compatible with the new portable RowCoder feature being added in BEAM-7886. The main downside I see is that the API would be considerably uglier on the Java side. > Externally configured KafkaIO consumer causes coder problems > > > Key: BEAM-7870 > URL: https://issues.apache.org/jira/browse/BEAM-7870 > Project: Beam > Issue Type: Bug > Components: runner-flink, sdk-java-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > > There are limitations for the consumer to work correctly. The biggest issue > is the structure of KafkaIO itself, which uses a combination of the source > interface and DoFns to generate the desired output. The problem is that the > source interface is natively translated by the Flink Runner to support > unbounded sources in portability, while the DoFn runs in a Java environment. > To transfer data between the two a coder needs to be involved. It happens to > be that the initial read does not immediately drop the KafkaRecord structure > which does not work together well with our current assumption of only > supporting "standard coders" present in all SDKs. Only the subsequent DoFn > converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn > won't have the coder available in its environment. > There are several possible solutions: > 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in > the Flink Runner > 2. Modify KafkaIO to immediately drop the KafkaRecord structure > 3. Add the KafkaRecordCoder to all SDKs > 4. Add a generic coder, e.g.
AvroCoder to all SDKs > For a workaround which uses (3), please see this patch which is not a proper > fix but adds KafkaRecordCoder to the SDK such that it can be used to > encode/decode records: > [https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed] > > See also > [https://github.com/apache/beam/pull/8251] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (BEAM-7870) Externally configured KafkaIO consumer causes coder problems
[ https://issues.apache.org/jira/browse/BEAM-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911560#comment-16911560 ] Chad Dombrova edited comment on BEAM-7870 at 8/20/19 10:04 PM: --- To follow up from the ML thread, I'm having a very similar problem when converting PubsubIO to an external transform. {quote} To transfer data between the two a coder needs to be involved. It happens to be that the initial read does not immediately drop the KafkaRecord structure which does not work together well with our current assumption of only supporting "standard coders" present in all SDKs. Only the subsequent DoFn converts the KafkaRecord structure into a raw KV[byte, byte], but the DoFn won't have the coder available in its environment. {quote} PubsubIO behaves much the same way as you've described -- a Read produces a PubsubMessage structure which is converted to bytes (in protobufs format) by a subsequent DoFn -- but the problem I'm seeing is not that "the DoFn won't have the coder available in its environment", it's that the Read does not use the proper coder because FlinkStreamingPortablePipelineTranslator.translateRead replaces all non-standard coders with ByteArrayCoder. So in my tests, the job fails with the first read message, complaining that PubsubMessage cannot be cast to bytes. Some thoughts on the proposed solutions: {quote} 1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in the Flink Runner {quote} I don't see how this will help PubsubIO, because as I stated I'm fairly confident the error happens when flink tries to store the pubsub message immediately after it's been read, before the DoFn is even called. {quote} 2. Modify KafkaIO to immediately drop the KafkaRecord structure {quote} This is the simplest solution if we look at it superficially, but it's severely limited in usefulness.
In PubsubIO, the next transform is actually a DoFn that wants a PubsubMessage, to update some stats metrics (see my overview of the pipeline below). This solution would either prevent that from working or require that every transform decode and encode the payload itself, which I think is self-defeating. {quote} 3. Add the KafkaRecordCoder to all SDKs {quote} Thanks for the patch example. I'll see if something similar works for PubsubIO. {quote} 4. Add a generic coder, e.g. AvroCoder to all SDKs {quote} What about protobufs as a compromise between options 3 and 4? In the case of PubsubIO and PubsubMessages, java has its own native coder (actually a pair, one for messages w/ attributes and one for w/out), while python is using protobufs. It would be simpler to write a protobufs coder for Java than to write a python version of PubsubMessageWithAttributesCoder, so I would favor protobufs as a pattern to apply generally to this kind of portability problem. The other thing I like about a protobufs-based approach is that you could write an abstract ProtoCoder base class which can easily be specialized by simply providing the native class and the proto class. My current external pubsub test pipeline looks like this after it's been expanded:
{noformat}
--- java ---
Read
ParDo  # stats
ParDo  # convert to pubsub protobuf byte array
--- python --
ParDo  # convert from pubsub protobuf byte array
ParDo  # custom logger
{noformat}
If we created and registered a protobufs-based coder for PubsubMessage, we could cut out the converter transforms, and it would look like this:
{noformat}
--- java ---
Read
ParDo  # stats
--- python --
ParDo  # custom logger
{noformat}
That's appealing to me, though it's worth noting that we would get this benefit using any cross-language coder, whether it's protobufs, avro, or just rewriting PubsubMessageWithAttributesCoder in python.
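The abstract ProtoCoder base class mentioned above might look like the following sketch. FakeProto is a JSON-backed stand-in for a generated protobuf class (it mimics protobuf's SerializeToString/FromString surface) so the example stays self-contained; none of these names are real Beam types.

```python
import json


class FakeProto:
    """Stand-in for a generated protobuf message class (illustration only)."""

    def __init__(self, **fields):
        self.fields = fields

    def SerializeToString(self):
        return json.dumps(self.fields, sort_keys=True).encode("utf-8")

    @classmethod
    def FromString(cls, data):
        return cls(**json.loads(data.decode("utf-8")))


class ProtoCoder:
    """Abstract base: a subclass names the proto class and the two
    conversions, and cross-language encode/decode falls out."""

    proto_cls = None  # the generated protobuf message class

    def to_proto(self, value):
        raise NotImplementedError

    def from_proto(self, proto):
        raise NotImplementedError

    def encode(self, value):
        return self.to_proto(value).SerializeToString()

    def decode(self, data):
        return self.from_proto(self.proto_cls.FromString(data))


class PubsubMessageCoder(ProtoCoder):
    proto_cls = FakeProto

    def to_proto(self, value):
        # value is modeled as a (payload, attributes) pair here.
        payload, attributes = value
        return FakeProto(payload=payload, attributes=attributes)

    def from_proto(self, proto):
        return (proto.fields["payload"], proto.fields["attributes"])


coder = PubsubMessageCoder()
msg = ("hello", {"k": "v"})
assert coder.decode(coder.encode(msg)) == msg
```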
[jira] [Created] (BEAM-8000) Add Delete method to gRPC JobService
Chad Dombrova created BEAM-8000: --- Summary: Add Delete method to gRPC JobService Key: BEAM-8000 URL: https://issues.apache.org/jira/browse/BEAM-8000 Project: Beam Issue Type: Improvement Components: beam-model Reporter: Chad Dombrova Assignee: Chad Dombrova As a user of the InMemoryJobService, I want a method to purge jobs from memory when they are no longer needed, so that the service does not balloon in memory usage over time. I was planning to name this Delete. Also considering the name Purge. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
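A toy model of the proposed semantics (InMemoryJobStore and delete_job are illustrative names, not the actual service API): deleting purges the entry so memory is reclaimed, and an unknown id is an error.

```python
class InMemoryJobStore:
    """Toy sketch of the purge behavior requested above."""

    def __init__(self):
        self._jobs = {}

    def add_job(self, job_id, job):
        self._jobs[job_id] = job

    def delete_job(self, job_id):
        # Unknown ids are an error, mirroring the NOT_FOUND contract
        # elsewhere in the JobService.
        if job_id not in self._jobs:
            raise KeyError(job_id)
        del self._jobs[job_id]

    def job_count(self):
        return len(self._jobs)


store = InMemoryJobStore()
store.add_job("job-1", object())
store.delete_job("job-1")
assert store.job_count() == 0
```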
[jira] [Assigned] (BEAM-7720) Fix the exception type of InMemoryJobService when job id not found
[ https://issues.apache.org/jira/browse/BEAM-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova reassigned BEAM-7720: --- Assignee: Chad Dombrova > Fix the exception type of InMemoryJobService when job id not found > -- > > Key: BEAM-7720 > URL: https://issues.apache.org/jira/browse/BEAM-7720 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The contract in beam_job_api.proto for `CancelJobRequest`, > `GetJobStateRequest`, and `GetJobPipelineRequest` states: > > {noformat} > // Throws error NOT_FOUND if the jobId is not found{noformat} > > However, `InMemoryJobService` is handling this exception incorrectly by > rethrowing `NOT_FOUND` exceptions as `INTERNAL`. > neither `JobMessagesRequest` nor `GetJobMetricsRequest` state their contract > wrt exceptions, but they should probably be updated to handle `NOT_FOUND` in > the same way. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
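The intended mapping can be sketched like this; NotFound and Internal stand in for the gRPC NOT_FOUND and INTERNAL statuses, and none of this is the actual InMemoryJobService code.

```python
class NotFound(Exception):
    """Stand-in for a gRPC NOT_FOUND status."""


class Internal(Exception):
    """Stand-in for a gRPC INTERNAL status."""


def get_job_state(jobs, job_id):
    # Per the beam_job_api.proto contract quoted above, an unknown job
    # id must surface as NOT_FOUND, not be rethrown as INTERNAL.
    try:
        return jobs[job_id]
    except KeyError as exc:
        raise NotFound("job %s not found" % job_id) from exc


assert get_job_state({"job-1": "RUNNING"}, "job-1") == "RUNNING"
```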
[jira] [Created] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
Chad Dombrova created BEAM-7984: --- Summary: [python] The coder returned for typehints.List should be IterableCoder Key: BEAM-7984 URL: https://issues.apache.org/jira/browse/BEAM-7984 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Chad Dombrova Assignee: Chad Dombrova IterableCoder encodes a list and decodes to a list, but typecoders.registry.get_coder(typehints.List[bytes]) returns a FastPrimitivesCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
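As a toy model of the lookup in question (stand-in classes and plain string keys, not apache_beam.coders): the desired behavior is that List[bytes] resolves to a registered IterableCoder instead of falling through to the primitives coder.

```python
class IterableCoder:
    """Stand-in for apache_beam.coders.IterableCoder."""


class FastPrimitivesCoder:
    """Stand-in for the catch-all fallback coder."""


_registry = {}


def register_coder(typehint, coder_cls):
    _registry[typehint] = coder_cls


def get_coder(typehint):
    # Fall back to the primitives coder only when nothing more
    # specific is registered; the ticket argues List should hit the
    # specific branch instead of this fallback.
    return _registry.get(typehint, FastPrimitivesCoder)()


register_coder("List[bytes]", IterableCoder)
assert isinstance(get_coder("List[bytes]"), IterableCoder)
```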
[jira] [Created] (BEAM-7931) Fix gradlew to allow offline builds
Chad Dombrova created BEAM-7931: --- Summary: Fix gradlew to allow offline builds Key: BEAM-7931 URL: https://issues.apache.org/jira/browse/BEAM-7931 Project: Beam Issue Type: Improvement Components: build-system Reporter: Chad Dombrova When running `./gradlew` the first thing it tries to do is download gradle: {noformat} Downloading https://services.gradle.org/distributions/gradle-5.2.1-all.zip {noformat} There seems to be no way to skip this, even if the correct version of gradle has already been downloaded and exists in the correct place. The tool should be smart enough to do a version check and skip downloading if it exists. Unfortunately, the logic is wrapped up inside gradle/wrapper/gradle-wrapper.jar so there does not seem to be any way to fix this. Where is the code for this jar? This is the first step of several to allow beam to be built offline. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
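One possible pre-check before invoking the wrapper, as a workaround sketch: look for an already-unpacked distribution under the wrapper's cache directory. The wrapper/dists layout assumed here is an observed convention of the Gradle wrapper cache, not a documented API.

```python
from pathlib import Path


def distribution_cached(gradle_user_home, version):
    # The wrapper unpacks distributions under (assumed layout):
    #   <GRADLE_USER_HOME>/wrapper/dists/gradle-<ver>-all/<hash>/gradle-<ver>
    # If such a directory exists, the download step should be skippable.
    dists = Path(gradle_user_home) / "wrapper" / "dists"
    pattern = "gradle-%s-all/*/gradle-%s" % (version, version)
    return any(dists.glob(pattern))
```

If this returns True, invoking the unpacked gradle binary directly would avoid the wrapper's download attempt entirely.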
[jira] [Created] (BEAM-7927) Add ability to get the list of submitted jobs from gRPC JobService
Chad Dombrova created BEAM-7927: --- Summary: Add ability to get the list of submitted jobs from gRPC JobService Key: BEAM-7927 URL: https://issues.apache.org/jira/browse/BEAM-7927 Project: Beam Issue Type: New Feature Components: beam-model Reporter: Chad Dombrova As a developer building a client for monitoring running jobs via the JobService, I want the ability to get a list of jobs – particularly their job ids – so that I can use this as an entry point for getting other information about a job already offered by the JobService, such as the pipeline definition, a stream of status changes, etc. Currently, the JobService is only useful if you already have a valid job id. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
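A sketch of what such a listing could return (field names are illustrative, not the beam_job_api.proto schema): job ids plus enough summary state to drive the existing per-job RPCs.

```python
class JobRegistry:
    """Toy stand-in for the job service's bookkeeping."""

    def __init__(self):
        self._jobs = {}

    def submit(self, job_id, pipeline):
        self._jobs[job_id] = {"id": job_id, "pipeline": pipeline,
                              "state": "RUNNING"}

    def get_jobs(self):
        # The requested entry point: ids plus summary info, from which
        # the pipeline/state/message RPCs can then be driven per job.
        return [{"id": j["id"], "state": j["state"]}
                for j in self._jobs.values()]


reg = JobRegistry()
reg.submit("job-1", "<pipeline proto>")
reg.submit("job-2", "<pipeline proto>")
assert [j["id"] for j in reg.get_jobs()] == ["job-1", "job-2"]
```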
[jira] [Assigned] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs
[ https://issues.apache.org/jira/browse/BEAM-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova reassigned BEAM-7738: --- Assignee: Chad Dombrova > Support PubSubIO to be configured externally for use with other SDKs > > > Key: BEAM-7738 > URL: https://issues.apache.org/jira/browse/BEAM-7738 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp, runner-flink, sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Labels: portability > > Now that KafkaIO is supported via the external transform API (BEAM-7029) we > should add support for PubSub. -- This message was sent by Atlassian JIRA (v7.6.14#76016)