[jira] [Created] (FLINK-32939) pyflink 1.17.0 has missing transitive dependency for pyopenssl
Nathanael England created FLINK-32939: - Summary: pyflink 1.17.0 has missing transitive dependency for pyopenssl Key: FLINK-32939 URL: https://issues.apache.org/jira/browse/FLINK-32939 Project: Flink Issue Type: Bug Environment: Ubuntu 20.04 Flink 1.17.0 Reporter: Nathanael England

When running a pyflink job recently, we got an error about not being able to import something from pyopenssl correctly. Here's the traceback.

{code:bash}
E Caused by: java.lang.RuntimeException: Failed to create stage bundle factory! Traceback (most recent call last):
E   File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
E     return _run_code(code, main_globals, None,
E   File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
E     exec(code, run_globals)
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_boot.py", line 36, in <module>
E     from apache_beam.portability.api.org.apache.beam.model.fn_execution.v1.beam_fn_api_pb2 import \
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/apache_beam/__init__.py", line 93, in <module>
E     from apache_beam import io
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/apache_beam/io/__init__.py", line 27, in <module>
E     from apache_beam.io.mongodbio import *
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/apache_beam/io/mongodbio.py", line 93, in <module>
E     from bson import json_util
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/bson/json_util.py", line 130, in <module>
E     from pymongo.errors import ConfigurationError
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/__init__.py", line 114, in <module>
E     from pymongo.collection import ReturnDocument
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/collection.py", line 26, in <module>
E     from pymongo import common, helpers, message
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/common.py", line 38, in <module>
E     from pymongo.ssl_support import validate_allow_invalid_certs, validate_cert_reqs
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/ssl_support.py", line 27, in <module>
E     import pymongo.pyopenssl_context as _ssl
E   File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/pyopenssl_context.py", line 27, in <module>
E     from OpenSSL import SSL as _SSL
E   File "/usr/local/lib/python3.8/dist-packages/OpenSSL/__init__.py", line 8, in <module>
E     from OpenSSL import crypto, SSL
E   File "/usr/local/lib/python3.8/dist-packages/OpenSSL/crypto.py", line 1556, in <module>
E     class X509StoreFlags(object):
E   File "/usr/local/lib/python3.8/dist-packages/OpenSSL/crypto.py", line 1577, in X509StoreFlags
E     CB_ISSUER_CHECK = _lib.X509_V_FLAG_CB_ISSUER_CHECK
E AttributeError: module 'lib' has no attribute 'X509_V_FLAG_CB_ISSUER_CHECK'
{code}

From this traceback, apache-flink depends on apache-beam, which depends on pymongo, which wants to depend on pyopenssl. To do that within the pymongo library, users need to specify `pymongo[ocsp]` as their dependency instead of just `pymongo`. It looks like apache-beam specifies plain `pymongo` and then does some horrible Python path manipulation to find some random installation on the system path. The tool we are using (pantsbuild) modifies the Python path at startup, so it should not have been possible to find this installation. I believe this is an Apache Beam problem, but Jira will not let me create an issue there.
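The `AttributeError` above is a symptom of a pyOpenSSL build that is out of step with the `cryptography` library it wraps. A minimal preflight sketch (a hypothetical helper, not part of pyflink or Beam) that detects the broken installation before a job ever reaches the Beam worker:

```python
# Sketch: detect a pyOpenSSL installation that fails at import time with
# the X509_V_FLAG_CB_ISSUER_CHECK AttributeError seen in the traceback.
import importlib.util


def pyopenssl_importable() -> bool:
    """Return True if OpenSSL.SSL imports cleanly; False if pyOpenSSL is
    missing or raises the version-mismatch AttributeError."""
    if importlib.util.find_spec("OpenSSL") is None:
        return False  # pyOpenSSL not installed at all
    try:
        from OpenSSL import SSL  # noqa: F401
    except AttributeError:
        return False  # mismatched pyOpenSSL/cryptography pair
    return True
```

A build could call this check up front and fail fast with a pointer to `pymongo[ocsp]`, rather than surfacing the error from deep inside the stage bundle factory.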
Since this affects all Flink python users, though, it seems appropriate to be here as whatever fix comes to Beam should be worked downstream into Flink. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-31069) Pyflink 1.16.1 has unclosed resources at the end of unit tests
[ https://issues.apache.org/jira/browse/FLINK-31069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688831#comment-17688831 ] Nathanael England commented on FLINK-31069: --- Once you work around or ignore this, a new class of issues has to be solved to get the base tooling to work. I've opened FLINK-31073 to track my findings there. > Pyflink 1.16.1 has unclosed resources at the end of unit tests > -- > > Key: FLINK-31069 > URL: https://issues.apache.org/jira/browse/FLINK-31069 > Project: Flink > Issue Type: Bug > Environment: Ubuntu 20.04 > Python 3.8.10 > Pyflink 1.16.1 > Reporter: Nathanael England > Priority: Minor
[jira] [Created] (FLINK-31073) Pyflink testing library can't be used out of the box
Nathanael England created FLINK-31073: - Summary: Pyflink testing library can't be used out of the box Key: FLINK-31073 URL: https://issues.apache.org/jira/browse/FLINK-31073 Project: Flink Issue Type: Bug Environment: Ubuntu 20.04 Python 3.8.10 Pyflink 1.16.1 Reporter: Nathanael England

The pyflink distribution ships `pyflink/testing/test_case_utils.py`, which makes it appear that unit testing pyflink tooling is supported. It actually takes some non-trivial effort to figure out which packages are needed to run a simple no-op test case that makes it through the class setup in that module. The user has to add the following jar files to their system to get through the setup steps.

{code:bash}
flink-runtime-1.16.1-tests.jar
flink-test-utils-1.16.1.jar
hamcrest-core-1.3.jar
junit-4.13.2.jar
{code}

The first is needed because retrieving the `MiniClusterResourceConfiguration` otherwise fails. The second is needed because it provides `MiniClusterWithClientResource`. The JUnit and Hamcrest jars are needed because they are dependencies of `MiniClusterWithClientResource`; without them the user is met with a `ClassNotFoundException` for `org.junit.rules.ExternalResource` when trying to set up the mini cluster resource. Further, these jars have to be put in a place where `pyflink_gateway_server.py:construct_test_classpath` is set up to look. It expects certain patterns under the source root of the installation, which for pyflink is typically inside a virtual environment folder that a user should not be modifying. The only alternative to putting the files inside the virtual environment directories is to override that function with a custom one that looks for jar files somewhere else. The available documentation makes no mention of python unit testing examples. Most of the motivation for this fix came from https://github.com/dianfu/pyflink-faq/tree/main/testing .
[jira] [Created] (FLINK-31069) Pyflink 1.16.1 has unclosed resources at the end of unit tests
Nathanael England created FLINK-31069: - Summary: Pyflink 1.16.1 has unclosed resources at the end of unit tests Key: FLINK-31069 URL: https://issues.apache.org/jira/browse/FLINK-31069 Project: Flink Issue Type: Bug Environment: Ubuntu 20.04 Python 3.8.10 Pyflink 1.16.1 Reporter: Nathanael England

A simple pyflink unit test has unclosed resources at the end of testing. A minimally reproducible example of this can be seen with the following in 1.16.1

{code:python}
from pyflink.testing import test_case_utils


class InputBroadcastProcessFunctionTests(
        test_case_utils.PyFlinkStreamingTestCase):
    def test_nothing(self):
        pass
{code}

When `pytest.ini` is instructed to have `filterwarnings = error`, the user is met with errors like the following

{code:bash}
$ pytest example_test.py
test session starts
platform linux -- Python 3.8.10, pytest-7.2.1, pluggy-1.0.0
rootdir: /home/my_repo, configfile: pytest.ini
plugins: forked-1.6.0, anyio-3.6.2, timeout-2.1.0, typeguard-2.13.3, xdist-2.5.0, rabbitmq-2.2.1, cov-4.0.0
timeout: 60.0s
timeout method: signal
timeout func_only: False
collected 1 item

example_test.py E [100%]

== ERRORS ===
_ ERROR at setup of InputBroadcastProcessFunctionTests.test_nothing _
cls = , func = . at 0x7f42530f0af0>, when = 'setup', reraise = (, )

    @classmethod
    def from_call(
        cls,
        func: "Callable[[], TResult]",
        when: "Literal['collect', 'setup', 'call', 'teardown']",
        reraise: Optional[
            Union[Type[BaseException], Tuple[Type[BaseException], ...]]
        ] = None,
    ) -> "CallInfo[TResult]":
        """Call func, wrapping the result in a CallInfo.

        :param func:
            The function to call. Called without arguments.
        :param when:
            The phase in which the function is called.
        :param reraise:
            Exception or exceptions that shall propagate if raised by the
            function, instead of being wrapped in the CallInfo.
        """
        excinfo = None
        start = timing.time()
        precise_start = timing.perf_counter()
        try:
>           result: Optional[TResult] = func()

dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/runner.py:339:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/runner.py:260: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/pluggy/_hooks.py:265: in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/pluggy/_manager.py:80: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/unraisableexception.py:83: in pytest_runtest_setup
    yield from unraisable_exception_runtest_hook()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
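The "unclosed resources" surface as `ResourceWarning`s raised while objects are finalized; with warnings promoted to errors, pytest's unraisable-exception hook (visible in the traceback above) turns them into test errors. A minimal, pyflink-free sketch of the underlying mechanism:

```python
# Sketch: an object left unclosed emits ResourceWarning when it is
# finalized. This is the class of warning the filterwarnings config
# promotes to a failure. CPython's reference counting finalizes `f`
# immediately at `del f`; gc.collect() covers other implementations.
import gc
import os
import warnings


def emits_resource_warning() -> bool:
    """Return True if an unclosed file produces a ResourceWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        f = open(os.devnull)  # deliberately never closed
        del f                 # finalizer warns about the unclosed file
        gc.collect()
    return any(issubclass(w.category, ResourceWarning) for w in caught)
```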
[jira] [Commented] (FLINK-29796) pyflink protobuf requirement out of date
[ https://issues.apache.org/jira/browse/FLINK-29796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648177#comment-17648177 ] Nathanael England commented on FLINK-29796: --- Just wanted to bump this. Looking at [https://github.com/apache/flink/blob/release-1.16/flink-python/setup.py#L314], it seems this has already made its way back to 1.16, if I'm understanding correctly? This is blocking me from pulling apache-flink in through our requirements.txt, since we require protobuf > 3.19 due to the security vulnerabilities detailed [here|https://github.com/protocolbuffers/protobuf/security/advisories/GHSA-8gq9-2x98-w8hf]. We use [pantsbuild|https://www.pantsbuild.org/] for python repo management, so there's no easy way to separate out our requirements for a temporary solution. > pyflink protobuf requirement out of date > > > Key: FLINK-29796 > URL: https://issues.apache.org/jira/browse/FLINK-29796 > Project: Flink > Issue Type: Bug > Components: API / Python > Affects Versions: 1.16.0 > Reporter: Jorge Villatoro > Priority: Major > > The setup.py file for pyflink currently requires protobuf<3.18, but the > dev-requirements.txt file lists protobuf<=3.21, which seems to indicate that > the library works with newer versions of protobuf. The latest version of > protobuf which satisfies the requirement was 3.17.3, which was released over a > year ago, and notably the various gcloud api packages all require much newer > versions (3.19+ I think). Obviously there are ways around this, but the right > answer is likely to ease/change the requirement.