[jira] [Created] (FLINK-32939) pyflink 1.17.0 has missing transitive dependency for pyopenssl

2023-08-22 Thread Nathanael England (Jira)
Nathanael England created FLINK-32939:
-

 Summary: pyflink 1.17.0 has missing transitive dependency for 
pyopenssl
 Key: FLINK-32939
 URL: https://issues.apache.org/jira/browse/FLINK-32939
 Project: Flink
  Issue Type: Bug
 Environment: Ubuntu 20.04
Flink 1.17.0
Reporter: Nathanael England


When running a pyflink job recently, we got an error caused by a failed import 
inside pyopenssl. Here's the traceback.
{code:bash}
E   Caused by: java.lang.RuntimeException: Failed to create stage bundle factory! Traceback (most recent call last):
E File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
E   return _run_code(code, main_globals, None,
E File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
E   exec(code, run_globals)
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pyflink/fn_execution/beam/beam_boot.py", line 36, in <module>
E   from apache_beam.portability.api.org.apache.beam.model.fn_execution.v1.beam_fn_api_pb2 import \
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/apache_beam/__init__.py", line 93, in <module>
E   from apache_beam import io
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/apache_beam/io/__init__.py", line 27, in <module>
E   from apache_beam.io.mongodbio import *
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/apache_beam/io/mongodbio.py", line 93, in <module>
E   from bson import json_util
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/bson/json_util.py", line 130, in <module>
E   from pymongo.errors import ConfigurationError
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/__init__.py", line 114, in <module>
E   from pymongo.collection import ReturnDocument
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/collection.py", line 26, in <module>
E   from pymongo import common, helpers, message
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/common.py", line 38, in <module>
E   from pymongo.ssl_support import validate_allow_invalid_certs, validate_cert_reqs
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/ssl_support.py", line 27, in <module>
E   import pymongo.pyopenssl_context as _ssl
E File "/home/buildbot/.cache/pants/named_caches/pex_root/venvs/s/361094c4/venv/lib/python3.8/site-packages/pymongo/pyopenssl_context.py", line 27, in <module>
E   from OpenSSL import SSL as _SSL
E File "/usr/local/lib/python3.8/dist-packages/OpenSSL/__init__.py", line 8, in <module>
E   from OpenSSL import crypto, SSL
E File "/usr/local/lib/python3.8/dist-packages/OpenSSL/crypto.py", line 1556, in <module>
E   class X509StoreFlags(object):
E File "/usr/local/lib/python3.8/dist-packages/OpenSSL/crypto.py", line 1577, in X509StoreFlags
E   CB_ISSUER_CHECK = _lib.X509_V_FLAG_CB_ISSUER_CHECK
E   AttributeError: module 'lib' has no attribute 'X509_V_FLAG_CB_ISSUER_CHECK'
{code}
From this traceback, it seems that apache-flink depends on apache-beam, which 
depends on pymongo, which in turn wants to depend on pyopenssl. To do that 
within the pymongo library, users need to specify `pymongo[ocsp]` as their 
dependency instead of just `pymongo`. It looks like apache-beam just specifies 
`pymongo` and then does some horrible python path manipulation that ends up 
finding a random installation on the system path. The tool we are using 
(pantsbuild) modifies the python path at the start, so it shouldn't have been 
possible to find this installation.
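
As a possible workaround (a sketch only, not a verified recommendation), one can declare the extra explicitly in the job's own requirements so the resolver installs a consistent pyopenssl inside the managed environment instead of pymongo falling back to the system copy. The exact pins below are illustrative assumptions.
{code:bash}
# requirements.txt (sketch; versions are illustrative, not prescribed)
apache-flink==1.17.0
pymongo[ocsp]        # the [ocsp] extra declares pyopenssl (and related packages) explicitly
{code}
The AttributeError above is typically a sign of a pyopenssl that does not match the installed cryptography package, which is another reason to keep both inside the environment the resolver manages.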

I believe this is an Apache Beam problem, but Jira will not let me file an 
issue there. Since this affects all Flink python users, though, it seems 
appropriate to track it here as well: whatever fix lands in Beam should be 
worked downstream into Flink.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31069) Pyflink 1.16.1 has unclosed resources at the end of unit tests

2023-02-14 Thread Nathanael England (Jira)


[ https://issues.apache.org/jira/browse/FLINK-31069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688831#comment-17688831 ]

Nathanael England commented on FLINK-31069:
---

Once you work around or ignore this, a new class of issues has to be solved to 
get the base tooling to work. I've opened FLINK-31073 to track my findings there.

> Pyflink 1.16.1 has unclosed resources at the end of unit tests
> --
>
> Key: FLINK-31069
> URL: https://issues.apache.org/jira/browse/FLINK-31069
> Project: Flink
>  Issue Type: Bug
> Environment: Ubuntu 20.04
> Python 3.8.10
> Pyflink 1.16.1
>Reporter: Nathanael England
>Priority: Minor
>
> A simple pyflink unit test has unclosed resources at the end of the test run. 
> A minimally reproducible example of this can be seen with the following in 
> 1.16.1
> {code:python}
> from pyflink.testing import test_case_utils
> class InputBroadcastProcessFunctionTests(
> test_case_utils.PyFlinkStreamingTestCase):
> def test_nothing(self):
> pass
> {code}
> When `pytest.ini` is instructed to have `filterwarnings = error`, the user is 
> met with errors like the following
> {code:bash}
> $ pytest example_test.py 
> ============================= test session starts ==============================
> platform linux -- Python 3.8.10, pytest-7.2.1, pluggy-1.0.0
> rootdir: /home/my_repo, configfile: pytest.ini
> plugins: forked-1.6.0, anyio-3.6.2, timeout-2.1.0, typeguard-2.13.3, xdist-2.5.0, rabbitmq-2.2.1, cov-4.0.0
> timeout: 60.0s
> timeout method: signal
> timeout func_only: False
> collected 1 item
>
> example_test.py E                                                        [100%]
>
> ==================================== ERRORS ====================================
> ______ ERROR at setup of InputBroadcastProcessFunctionTests.test_nothing _______
>
> cls = <class '_pytest.runner.CallInfo'>, func = <function call_runtest_hook.<locals>.<lambda> at 0x7f42530f0af0>, when = 'setup', reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)
> @classmethod
> def from_call(
> cls,
> func: "Callable[[], TResult]",
> when: "Literal['collect', 'setup', 'call', 'teardown']",
> reraise: Optional[
> Union[Type[BaseException], Tuple[Type[BaseException], ...]]
> ] = None,
> ) -> "CallInfo[TResult]":
> """Call func, wrapping the result in a CallInfo.
> 
> :param func:
> The function to call. Called without arguments.
> :param when:
> The phase in which the function is called.
> :param reraise:
> Exception or exceptions that shall propagate if raised by the
> function, instead of being wrapped in the CallInfo.
> """
> excinfo = None
> start = timing.time()
> precise_start = timing.perf_counter()
> try:
> >   result: Optional[TResult] = func()
> dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/runner.py:339: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/runner.py:260: in <lambda>
>     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
> dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/pluggy/_hooks.py:265: in __call__
>     return self._hookexec(self.name, self.get_hookimpls(), 

[jira] [Created] (FLINK-31073) Pyflink testing library can't be used out of the box

2023-02-14 Thread Nathanael England (Jira)
Nathanael England created FLINK-31073:
-

 Summary: Pyflink testing library can't be used out of the box
 Key: FLINK-31073
 URL: https://issues.apache.org/jira/browse/FLINK-31073
 Project: Flink
  Issue Type: Bug
 Environment: Ubuntu 20.04
Python 3.8.10
Pyflink 1.16.1
Reporter: Nathanael England


The pyflink distribution ships a `pyflink.testing.test_case_utils` module that 
makes it appear that unit testing pyflink tooling is supported out of the box. 
It actually takes some non-trivial effort to figure out what is needed to run a 
simple no-op test case that makes it through the class setup in that module.

The user has to add the following jar files to their system in order to get 
through the setup steps.
{code:bash}
flink-runtime-1.16.1-tests.jar
flink-test-utils-1.16.1.jar
hamcrest-core-1.3.jar
junit-4.13.2.jar
{code}
The first is needed because `MiniClusterResourceConfiguration` cannot be 
retrieved without it. The second is needed because it provides 
`MiniClusterWithClientResource`. The junit and hamcrest jars are needed because 
they are dependencies of `MiniClusterWithClientResource`; without them the user 
is met with a `ClassNotFoundError` for `org.junit.rules.ExternalResource` when 
trying to set up the mini cluster resource.

Further, these jars have to be put in a place where 
`pyflink_gateway_server.py:construct_test_classpath` is set up to look. It 
expects certain patterns under the source root of the installation, which for 
pyflink is typically inside a virtual environment folder that a user should not 
be modifying. The only alternative to putting the files inside the virtual 
environment directories is to override that function with a custom one that 
looks for jar files somewhere else.
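
For reference, here is roughly what the manual setup described above looks like as shell commands. This is a sketch under assumptions: the URLs are the standard Maven Central coordinates for these artifacts, and the drop location is only a guess, so check the patterns in `construct_test_classpath` for where your installed version actually scans.
{code:bash}
# Sketch of the manual jar setup (drop location is an assumption; verify it
# against construct_test_classpath in your installed pyflink_gateway_server.py).
PYFLINK_DIR=$(python -c "import pyflink, os; print(os.path.dirname(pyflink.__file__))")
cd "$PYFLINK_DIR"
wget https://repo1.maven.org/maven2/org/apache/flink/flink-runtime/1.16.1/flink-runtime-1.16.1-tests.jar
wget https://repo1.maven.org/maven2/org/apache/flink/flink-test-utils/1.16.1/flink-test-utils-1.16.1.jar
wget https://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
wget https://repo1.maven.org/maven2/junit/junit/4.13.2/junit-4.13.2.jar
{code}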

The available documentation makes no mention of python unit testing or examples 
of it. Most of the help for working this out came from 
https://github.com/dianfu/pyflink-faq/tree/main/testing .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31069) Pyflink 1.16.1 has unclosed resources at the end of unit tests

2023-02-14 Thread Nathanael England (Jira)
Nathanael England created FLINK-31069:
-

 Summary: Pyflink 1.16.1 has unclosed resources at the end of unit 
tests
 Key: FLINK-31069
 URL: https://issues.apache.org/jira/browse/FLINK-31069
 Project: Flink
  Issue Type: Bug
 Environment: Ubuntu 20.04
Python 3.8.10
Pyflink 1.16.1
Reporter: Nathanael England


A simple pyflink unit test has unclosed resources at the end of the test run. A 
minimally reproducible example of this can be seen with the following in 1.16.1

{code:python}
from pyflink.testing import test_case_utils


class InputBroadcastProcessFunctionTests(
test_case_utils.PyFlinkStreamingTestCase):
def test_nothing(self):
pass
{code}

When `pytest.ini` is instructed to have `filterwarnings = error`, the user is 
met with errors like the following

{code:bash}
$ pytest example_test.py 

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.2.1, pluggy-1.0.0
rootdir: /home/my_repo, configfile: pytest.ini
plugins: forked-1.6.0, anyio-3.6.2, timeout-2.1.0, typeguard-2.13.3, xdist-2.5.0, rabbitmq-2.2.1, cov-4.0.0
timeout: 60.0s
timeout method: signal
timeout func_only: False
collected 1 item

example_test.py E                                                        [100%]

==================================== ERRORS ====================================
______ ERROR at setup of InputBroadcastProcessFunctionTests.test_nothing _______

cls = <class '_pytest.runner.CallInfo'>, func = <function call_runtest_hook.<locals>.<lambda> at 0x7f42530f0af0>, when = 'setup', reraise = (<class '_pytest.outcomes.Exit'>, <class 'KeyboardInterrupt'>)

@classmethod
def from_call(
cls,
func: "Callable[[], TResult]",
when: "Literal['collect', 'setup', 'call', 'teardown']",
reraise: Optional[
Union[Type[BaseException], Tuple[Type[BaseException], ...]]
] = None,
) -> "CallInfo[TResult]":
"""Call func, wrapping the result in a CallInfo.

:param func:
The function to call. Called without arguments.
:param when:
The phase in which the function is called.
:param reraise:
Exception or exceptions that shall propagate if raised by the
function, instead of being wrapped in the CallInfo.
"""
excinfo = None
start = timing.time()
precise_start = timing.perf_counter()
try:
>   result: Optional[TResult] = func()

dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/runner.py:339: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/runner.py:260: in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/pluggy/_hooks.py:265: in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/pluggy/_manager.py:80: in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
dist/export/python/virtualenvs/python-default/3.8.10/lib/python3.8/site-packages/_pytest/unraisableexception.py:83: in pytest_runtest_setup
    yield from unraisable_exception_runtest_hook()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[jira] [Commented] (FLINK-29796) pyflink protobuf requirement out of date

2022-12-15 Thread Nathanael England (Jira)


[ https://issues.apache.org/jira/browse/FLINK-29796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648177#comment-17648177 ]

Nathanael England commented on FLINK-29796:
---

Just wanted to bump this. Looking at 
[https://github.com/apache/flink/blob/release-1.16/flink-python/setup.py#L314], 
it seems this has already made its way back to 1.16, if I'm understanding 
correctly? This is blocking me from pulling apache-flink in through our 
requirements.txt, since we require protobuf > 3.19 due to the security 
vulnerabilities detailed 
[here|https://github.com/protocolbuffers/protobuf/security/advisories/GHSA-8gq9-2x98-w8hf].
 We use [pantsbuild|https://www.pantsbuild.org/] for python repo management, so 
there's no easy way to separate out our requirements as a temporary workaround.
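
To make the conflict concrete, here is an illustrative (not literal) pair of requirements that cannot be resolved together as long as the released package keeps the old pin; the exact lines are an example, not a copy of our actual requirements file.
{code:bash}
# requirements.txt (illustration of the conflict only)
apache-flink==1.16.0   # its setup.py pins protobuf<3.18
protobuf>3.19          # needed for the fixes described in GHSA-8gq9-2x98-w8hf
{code}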

> pyflink protobuf requirement out of date
> 
>
> Key: FLINK-29796
> URL: https://issues.apache.org/jira/browse/FLINK-29796
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.16.0
>Reporter: Jorge Villatoro
>Priority: Major
>
> The setup.py file for pyflink currently requires protobuf<3.18, but the 
> dev-requirements.txt file lists protobuf<=3.21, which seems to indicate that 
> the library works with newer versions of protobuf. The latest version of 
> protobuf which satisfies the requirement is 3.17.3, which was released over a 
> year ago; notably, the various gcloud api packages all require much newer 
> versions (3.19+ I think). Obviously there are ways around this, but the right 
> answer is likely to ease/change the requirement.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)