Hi folks, I'm using a patched version of the Apache Beam 2.35 Python SDK, running on Flink on Kubernetes with the PortableJobRunner.
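For context, here's roughly how we submit the job (a minimal sketch; the endpoints and environment type below are placeholders, not our exact config):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Placeholder endpoints; in our deployment these point at the
    # JobServer and worker pool running in the Kubernetes cluster.
    options = PipelineOptions([
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        '--environment_type=EXTERNAL',
        '--environment_config=localhost:50000',
    ])

    with beam.Pipeline(options=options) as p:
        _ = p | beam.Create([1, 2, 3]) | beam.Map(print)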
It looks like when the job is submitted, the runner tries to upload a large (274 MB) Flink job server jar to the staging service. This doesn't seem right: I already have an instance of the JobServer running, and my program is talking to it, so why is the runner trying to stage the JobServer itself? I believe this problem started when I upgraded to 2.35.

When I debugged this (snippet after my sign-off), I found the runner getting stuck in offer_artifacts:
https://github.com/apache/beam/blob/38b21a71a76aad902bd903d525b25a5ff464df55/sdks/python/apache_beam/runners/portability/artifact_service.py#L235

The rolePayload of the request in question was:
rolePayload=beam-runners-flink-job-server-quH8FRP1-liJ7K9es-qBV9wguz4oBRlVegwqlW3OpqU.jar

How does the runner decide which artifacts to upload? Could this be caused by using a patched version of the Python SDK (2.35.0.dev2) with an unpatched job server jar (2.35.0), so that the version mismatch makes the portable runner think it needs to upload the jar?

For background: we build our own version of the Python SDK because we need a fix for https://issues.apache.org/jira/browse/BEAM-12244. When we were using 2.33 we were also building our own Flink jars in order to pull in Kafka patches which hadn't been released yet.

Thanks,
J
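P.S. In case it helps anyone reproduce this, the snippet below is roughly what I used to dump the artifacts attached to each environment in the pipeline proto before submission. It assumes a Pipeline object p like the one in the sketch above and the 2.35 proto field names (type_urn / role_payload on ArtifactInformation); treat it as a sketch, not exact code.

    # Assumes `p` is the Pipeline from the sketch above. Walk the
    # pipeline proto and print every artifact attached to each
    # environment -- this is where the job server jar showed up.
    pipeline_proto = p.to_runner_api()
    for env_id, env in pipeline_proto.components.environments.items():
        for dep in env.dependencies:
            print(env_id, dep.type_urn, dep.role_payload)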
