The expansion service jar is needed because sql.py uses cross-language
transforms that rely on the Java implementation under the hood.

Once downloaded, the jar is cached, and subsequent jobs should use the jar
from that location.

If you want to use a locally available jar, you can manually start up an
expansion service [1] and point the Python SQL transform to it [2].
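
As a rough sketch of that option (not tested against your setup): the jar
path and port below are placeholders, and both helper functions are
hypothetical names I made up for illustration. It assumes the jar accepts the
port as its first argument, as described in [1], and that SqlTransform takes
an expansion_service argument, as in [2].

```python
import os


def expansion_service_command(jar_path, port):
    """Build the command that starts an expansion service from a local jar.

    Hypothetical helper; assumes the jar takes the port as its first argument.
    """
    return ["java", "-jar", os.fspath(jar_path), str(port)]


def build_sql_transform(query, port):
    """Create a SqlTransform that talks to an already running local service.

    Hypothetical helper; Beam is imported lazily so the command builder
    above can be used without the Beam SDK installed.
    """
    from apache_beam.transforms.sql import SqlTransform

    return SqlTransform(query, expansion_service=f"localhost:{port}")


if __name__ == "__main__":
    # Placeholder path and port; adjust to wherever the jar lives in your image.
    cmd = expansion_service_command(
        "/opt/beam/beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar",
        8097,
    )
    print(" ".join(cmd))
```

You would start the printed command (for example with subprocess.Popen)
before building the pipeline, and stop it once the pipeline has been
constructed and submitted.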

Thanks,
Cham

[1]
https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/#choose-an-expansion-service
[2]
https://github.com/apache/beam/blob/7ff25d896250508570b27683bc76523ac2fe3210/sdks/python/apache_beam/transforms/sql.py#L84

On Tue, Jan 23, 2024 at 3:57 PM Mark Striebeck <[email protected]>
wrote:

> Hi,
>
> Sorry, this question seems so obvious that I'm sure it came up before. But
> I couldn't find anything in the docs or the mail archives. Feel free to
> point me in the right direction...
>
> We are using the Python API for Beam. Recently we started using Beam SQL -
> which apparently needs a jar file that is not provided with the Python Pip
> package. When I run tests, I can see that Beam
> downloads beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar and
> unpacks it into ~/.apache_beam and uses it to start an RPC server.
>
> While this works for local testing, I am trying to figure out how to work
> this into our CI and deployment process.
>
> Preferably, we would download a pip package that has this jar (and
> others) in it and just use it.
>
> If that doesn't exist (I couldn't find it), then we'd need to check this
> jar file into our source tree, so that we can use it for CI but then also
> make it part of the docker image that we use to run our Beam pipelines on
> GCP Dataflow. How could I tell Beam to use that file instead of downloading
> it? I tried obvious settings like CLASSPATH environment variable - but
> nothing works. Beam always tries to fetch the file from maven.
>
> Again, feel free to point me to any relevant mail discussion or web page.
>
> Thanks
>      Mark
>
