If I point Beam to the local jar, will Beam both start and stop the
expansion service?

Thanks
     Mark

On Wed, 24 Jan 2024 at 08:30, Robert Bradshaw via user <[email protected]>
wrote:

> You can also manually designate a replacement jar to be used rather
> than fetching the jar from maven, either as a pipeline option or (as
> of the next release) as an environment variable. The format is a json
> mapping from gradle targets (which is how we identify these jars) to
> local files (or urls). For example, pass
>
>
> --beam_services='{":sdks:java:extensions:sql:expansion-service:shadowJar":
> "/path/to/your/copy.jar"}'
>
> to use the local jar to automatically expand your SQL transforms.
>
> See the docs at
>
> https://github.com/apache/beam/blob/7e95776a8d08ef738be49ef47842029c306f2bf5/sdks/python/apache_beam/options/pipeline_options.py#L587
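
A minimal sketch of building the option value described above, so the JSON quoting does not have to be written by hand. The gradle target and the jar path are taken from the example in this message; the helper name beam_services_flag is hypothetical and is not part of Beam. The resulting string would then be passed along with the other pipeline arguments.

```python
import json

# Gradle target that identifies the SQL expansion service jar (from the message above).
SQL_EXPANSION_TARGET = ":sdks:java:extensions:sql:expansion-service:shadowJar"


def beam_services_flag(overrides):
    """Hypothetical helper: render a target-to-jar mapping as a --beam_services flag."""
    return "--beam_services=" + json.dumps(overrides)


# Placeholder path from the message above; replace it with the jar checked into your tree.
flag = beam_services_flag({SQL_EXPANSION_TARGET: "/path/to/your/copy.jar"})
print(flag)
```

Generating the value with json.dumps avoids shell-quoting mistakes when the flag is assembled in CI scripts rather than typed on a command line.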
>
> On Tue, Jan 23, 2024 at 5:59 PM Chamikara Jayalath via user
> <[email protected]> wrote:
> >
> > The expansion service jar is needed since sql.py includes cross-language
> transforms that use the Java implementation under the hood.
> >
> > Once downloaded, the jar is cached, and subsequent jobs should use the
> jar from that location.
> >
> > If you want to use a locally available jar, you can manually start up an
> expansion service [1] and point the Python SQL transform to that [2].
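
A rough sketch of the manual route, under stated assumptions: the jar is assumed to accept the port as its first positional argument (as in the multi-language docs linked at [1]), and the port number, jar path, and helper names below are all hypothetical. With this approach, starting and stopping the process is the caller's responsibility.

```python
import subprocess


def expansion_service_command(jar_path, port):
    """Build the command line for launching the expansion service jar (assumed CLI)."""
    return ["java", "-jar", jar_path, str(port)]


def expansion_service_address(port, host="localhost"):
    """Address string to hand to the Python SQL transform."""
    return f"{host}:{port}"


def start_expansion_service(jar_path, port):
    """Launch the service; the caller must call .terminate() on the result when done."""
    return subprocess.Popen(expansion_service_command(jar_path, port))


# Usage sketch (not executed here; needs Java and the jar on disk):
#   proc = start_expansion_service("/path/to/your/copy.jar", 12345)
#   ... SqlTransform(query, expansion_service=expansion_service_address(12345)) ...
#   proc.terminate()
```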
> >
> > Thanks,
> > Cham
> >
> > [1]
> https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/#choose-an-expansion-service
> > [2]
> https://github.com/apache/beam/blob/7ff25d896250508570b27683bc76523ac2fe3210/sdks/python/apache_beam/transforms/sql.py#L84
> >
> > On Tue, Jan 23, 2024 at 3:57 PM Mark Striebeck <[email protected]>
> wrote:
> >>
> >> Hi,
> >>
> >> Sorry, this question seems so obvious that I'm sure it came up before.
> But I couldn't find anything in the docs or the mail archives. Feel free to
> point me in the right direction...
> >>
> >> We are using the Python API for Beam. Recently we started using Beam
> SQL - which apparently needs a jar file that is not provided with the
> Python pip package. When I run tests, I can see that Beam downloads
> beam-sdks-java-extensions-sql-expansion-service-2.52.0.jar and unpacks it
> into ~/.apache_beam and uses it to start an RPC server.
> >>
> >> While this works for local testing, I am trying to figure out how to
> work this into our CI and deployment process.
> >>
> >> Preferably, we would download a pip package that has this jar (and
> others) in it and just use that.
> >>
> >> If that doesn't exist (I couldn't find it), then we'd need to check
> this jar file into our source tree, so that we can use it for CI but then
> also make it part of the docker image that we use to run our Beam pipelines
> on GCP Dataflow. How could I tell Beam to use that file instead of
> downloading it? I tried obvious settings like CLASSPATH environment
> variable - but nothing works. Beam always tries to fetch the file from
> maven.
> >>
> >> Again, feel free to point me to any relevant mail discussion or web
> page.
> >>
> >> Thanks
> >>      Mark
>
