I'm using a standalone deployment on Kubernetes for this use case. Does the
archive get uploaded to the cluster via the :8081 REST/WebUI port, or via
some other port like 6123/RPC or 6124/BLOB-SERVER? I'm wondering whether not
exposing those ports on the local machine might prevent the archive from
being uploaded, although I would have expected an explicit error in that
case.

NAMESPACE     NAME               TYPE           PORTS
flink         flink-jobmanager   ClusterIP      rpc:6123►0  blob-server:6124►0  webui:8081►0

Thanks,
Sumeet


On Fri, Jun 11, 2021 at 2:48 PM Roman Khachatryan <ro...@apache.org> wrote:

> Hi Sumeet,
>
> Probably there is an issue with uploading the archive while submitting the
> job.
> The commands and API usage look good to me.
> Dian, could you please confirm that?
>
> Regards,
> Roman
>
> On Fri, Jun 11, 2021 at 9:04 AM Sumeet Malhotra
> <sumeet.malho...@gmail.com> wrote:
> >
> > Thank you Roman. Yes, that's what I am going to do.
> >
> > But I'm running into another issue... when I specify the --pyArchives
> > option on the command line, the job never gets submitted and the client
> > just hangs forever. And when I try to do this programmatically by
> > calling add_python_archive(), the job gets submitted but fails because
> > the target directory is not found on the UDF node. Flink is deployed on
> > a K8S cluster in my case, and port 8081 is forwarded to localhost.
> >
> > Here's the command line I use:
> >
> > ~/flink-1.13.0/bin/flink run --jobmanager localhost:8081 --python my_job.py --pyArchives file:///path/to/schema.zip#schema
> >
> > And within the UDF I access the schema file as:
> >
> > read_schema('schema/my_schema.json')
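> >
> > read_schema() is just a small helper of mine; roughly:
> >
> > import json
> >
> > def read_schema(path):
> >     # the relative path is resolved against the working directory of
> >     # the Python worker, which is where the archive gets extracted
> >     with open(path) as f:
> >         return json.load(f)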
> >
> > Or if I try using the API instead of the command line, the call looks like this:
> >
> > env = StreamExecutionEnvironment.get_execution_environment()
> > env.add_python_archive('schema.zip', 'schema')
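> >
> > That call sits inside my_job.py, which looks roughly like this (the
> > actual pipeline is omitted):
> >
> > from pyflink.datastream import StreamExecutionEnvironment
> >
> > env = StreamExecutionEnvironment.get_execution_environment()
> > # ship schema.zip with the job; it is extracted into a directory
> > # named "schema" in the working directory of each Python worker
> > env.add_python_archive('schema.zip', 'schema')
> > # ... pipeline definition using the UDF goes here ...
> > env.execute('my_job')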
> >
> > Initially, my_job.py itself had its own command line options, and I was
> > thinking those might interfere with the overall Flink command line
> > options, but even after removing them I'm still not able to submit the
> > job. However, if I don't use the --pyArchives option and manually
> > transfer the schema file to a location on the UDF node, the job gets
> > submitted and works as expected.
> >
> > Any reason why this might happen?
> >
> > Thanks,
> > Sumeet
> >
> >
> > On Thu, Jun 10, 2021 at 11:09 PM Roman Khachatryan <ro...@apache.org> wrote:
> >>
> >> Hi,
> >>
> >> I think the second option is what you need. The documentation [1] says
> >> that only the zip format is supported.
> >> Alternatively, you could upload the files to S3 or another DFS, access
> >> them from the TMs, and re-upload them when needed.
> >>
> >> [1]
> >> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/python/dependency_management/#archives
> >>
> >> Regards,
> >> Roman
> >>
> >> On Wed, Jun 9, 2021 at 3:57 PM Sumeet Malhotra
> >> <sumeet.malho...@gmail.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I'm using UDTFs in PyFlink that depend on a few resource files (JSON
> >> > schema files, actually). The path of such a file can be passed into
> >> > the UDTF, but it essentially needs to exist on the Task Manager node
> >> > where the task executes. What's the best way to upload these resource
> >> > files? As of now, my custom Flink image bakes the required resource
> >> > files into a fixed path, but I'd like that path to be configurable at
> >> > run time.
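> >> >
> >> > To illustrate, the UDTF has roughly this shape (names are made up;
> >> > the schema path is currently baked into the image):
> >> >
> >> > import json
> >> >
> >> > from pyflink.table import DataTypes
> >> > from pyflink.table.udf import udtf
> >> >
> >> > @udtf(result_types=[DataTypes.STRING()])
> >> > def explode_fields(payload):
> >> >     # /opt/schemas is a fixed path built into our custom Flink
> >> >     # image; this is what I'd like to make configurable
> >> >     with open('/opt/schemas/my_schema.json') as f:
> >> >         schema = json.load(f)
> >> >     # payload would be validated against the schema here
> >> >     for field in schema.get('fields', []):
> >> >         yield field['name']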
> >> >
> >> > There are 2 APIs available to load files when submitting a PyFlink job...
> >> >
> >> > stream_execution_environment.add_python_file() - Recommended for
> >> > uploading files (.py etc.), but it doesn't let me configure the final
> >> > path on the target node. The files are added to the PYTHONPATH, so
> >> > the UDTF would have to look each file up there itself. I'd like to
> >> > pass the file location into the UDTF instead.
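> >> >
> >> > i.e., as far as I understand (the file name is just a placeholder):
> >> >
> >> > env.add_python_file('/local/path/to/schema_utils.py')
> >> > # the file ends up on the PYTHONPATH of the workers, so a data file
> >> > # shipped this way would have to be located by searching PYTHONPATH
> >> > # from inside the UDTF - the target path can't be chosen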
> >> >
> >> > stream_execution_environment.add_python_archive() - Appears to be
> >> > more generic, in the sense that it allows a target directory to be
> >> > specified. The documentation doesn't say anything about the contents
> >> > of the archive, so I'm guessing it could contain any type of file. Is
> >> > this what's needed for my use case?
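> >> >
> >> > If I read the docs correctly, usage would be something like:
> >> >
> >> > import json
> >> >
> >> > env.add_python_archive('schema.zip', 'schema')
> >> >
> >> > # and then inside the UDTF, relative to the worker's working
> >> > # directory:
> >> > with open('schema/my_schema.json') as f:
> >> >     schema = json.load(f)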
> >> >
> >> > Or is there any other recommended way to upload non-Python
> >> > dependencies/resources?
> >> >
> >> > Thanks in advance,
> >> > Sumeet
>
