Thanks, Paul.
Ferenc and I have been looking into unblocking the Kubernetes path via an
updated implementation for FLINK-28915 to ship the jars conveniently there.
You can expect an updated PR there next week. Looking forward to your
findings in the YARN POC.
On Mon, Dec 11, 2023 at 4:01 AM Paul
Hi Ferenc,
Sorry for my late reply.
> Is any active work happening on this FLIP? As far as I see there
> are blockers that need to be resolved first regarding
> artifact distribution.
You’re right. There’s a blocker in K8s application mode, but none in
YARN application mode. I’m doing a POC
Hello devs,
Is any active work happening on this FLIP? As far as I see there
are blockers that need to be resolved first regarding
artifact distribution.
Is this work halted completely, or are some efforts going into
resolving the blockers first?
Our platform would benefit
Hi Jing,
Thanks for your input!
> Would you like to add
> one section to describe(better with script/code example) how to use it in
> these two scenarios from users' perspective?
OK. I’ll update the FLIP with the code snippet after I get the POC branch done.
> NIT: the pictures have
Hi Paul,
Thanks for driving it and thank you all for the informative discussion! The
FLIP is in good shape now. As described in the FLIP, SQL Driver will be
mainly used to run Flink SQLs in two scenarios: 1. SQL client/gateway in
application mode and 2. external system integration. Would you like
Hi Shengkai,
> * How can we ship the json plan to the JobManager?
The Flink K8s module should be responsible for file distribution. We could
introduce an option like `kubernetes.storage.dir`. For each Flink cluster,
there would be a dedicated subdirectory, with the pattern like
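The exact path pattern is cut off in the archive. As a minimal sketch of the idea, assuming a hypothetical `<storage-dir>/<cluster-id>` layout (the function name and pattern here are illustrative, not the actual Flink option semantics):

```python
def cluster_storage_path(storage_dir: str, cluster_id: str) -> str:
    """Hypothetical sketch: derive the dedicated per-cluster subdirectory
    under a `kubernetes.storage.dir`-style base path. The layout
    `<storage-dir>/<cluster-id>` is an assumption, not the FLIP's spec."""
    # Plain string join so URI schemes like s3:// survive intact.
    return storage_dir.rstrip("/") + "/" + cluster_id

# Example: artifacts for one cluster land in their own subdirectory.
print(cluster_storage_path("s3://bucket/flink-storage", "my-session-cluster"))
# → s3://bucket/flink-storage/my-session-cluster
```

A per-cluster subdirectory makes cleanup straightforward: removing one directory drops every artifact that cluster staged.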
Hi, Paul.
Thanks for your update. I have a few questions about the new design:
* How can we ship the json plan to the JobManager?
The current design only exposes an option for the URL of the json plan.
It seems the gateway is responsible for uploading to an external storage. Can
we reuse the
Hi Shengkai,
Sorry for my late reply. It took me some time to update the FLIP.
In the latest FLIP design, SQL Driver is placed in flink-sql-gateway module.
PTAL.
The FLIP does not cover details about the K8s file distribution, but its
general usage would be very much the same as YARN setups.
Hi Yang,
Thanks a lot for your input!
It’s great that FLINK-28915 has covered the file download part. I’ve created
a ticket for the file upload part [1]. It's a prerequisite for supporting
K8s application mode for SQL Gateway.
[1] https://issues.apache.org/jira/browse/FLINK-32315
Best,
Paul
> If it’s the case, I’m good with introducing a new module and making SQL
> Driver an internal class that accepts JSON plans only.
I've rethought this again and again. I think it's better to move the SqlDriver
into the sql-gateway module, because the sql client relies on the
sql-gateway to submit the
Sorry for the late reply. I am in favor of introducing such a built-in
resource localization mechanism based on Flink FileSystem. Then
FLINK-28915 [1] could be the second step, which will download the jars and
dependencies to the JobManager/TaskManager local directory before working.
The first step
Hi Mason,
I get your point. I'm increasingly feeling the need to introduce a built-in
file distribution mechanism for the flink-kubernetes module, just like Spark
does with `spark.kubernetes.file.upload.path` [1].
I’m assuming the workflow is as follows:
- KubernetesClusterDescriptor uploads all
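The upload step of that workflow could be sketched roughly as below. This is a hedged illustration only: `upload_artifacts` is a made-up name, and a local directory stands in for remote storage such as S3/HDFS, which in Flink would go through the FileSystem abstraction rather than `shutil`.

```python
import shutil
from pathlib import Path

def upload_artifacts(local_jars: list, upload_path: str) -> list:
    """Illustrative sketch: stage each local artifact into a shared upload
    directory before cluster deployment, returning the staged paths that
    the cluster descriptor would then reference."""
    dest = Path(upload_path)
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for jar in local_jars:
        target = dest / Path(jar).name
        # Stand-in for a remote FileSystem write (e.g. to S3/HDFS).
        shutil.copy(jar, target)
        staged.append(str(target))
    return staged
```

The key property is that everything the job needs ends up addressable from inside the cluster, so pods can fetch artifacts without a rebuilt image.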
Hi ShengKai,
Good point with the ANALYZE TABLE and CALL PROCEDURE statements.
> Can we remove the jars if the job is running or gateway exits?
Yes, I think it would be okay to remove the resources after the job is
submitted.
It should be Gateway’s responsibility to remove them.
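That cleanup responsibility might look like the sketch below. The function name and the idea of one staging directory per job are assumptions for illustration; they are not actual SQL Gateway API.

```python
import shutil
from pathlib import Path

def cleanup_staged_resources(staging_dir: str) -> bool:
    """Hypothetical sketch: after the job is submitted, the gateway removes
    the job's staged resources. Returns True if anything was deleted."""
    path = Path(staging_dir)
    if path.exists():
        shutil.rmtree(path)
        return True
    return False
```

Keeping each job's resources under its own directory makes this a single recursive delete, with a second call being a harmless no-op.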
> Can we use
Hi Weihua,
Thanks a lot for your input!
I see the difference here is implementing the file distribution mechanism
in the generic CLI or in the SQL Driver. The CLI approach could benefit
non-pure-SQL applications (which are not covered by SQL Driver) as well.
Not sure if you’re proposing the
Hi Paul,
Thanks for your response!
> I agree that utilizing SQL Drivers in Java applications is equally important
> as employing them in SQL Gateway. WRT init containers, I think most
> users use them just as a workaround. For example, wget a jar from the
> maven repo.
>
> We could implement the
Hi, Paul. Thanks for your update, which makes me understand the
design much better.
But I still have some questions about the FLIP.
> For SQL Gateway, only DMLs need to be delegated to the SQL
> Driver. I would think about the details and update the FLIP. Do you have
> some ideas
Hi,
Thanks for updating the FLIP.
I have two cents on the distribution of SQLs and resources.
1. Should we support a common file distribution mechanism for k8s
application mode?
I have seen some issues and requirements on the mailing list.
In our production environment, we implement the
Hi Mason,
Thanks for your input!
> +1 for init containers or a more generalized way of obtaining arbitrary
> files. File fetching isn't specific to just SQL--it also matters for Java
> applications if the user doesn't want to rebuild a Flink image and just
> wants to modify the user application
Hi Paul,
+1 for this feature and supporting SQL file + JSON plans. We get a lot of
requests to just be able to submit a SQL file, but the JSON plan
optimizations make sense.
+1 for init containers or a more generalized way of obtaining arbitrary
files. File fetching isn't specific to just
Hi Jark,
Thanks for your input! Please see my comments inline.
> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
> DataStream API also doesn't provide a default main class for users,
> why do we need to provide such one for SQL?
Sorry for the confusion I caused. By
Hi Paul,
Thanks for your reply. I left my comments inline.
> As the FLIP said, it’s good to have a default main class for Flink SQLs,
> which allows users to submit Flink SQLs in the same way as DataStream
> jobs, or else users need to write their own main class.
Isn't Table API the same way as
The FLIP is in the early phase and some details are not included, but
fortunately, we got lots of valuable ideas from the discussion.
Thanks to everyone who joined the discussion!
@Weihua @Shammon @Shengkai @Biao @Jark
This weekend I’m gonna revisit and update the FLIP, adding more
details.
Hi Jark,
Thanks a lot for your input!
> If we decide to submit ExecNodeGraph instead of SQL file, is it still
> necessary to support SQL Driver?
I think so. Apart from usage in SQL Gateway, SQL Driver could simplify
Flink SQL execution with Flink CLI.
As the FLIP said, it’s good to have a
Hi Paul,
Thanks for starting this discussion. I like the proposal! This is a
frequently requested feature!
I agree with Shengkai that ExecNodeGraph as the submission object is a
better idea than SQL file. To be more specific, it should be JsonPlanGraph
or CompiledPlan which is the serializable
Hi Weihua,
You’re right. Distributing the SQLs to the TMs is one of the challenging
parts of this FLIP.
Web submission is not enabled in application mode currently as you said,
but it could be changed if we have good reasons.
What do you think about introducing a distributed storage for SQL
Thanks Paul for your reply.
SQLDriver looks good to me.
2. Do you mean to pass the SQL string as a configuration or a program argument?
I brought this up because we were unable to pass the SQL file to Flink
using Kubernetes mode.
For DataStream/Python users, they need to prepare their images for
Hi Biao,
Thanks for your comments!
> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those
Hi Shengkai,
Thanks a lot for your comments! Please see my comments inline.
> 1. The FLIP does not specify the kind of SQL that will be submitted with
> the application mode. I believe only a portion of the SQL will be delegated
> to the SqlRunner.
You’re right. For SQL Gateway, only DMLs need
Sorry for the typo. I mean “We already have a PythonDriver doing the same job
for PyFlink."
Best,
Paul Lam
> On May 31, 2023, at 11:49, Paul Lam wrote:
>
> 1. I have a PythonDriver doing the same job for PyFlink [1]
Thanks Paul for the proposal! I believe it would be very useful for Flink
users.
After reading the FLIP, I have some questions:
1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
Application mode? More specifically, if we use SQL client/gateway to
execute some interactive
Hi Shammon,
Thanks a lot for your input!
I thought SQL Driver could act as a general-purpose default main class
for Flink SQL. It could be used in Flink CLI submission, web submission,
or SQL Client/Gateway submission. For SQL Client/Gateway submission,
we use it implicitly if needed, and for
Hi Weihua,
Thanks a lot for your input! Please see my comments inline.
> - Is SQLRunner a better name? We use this to run a SQL Job. (Not strong,
> the SQLDriver is fine for me)
I’ve thought about SQL Runner but picked SQL Driver for the following reasons
FYI:
1. I have a PythonDriver doing
Thanks for the proposal. The Application mode is very important to Flink
SQL. But I have some questions about the FLIP:
1. The FLIP does not specify the kind of SQL that will be submitted with
the application mode. I believe only a portion of the SQL will be delegated
to the SqlRunner.
2. Will
Thanks Paul for driving this proposal.
I found the sql driver has no config-related options. If I understand
correctly, the sql driver can be used to submit sql jobs in a 'job
submission service' such as sql-gateway. In general, in addition to the
default config for Flink cluster which includes
Thanks Paul for the proposal.
+1 for this. It is valuable in improving ease of use.
I have a few questions.
- Is SQLRunner a better name? We use this to run a SQL Job. (Not strong,
the SQLDriver is fine for me)
- Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare
a SQL
Hi team,
I’d like to start a discussion about FLIP-316 [1], which introduces a SQL
driver as the default main class for Flink SQL jobs.
Currently, Flink SQL could be executed out of the box either via SQL
Client/Gateway or embedded in a Flink Java/Python program.
However, each one has its