Hi Flink users,

Sharing something that solved a recurring problem for us in production,
in case others are hitting the same wall.

1. THE PROBLEM

In Flink 1.20, there is no supported way to deploy a .sql file containing both 
DDL and DML statements via "flink run" in Application or Per-Job mode. SQL 
Client's -f flag only works in Session mode; the Application mode equivalent 
(FLIP-480) is implemented in Flink 2.x but not backported.

For teams not ready to migrate to Flink 2.x, which is a significant
breaking-change upgrade, this means either writing Java wrappers around each
SQL statement or living with Session mode's resource-sharing limitations.
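
To make the wrapper option concrete: it usually comes down to splitting the
script into statements and passing each one to TableEnvironment.executeSql.
Below is a rough sketch of the splitting half only. It is plain JDK so it runs
without Flink on the classpath. The class name SqlScriptSplitter is
hypothetical and is not taken from flink-sql-bootstrap, and the sketch assumes
the script has no SQL comments or backtick-quoted identifiers containing
semicolons.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of flink-sql-bootstrap. */
public class SqlScriptSplitter {

    /**
     * Splits a script on semicolons that sit outside single-quoted string
     * literals. Simplification: SQL comments and backtick-quoted identifiers
     * are not handled.
     */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '\'') {
                // An escaped quote ('') toggles twice, so the state stays correct.
                inString = !inString;
            }
            if (c == ';' && !inString) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last statement may lack a trailing semicolon.
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        String script =
            "CREATE TABLE src (id INT) WITH ('connector' = 'datagen');\n"
          + "CREATE TABLE dst (id INT) WITH ('connector' = 'blackhole');\n"
          + "INSERT INTO dst SELECT id FROM src;";
        for (String stmt : split(script)) {
            // In a real wrapper, this is where tableEnv.executeSql(stmt) would go.
            System.out.println(stmt);
        }
    }
}
```

Even with the splitting solved, a hand-written wrapper still has to decide how
to group statements: each INSERT passed to executeSql is submitted as its own
job unless the statements are collected in a StatementSet.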

2. WHAT WE BUILT

A small launcher JAR that fills this gap:
  
  $FLINK_HOME/bin/flink run \
    --target yarn-application \
    flink-sql-bootstrap.jar \
    --script-file hdfs://warehouse/jobs/dwd_orders.sql

That is a single command, with DDL and DML in one file, running in Application
mode. The launcher also supports:

- Catalog snapshots: pre-register tables, views, and UDFs from a JSON file so 
SQL scripts contain zero DDL
- Per-operator resource tuning: set parallelism, CPU, and memory per operator 
via a JSON config, filling the gap between Flink SQL and DataStream-level 
resource control
- Dry-run modes: --validate (syntax check, ~2s, no cluster needed) and 
--compile (outputs the optimized plan JSON), useful for CI/CD pipelines

Verified on Flink 1.20.4, 2.0.2, 2.1.1, and 2.2.0.

Repo: https://github.com/tonyabasy/flink-sql-bootstrap

I'm curious whether others have run into the same deployment challenges, and
what workarounds you've been using. I'm also happy to discuss whether this
approach could be useful in your setup.

Best,
Zhao Wang
