Re: [DISCUSS] Flink's supported APIs and Hive query syntax

Jing Zhang Mon, 07 Mar 2022 04:53:01 -0800

Hi Martijn,

Thanks for driving this discussion.


+1 on the effort toward more Hive syntax compatibility.

Thanks to the work on batch processing in recent versions (1.10 to 1.15),
many users now run batch processing jobs on Flink.
In our team, we are trying to migrate most of the existing online batch
jobs from Hive/Spark to Flink. We hope this migration does not require
users to modify their SQL.
Although Hive is not as popular as it used to be, Hive SQL is still alive
because many users still use Hive SQL to run Spark jobs.
Therefore, compatibility with more Hive syntax is critical to this
migration work.

Best,
Jing Zhang



Martijn Visser <martijnvis...@apache.org> wrote on Mon, Mar 7, 2022 at 19:23:

> Hi everyone,
>
> Flink currently has 4 APIs with multiple language support which can be used
> to develop applications:
>
> * DataStream API, both Java and Scala
> * Table API, both Java and Scala
> * Flink SQL, both in Flink query syntax and Hive query syntax (partially)
> * Python API
>
> Since FLIP-152 [1] the Flink SQL support has been extended to also support
> the Hive query syntax. There is now a follow-up FLINK-26360 [2] to address
> more syntax compatibility issues.
>
> I would like to open a discussion on Flink directly supporting the Hive
> query syntax. I have some concerns about whether 100% Hive query syntax
> compatibility is indeed something that we should aim for in Flink.
>
> I can understand that having Hive query syntax support in Flink could help
> users due to interoperability and being able to migrate. However:
>
> - Adding full Hive query syntax support will mean that we go from 6 fully
> supported API/language combinations to 7. I think we are currently already
> struggling with maintaining the existing combinations, let alone one
> more.
> - Apache Hive is/appears to be a project that's not that actively developed
> anymore. The last release was made in January 2021. Its popularity is
> rapidly declining in Europe and the United States, also due to Hadoop
> becoming less popular.
> - Related to the previous topic, other software like Snowflake,
> Trino/Presto, Databricks are becoming more and more popular. If we add full
> support for the Hive query syntax, then why not add support for Snowflake
> and the others?
> - We are supporting Hive versions that are no longer supported by the Hive
> community and that have known security vulnerabilities. This also makes
> Flink vulnerable to those types of vulnerabilities.
> - The current Hive implementation relies on a lot of Flink internals,
> making Flink hard to maintain, with lots of tech debt and making
> things overly complex.
>
> From my perspective, I think it would be better to not have Hive query
> syntax compatibility directly in Flink itself. Of course we should have a
> proper Hive connector and a proper Hive catalog to make connectivity with
> Hive (the versions that are still supported by the Hive community) itself
> possible. Alternatively, if Hive query syntax is so important, it should
> not rely on internals but be available as a dialect/pluggable option. That
> could also open up the possibility to add more syntax support for others in
> the future, but I really think we should just focus on Flink SQL itself.
> That's already hard enough to maintain and improve on.
>
> I'm looking forward to the thoughts of both Developers and Users, so I'm
> cross-posting to both mailing lists.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=165227316
> [2] https://issues.apache.org/jira/browse/FLINK-21529
>
