Hi Shuyi,
Good idea. Actually, the PDF was converted from a Google doc. Here is the link:
https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
Once we reach an agreement, I can convert it to a FLIP.
Thanks,
Xuefu
Hi Xuefu,
Thanks a lot for driving this big effort. I would suggest converting your
proposal and design doc into a Google doc and sharing it on the dev mailing
list for the community to review and comment on, with a title like "[DISCUSS] ...
Hive integration design ...". Once approved, we can document it
Hi all,
To wrap up the discussion, I have attached a PDF describing the proposal, which
is also attached to FLINK-10556 [1]. Please feel free to watch that JIRA to
track the progress.
Please also let me know if you have additional comments or questions.
Thanks,
Xuefu
[1] https://issues.apache.org/jira/browse/FLINK-10556
Hi Shuyi,
Thank you for your input. Yes, I agree with a phased approach and would like to
move forward fast. :) We did some work internally on DDL utilizing the Babel
parser in Calcite. While Babel makes Calcite's grammar extensible, at first
impression it still seems too cumbersome for a project when
Hi Bowen,
Thank you for your feedback and interest in the project. Your contribution is
certainly welcome. Per your suggestion, I have created an Uber JIRA
(https://issues.apache.org/jira/browse/FLINK-10556) to track our overall effort
on this. For each subtask, we'd like to see a short
Welcome to the community and thanks for the great proposal, Xuefu! I think
the proposal can be divided into 2 stages: making Flink support Hive
features, and making Hive work with Flink. I agree with Timo on
starting with a smaller scope, so we can make progress faster. As for [6],
a
Thank you Xuefu, for bringing up this awesome, detailed proposal! It will
resolve lots of existing pain for users like me.
In general, I totally agree that improving Flink SQL's completeness would be a
much better starting point than building 'Hive on Flink', as the Hive community is
concerned
Thank you, very nice. I fully agree with that.
> Am 11.10.2018 um 19:31 schrieb Zhang, Xuefu :
>
> Hi Jörn,
>
> Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact
> it is one of the two approaches that I named in the beginning of the thread.
> As also pointed out
Sounds smashing; I think the initial integration will help 60% or so of Flink
SQL users, and a lot of other use cases will emerge once we solve the first one.
Thanks,
Taher Koitawala
On Fri 12 Oct, 2018, 10:13 AM Zhang, Xuefu, wrote:
> Hi Taher,
>
> Thank you for your input. I think you emphasized
Hi Taher,
Thank you for your input. I think you emphasized two important points:
1. Hive metastore could be used for storing Flink metadata
2. There are some usability issues around Flink SQL configuration
I think we all agree on #1. #2 may be well true and the usability should be
improved.
One other thought on the same lines was to use hive tables to store kafka
information to process streaming tables. Something like
"create table streaming_table (
bootstrapServers string,
topic string,
keySerialiser string,
ValueSerialiser string)"
Insert into streaming_table
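For illustration, the idea sketched above might look roughly like the following
in Flink SQL DDL; the connector option names, topic, broker address, and column
names are assumptions made for this sketch, not part of the original message:

```sql
-- Hypothetical sketch: a Kafka-backed table whose definition is stored once
-- in the Hive metastore, so users need not retype it in every session.
CREATE TABLE streaming_table (
  user_id    STRING,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',                           -- assumed connector name
  'topic' = 'events',                              -- assumed topic
  'properties.bootstrap.servers' = 'broker:9092',  -- assumed brokers
  'format' = 'json'                                -- stands in for key/value serialisers
);

-- A streaming insert into the Kafka-backed table from some source table:
INSERT INTO streaming_table
SELECT user_id, event_time FROM some_source;       -- some_source is hypothetical
```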
I think integrating Flink with Hive would be an amazing option and also to
get Flink's SQL up to pace would be amazing.
Current Flink SQL syntax to prepare and process a table is too verbose;
users manually need to retype table definitions, and that's a pain. Hive
metastore integration should be
Hi Rong,
Thanks for your feedback. Some of my earlier comments might have addressed some
of your points, so here I'd like to cover some specifics.
1. Yes, I expect that table stats stored in Hive will be used in Flink plan
optimization, but it's not part of compatibility concern (yet).
2. Both
Hi Timo,
Thank you for your input. It's exciting to see that the community has already
initiated some of the topics. We'd certainly like to leverage the current and
previous work and make progress in phases. Here I'd like to comment on a few
things on top of your feedback.
1. I think there
Hi Xuefu,
Thanks for putting together the overview. I would like to add some more on
top of Timo's comments.
1,2. I agree with Timo that a proper catalog support should also address
the metadata compatibility issues. I was actually wondering if you are
referring to something like utilizing table
Hi Jörn,
Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact it
is one of the two approaches that I named in the beginning of the thread. As
also pointed out there, this isn't mutually exclusive from work we proposed
inside Flink and they target at different user
Hi Xuefu,
thanks for your proposal, it is a nice summary. Here are my thoughts to
your list:
1. I think this is also on our current mid-term roadmap. Flink has lacked
proper catalog support for a very long time. Before we can connect
catalogs, we need to define how to map all the information
Would it maybe make sense to provide Flink as an engine on Hive
("Flink-on-Hive")? E.g. to address 4, 5, 6, 8, 9, 10. This could be more
loosely coupled than integrating Hive into all possible Flink core modules and
thus introducing a very tight dependency on Hive in the core.
1, 2, 3 could be achieved
Hi Fabian/Vino,
Thank you very much for your encouragement and inquiry. Sorry that I didn't see
Fabian's email until I read Vino's response just now. (Somehow Fabian's went to
the spam folder.)
My proposal contains long-term and short-term goals. Nevertheless, the effort
will focus on the
Hi Xuefu,
I appreciate this proposal and, like Fabian, would like to see more details of
the plan.
Thanks, vino.
Fabian Hueske wrote on Wednesday, October 10, 2018 at 5:27 PM:
> Hi Xuefu,
>
> Welcome to the Flink community and thanks for starting this discussion!
> Better Hive integration would be
Hi Xuefu,
Welcome to the Flink community and thanks for starting this discussion!
Better Hive integration would be really great!
Can you go into details of what you are proposing? I can think of a couple
ways to improve Flink in that regard:
* Support for Hive UDFs
* Support for Hive metadata
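To make the first bullet concrete, here is a hedged sketch of what calling a
Hive UDF from Flink SQL might look like once such support exists; the table
and column names are illustrative assumptions, not from this thread:

```sql
-- Hypothetical sketch: with Hive function support, a Hive built-in UDF such
-- as get_json_object could be used like any Flink SQL function.
SELECT get_json_object(payload, '$.user.id') AS user_id
FROM events;  -- 'events' and 'payload' are assumed names
```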