Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Zhang, Xuefu
Hi Shuyi, Good idea. Actually, the PDF was converted from a Google Doc. Here is its link: https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing Once we reach an agreement, I can convert it to a FLIP. Thanks, Xuefu

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Shuyi Chen
Hi Xuefu, Thanks a lot for driving this big effort. I would suggest converting your proposal and design doc into a Google Doc and sharing it on the dev mailing list for the community to review and comment on, with a title like "[DISCUSS] ... Hive integration design ...". Once approved, we can document it

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-24 Thread Zhang, Xuefu
Hi all, To wrap up the discussion, I have attached a PDF describing the proposal, which is also attached to FLINK-10556 [1]. Please feel free to watch that JIRA to track the progress. Please also let me know if you have additional comments or questions. Thanks, Xuefu [1]

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-15 Thread Zhang, Xuefu
Hi Shuyi, Thank you for your input. Yes, I agree with a phased approach and would like to move forward fast. :) We did some work internally on DDL using the Babel parser in Calcite. While Babel makes Calcite's grammar extensible, at first impression it still seems too cumbersome for a project when

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-15 Thread Zhang, Xuefu
Hi Bowen, Thank you for your feedback and interest in the project. Your contribution is certainly welcome. Per your suggestion, I have created an Uber JIRA (https://issues.apache.org/jira/browse/FLINK-10556) to track our overall effort on this. For each subtask, we'd like to see a short

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-13 Thread Shuyi Chen
Welcome to the community and thanks for the great proposal, Xuefu! I think the proposal can be divided into 2 stages: making Flink support Hive features, and making Hive work with Flink. I agree with Timo on starting with a smaller scope, so we can make progress faster. As for [6], a

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-13 Thread Bowen
Thank you, Xuefu, for bringing up this awesome, detailed proposal! It will resolve lots of existing pain for users like me. In general, I totally agree that improving FlinkSQL's completeness would be a much better starting point than building 'Hive on Flink', as the Hive community is concerned

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-12 Thread Jörn Franke
Thank you, very nice. I fully agree with that. > On 11.10.2018 at 19:31, Zhang, Xuefu wrote: > > Hi Jörn, > > Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact > it is one of the two approaches that I named in the beginning of the thread. > As also pointed out

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-12 Thread Taher Koitawala
Sounds smashing; I think the initial integration will help 60% or so of Flink SQL users, and a lot of other use cases will emerge when we solve the first one. Thanks, Taher Koitawala On Fri 12 Oct, 2018, 10:13 AM Zhang, Xuefu, wrote: > Hi Taher, > > Thank you for your input. I think you emphasized

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Taher, Thank you for your input. I think you emphasized two important points: 1. Hive metastore could be used for storing Flink metadata 2. There are some usability issues around Flink SQL configuration I think we all agree on #1. #2 may well be true and the usability should be improved.

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
One other thought along the same lines was to use Hive tables to store Kafka information to process streaming tables. Something like "create table streaming_table ( bootstrapServers string, topic string, keySerialiser string, ValueSerialiser string)" Insert into streaming_table

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
I think integrating Flink with Hive would be an amazing option, and getting Flink's SQL up to pace would be amazing too. The current Flink SQL syntax to prepare and process a table is too verbose; users need to manually retype table definitions, and that's a pain. Hive metastore integration should be

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Rong, Thanks for your feedback. Some of my earlier comments might have addressed some of your points, so here I'd like to cover some specifics. 1. Yes, I expect that table stats stored in Hive will be used in Flink plan optimization, but it's not part of the compatibility concern (yet). 2. Both

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Timo, Thank you for your input. It's exciting to see that the community has already initiated some of the topics. We'd certainly like to leverage the current and previous work and make progress in phases. Here I'd like to comment on a few things on top of your feedback. 1. I think there

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Rong Rong
Hi Xuefu, Thanks for putting together the overview. I would like to add some more on top of Timo's comments. 1,2. I agree with Timo that proper catalog support should also address the metadata compatibility issues. I was actually wondering if you are referring to something like utilizing table

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Jörn, Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact it is one of the two approaches that I named in the beginning of the thread. As also pointed out there, this isn't mutually exclusive with the work we proposed inside Flink, and they target different user

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Timo Walther
Hi Xuefu, thanks for your proposal, it is a nice summary. Here are my thoughts on your list: 1. I think this is also on our current mid-term roadmap. Flink has lacked proper catalog support for a very long time. Before we can connect catalogs we need to define how to map all the information

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Jörn Franke
Would it maybe make sense to provide Flink as an engine on Hive ("flink-on-Hive")? E.g., to address 4,5,6,8,9,10. This could be more loosely coupled than integrating Hive in all possible Flink core modules and thus introducing a very tight dependency on Hive in the core. 1,2,3 could be achieved

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Zhang, Xuefu
Hi Fabian/Vino, Thank you very much for your encouragement and inquiry. Sorry that I didn't see Fabian's email until I read Vino's response just now. (Somehow Fabian's went to the spam folder.) My proposal contains long-term and short-term goals. Nevertheless, the effort will focus on the

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread vino yang
Hi Xuefu, I appreciate this proposal and, like Fabian, think it would be better if you could give more details of the plan. Thanks, vino. Fabian Hueske wrote on Wed, Oct 10, 2018 at 5:27 PM: > Hi Xuefu, > > Welcome to the Flink community and thanks for starting this discussion! > Better Hive integration would be

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Fabian Hueske
Hi Xuefu, Welcome to the Flink community and thanks for starting this discussion! Better Hive integration would be really great! Can you go into details of what you are proposing? I can think of a couple of ways to improve Flink in that regard: * Support for Hive UDFs * Support for Hive metadata