Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Jörn Franke
Would it maybe make sense to provide Flink as an engine on Hive ("flink-on-Hive")? E.g., to address 4, 5, 6, 8, 9, 10. This could be more loosely coupled than integrating Hive into all possible Flink core modules and thus introducing a very tight dependency on Hive in the core. 1,2,3 could be achieved

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Zhang, Xuefu
Hi Fabian/Vino, Thank you very much for your encouragement and inquiry. Sorry that I didn't see Fabian's email until I read Vino's response just now. (Somehow Fabian's went to the spam folder.) My proposal contains long-term and short-term goals. Nevertheless, the effort will focus on the

[jira] [Created] (FLINK-10527) Cleanup constant isNewMode in YarnTestBase

2018-10-10 Thread vinoyang (JIRA)
vinoyang created FLINK-10527: Summary: Cleanup constant isNewMode in YarnTestBase Key: FLINK-10527 URL: https://issues.apache.org/jira/browse/FLINK-10527 Project: Flink Issue Type: Bug

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread vino yang
Hi Xuefu, I appreciate this proposal, and like Fabian, I think it would be better if you could give more details of the plan. Thanks, vino. Fabian Hueske wrote on Wed, Oct 10, 2018 at 5:27 PM: > Hi Xuefu, > > Welcome to the Flink community and thanks for starting this discussion! > Better Hive integration would be

[jira] [Created] (FLINK-10526) Hadoop FileSystem not initialized properly on Yarn

2018-10-10 Thread Yan Yan (JIRA)
Yan Yan created FLINK-10526: --- Summary: Hadoop FileSystem not initialized properly on Yarn Key: FLINK-10526 URL: https://issues.apache.org/jira/browse/FLINK-10526 Project: Flink Issue Type: Bug

Re: Sharing state between subtasks

2018-10-10 Thread Elias Levy
On Wed, Oct 10, 2018 at 9:33 AM Fabian Hueske wrote: > I think the new source interface would be designed to be able to leverage > shared state to achieve time alignment. > I don't think this would be possible without some kind of shared state. > > The problem of tasks that are far ahead in time

Re: Handling burst I/O when using tumbling/sliding windows

2018-10-10 Thread Rong Rong
Hi Piotrek, Thanks for the feedback and reviews. Yes, as I explained previously in my reply to point (2B), I think it is possible to create our own customized window assigner without any API change if we eliminate the requirement that *"the same key should always result in the same offset"* I
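A minimal sketch of the staggering idea, assuming one random offset per assigner instance (for example, per parallel subtask), so that the same key is no longer guaranteed a fixed offset. The start-of-window arithmetic follows the offset formula in Flink's `TimeWindow.getWindowStartWithOffset`; the class `StaggeredTumblingWindows` and its methods are hypothetical and are not the API discussed in the thread.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper illustrating staggered tumbling windows; not a Flink API.
final class StaggeredTumblingWindows {
    private final long size;   // window length in milliseconds
    private final long offset; // fixed for the lifetime of this instance

    StaggeredTumblingWindows(long size, long offset) {
        if (size <= 0 || offset < 0 || offset >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= offset < size");
        }
        this.size = size;
        this.offset = offset;
    }

    // Assumption: each assigner instance draws its own random offset once,
    // so window ends are spread out instead of all firing at the same instant.
    static StaggeredTumblingWindows withRandomOffset(long size) {
        return new StaggeredTumblingWindows(size, ThreadLocalRandom.current().nextLong(size));
    }

    // Same arithmetic as TimeWindow.getWindowStartWithOffset, valid for timestamps >= 0.
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }

    long startOf(long timestamp) {
        return windowStart(timestamp, offset, size);
    }

    // Exclusive end of the window; this is when the window would fire.
    long endOf(long timestamp) {
        return startOf(timestamp) + size;
    }
}
```

With a 10 s window, an instance with offset 0 places timestamp 12345 ms into [10000, 20000), while an instance with offset 3000 places it into [3000, 13000), so the two instances fire at different times instead of producing one burst.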

Re: Sharing state between subtasks

2018-10-10 Thread Thomas Weise
Thanks for the feedback and comments so far. I want to elaborate more on the need for the shared state and awareness of watermark alignment in the source implementation. Sources like Kafka and Kinesis pull from the external system and then emit the records. For Kinesis, we have multiple consumer
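To make the alignment idea concrete, here is a minimal single-JVM sketch in which each source subtask reports its current watermark to a shared tracker and stops pulling records while it is more than a configured bound ahead of the slowest subtask. The class `WatermarkAlignment` is hypothetical; in a real deployment the shared map would have to live behind some cross-process mechanism (the thread mentions Akka or JGroups as candidates), which this sketch does not model.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared watermark tracker; a stand-in for whatever
// cross-TaskManager state-sharing mechanism is eventually chosen.
final class WatermarkAlignment {
    private final Map<Integer, Long> watermarks = new ConcurrentHashMap<>();
    private final long maxAheadMillis;

    WatermarkAlignment(long maxAheadMillis) {
        this.maxAheadMillis = maxAheadMillis;
    }

    // Called by a source subtask whenever its local watermark advances.
    // Assumption: watermarks are non-negative epoch milliseconds, so the
    // initial Long.MIN_VALUE watermark is ignored. Only the maximum is kept.
    void report(int subtaskIndex, long watermark) {
        if (watermark < 0) {
            return;
        }
        watermarks.merge(subtaskIndex, watermark, Math::max);
    }

    // Minimum watermark over all subtasks that have reported so far.
    long globalMin() {
        return watermarks.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    // True while this subtask is too far ahead of the slowest reporting subtask
    // and should stop reading from its shards/partitions.
    boolean shouldPause(int subtaskIndex) {
        Long own = watermarks.get(subtaskIndex);
        if (own == null) {
            return false; // nothing reported yet, so nothing to compare
        }
        return own - globalMin() > maxAheadMillis;
    }
}
```

With a 1 s bound, a subtask at watermark 5000 pauses while another is at 2000, and resumes once the slower one reports 4500. Subtasks that have not reported yet are simply not considered, which a real design would need to handle explicitly.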

Re: Sharing state between subtasks

2018-10-10 Thread Fabian Hueske
I think the new source interface would be designed to be able to leverage shared state to achieve time alignment. I don't think this would be possible without some kind of shared state. The problem of tasks that are far ahead in time cannot be solved with back-pressure. That's because a task

Re: Sharing state between subtasks

2018-10-10 Thread Elias Levy
On Wed, Oct 10, 2018 at 8:15 AM Aljoscha Krettek wrote: > I think the two things (shared state and new source interface) are > somewhat orthogonal. The new source interface itself alone doesn't solve > the problem, we would still need some mechanism for sharing the event-time > information

Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-10-10 Thread Fabian Hueske
Hi all, I opened a PR [1] to add the PR review guide to the Flink website. Cheers, Fabian [1] https://github.com/apache/flink-web/pull/126 On Wed, Oct 10, 2018 at 17:27, Aljoscha Krettek < aljos...@apache.org> wrote: > +1 > > > On 9. Oct 2018, at 17:11, Hequn Cheng wrote: > > > > +1 >

Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-10-10 Thread Aljoscha Krettek
+1 > On 9. Oct 2018, at 17:11, Hequn Cheng wrote: > > +1 > > On Tue, Oct 9, 2018 at 3:25 PM Till Rohrmann wrote: > >> +1 >> >> On Tue, Oct 9, 2018 at 9:08 AM Zhijiang(wangzhijiang999) >> wrote: >> >>> +1 >>> -- >>> From: vino

Re: [VOTE] Release flink-shaded 5.0, release candidate #1

2018-10-10 Thread Aljoscha Krettek
+1 I did - verify all changes between 4.0 and 5.0 - check signature and hash of the source release - build a work-in-progress branch for Scala 2.12 support using the new shaded asm6 package > On 10. Oct 2018, at 15:11, Chesnay Schepler wrote: > > Hi everyone, > Please review and vote on

Re: Sharing state between subtasks

2018-10-10 Thread Aljoscha Krettek
Sorry for also derailing this a bit earlier... I think the two things (shared state and new source interface) are somewhat orthogonal. The new source interface itself alone doesn't solve the problem, we would still need some mechanism for sharing the event-time information between different

[jira] [Created] (FLINK-10525) Deserialization schema, skip data, that couldn't be properly deserialized

2018-10-10 Thread Rinat Sharipov (JIRA)
Rinat Sharipov created FLINK-10525: -- Summary: Deserialization schema, skip data, that couldn't be properly deserialized Key: FLINK-10525 URL: https://issues.apache.org/jira/browse/FLINK-10525

Re: Sharing state between subtasks

2018-10-10 Thread Jamie Grier
Also, I'm afraid I derailed this thread just a bit. So, back to Thomas's original question: if we decide state-sharing across source subtasks is the way forward for now, does anybody have thoughts to share on what form this should take? Thomas mentioned Akka or JGroups. Other thoughts?

Re: Sharing state between subtasks

2018-10-10 Thread Jamie Grier
Okay, so I think there is a lot of agreement here about (a) This is a real issue for people, and (b) an ideal long-term approach to solving it. As Aljoscha and Elias said a full solution to this would be to also redesign the source interface such that individual partitions are exposed in the API

[jira] [Created] (FLINK-10524) HeartbeatManagerTest failed on travis

2018-10-10 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10524: Summary: HeartbeatManagerTest failed on travis Key: FLINK-10524 URL: https://issues.apache.org/jira/browse/FLINK-10524 Project: Flink Issue Type:

[VOTE] Release flink-shaded 5.0, release candidate #1

2018-10-10 Thread Chesnay Schepler
Hi everyone, Please review and vote on the release candidate #1 for the version 5.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) This release * adds jackson-dataformat-csv to the shaded-jackson module (used for CSV table

[jira] [Created] (FLINK-10523) Add jackson-dataformat-csv to flink-shaded

2018-10-10 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-10523: Summary: Add jackson-dataformat-csv to flink-shaded Key: FLINK-10523 URL: https://issues.apache.org/jira/browse/FLINK-10523 Project: Flink Issue

Re: [DISCUSS] Improvements to the Unified SQL Connector API

2018-10-10 Thread Timo Walther
Hi everyone, thanks for the feedback that we got so far. I will update the document in the next couple of hours such that we can continue with the discussion. Regarding the table type: Actually I just didn't mention it in the document, because the table type is a SQL Client/External catalog

[jira] [Created] (FLINK-10522) Check if RecoverableWriter supportsResume and accordingly.

2018-10-10 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-10522: -- Summary: Check if RecoverableWriter supportsResume and accordingly. Key: FLINK-10522 URL: https://issues.apache.org/jira/browse/FLINK-10522 Project: Flink

Re: [DISCUSS] Flink Cluster Overview Dashboard Improvement Proposal

2018-10-10 Thread Fabian Wollert
Hi everyone, thanks for all the comments and feedback. Let me address everything individually: @Till: yes, to start with, my plan would be to touch only the flink-runtime-web/web-dashboard repo/folder. @Jin Sun: - smaller icons on increasing server counts: yes, that's also something I already

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Fabian Hueske
Hi Xuefu, Welcome to the Flink community and thanks for starting this discussion! Better Hive integration would be really great! Can you go into details of what you are proposing? I can think of a couple ways to improve Flink in that regard: * Support for Hive UDFs * Support for Hive metadata

[jira] [Created] (FLINK-10521) TaskManager metrics are not reported to prometheus after running a job

2018-10-10 Thread Florian Schmidt (JIRA)
Florian Schmidt created FLINK-10521: --- Summary: TaskManager metrics are not reported to prometheus after running a job Key: FLINK-10521 URL: https://issues.apache.org/jira/browse/FLINK-10521

Re: [DISCUSS] Flink Cluster Overview Dashboard Improvement Proposal

2018-10-10 Thread Robert Metzger
Hey Fabian, thanks a lot for reaching out to the Flink community with this proposal! (Posting to the ML instead of creating a JIRA is a good idea for such questions -- you can create a ticket/tickets once the discussion here has come to a conclusion) I have two comments: - You are listing

Re: [DISCUSS] Flink Cluster Overview Dashboard Improvement Proposal

2018-10-10 Thread Zhijiang(wangzhijiang999)
Thanks Fabian for proposing this topic. It is well worth improving the web dashboard to show more useful information, which can benefit Flink users a lot. Just two small personal concerns: 1. The start time and end time are already given, so it is easy to estimate the rough duration.