Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino, Thanks a lot for your reply. > 1) When, Why and How to judge the memory is exhausted? My point here is that the local aggregate operator can buffer the inputs in memory and send out the results AT ANY TIME. i.e. element count or the time interval reached a pre-configured value, the
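The flush rule described in this message (emit buffered partial results once an element count or a time interval reaches a pre-configured value) can be sketched in plain Java. This is only an illustration under stated assumptions, not the design from the thread and not a Flink API: the class name LocalSumBuffer, its methods, and the choice of a per-key sum as the aggregate are all hypothetical, and the current time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical local pre-aggregator (not a Flink class): sums values per key in memory. */
final class LocalSumBuffer {
    private final int maxCount;              // flush once this many elements are buffered
    private final long maxIntervalMillis;    // or once this much time has passed since the last flush
    private final Map<String, Long> partialSums = new HashMap<>();
    private int bufferedCount = 0;
    private long lastFlushMillis;

    LocalSumBuffer(int maxCount, long maxIntervalMillis, long nowMillis) {
        this.maxCount = maxCount;
        this.maxIntervalMillis = maxIntervalMillis;
        this.lastFlushMillis = nowMillis;
    }

    /** Adds one element; returns the flushed partial sums if a threshold was hit, else an empty map. */
    Map<String, Long> add(String key, long value, long nowMillis) {
        partialSums.merge(key, value, Long::sum);
        bufferedCount++;
        boolean countReached = bufferedCount >= maxCount;
        boolean intervalReached = nowMillis - lastFlushMillis >= maxIntervalMillis;
        return (countReached || intervalReached) ? flush(nowMillis) : Collections.emptyMap();
    }

    /** Emits everything buffered so far and resets the buffer; may be called at any time. */
    Map<String, Long> flush(long nowMillis) {
        Map<String, Long> out = new HashMap<>(partialSums);
        partialSums.clear();
        bufferedCount = 0;
        lastFlushMillis = nowMillis;
        return out;
    }

    public static void main(String[] args) {
        LocalSumBuffer buffer = new LocalSumBuffer(3, 1000L, 0L);
        buffer.add("a", 1L, 10L);
        buffer.add("a", 2L, 20L);
        // The third element reaches the count threshold, so partial sums are emitted.
        System.out.println(buffer.add("b", 5L, 30L));
    }
}
```

In a real job the emitted partial sums would still need a final aggregation per key downstream; the sketch covers only the local buffering and flush decision.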

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Biao Liu
Hi Vino, +1 for this feature. It's useful for data skew, and it could also reduce the amount of shuffled data. I have some concerns about the API part. From my side, this feature should be more like an improvement. I'm afraid the proposal is overkill on the API side. Many other systems support

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2019-06-04 Thread Becket Qin
Hi Piotr, Thanks for the explanation. Please see some clarifications below. By time-based metric, I meant the portion of time spent on producing the record to downstream. For example, a source connector can report that it's spending 80% of time to emit record to downstream processing pipeline.

[jira] [Created] (FLINK-12734) remove getVolcanoPlanner method from FlinkOptimizeContext and RelNodeBlock does not depend on TableEnvironment

2019-06-04 Thread godfrey he (JIRA)
godfrey he created FLINK-12734: Summary: remove getVolcanoPlanner method from FlinkOptimizeContext and RelNodeBlock does not depend on TableEnvironment Key: FLINK-12734 URL:

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Tzu-Li (Gordon) Tai
On Wed, Jun 5, 2019 at 6:39 AM Xiaowei Jiang wrote: > Hi Gordon & Seth, this looks like a very useful feature for analyzing and > managing state. > I agree that using DataSet is probably the most practical choice right > now. But in the longer term, adding Table API support for this would be nice. >

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread vino yang
Hi Litree, From an implementation level, the localKeyBy API returns a general KeyedStream, you can call all the APIs which KeyedStream provides, we did not restrict its usage, although we can do this (for example returns a new stream object named LocalKeyedStream). However, to achieve the goal

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread vino yang
Hi Dian, A different opinion is fine with me. If there is a better solution, or there are obvious deficiencies in our design, we are very happy to accept and improve it. I agree with you that a customized local aggregate operator is more scalable in terms of the trigger mechanism. However, I

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-04 Thread Shaoxuan Wang
Stavros, They share a similar logical concept, but the implementation details are quite different. It is hard to migrate the interface with different implementations. The built-in algorithms are a useful legacy that we will consider migrating to the new API (but still with different implementations).

[jira] [Created] (FLINK-12733) Expose Rest API for TERMINATE/SUSPEND Job with Checkpoint

2019-06-04 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-12733: Summary: Expose Rest API for TERMINATE/SUSPEND Job with Checkpoint Key: FLINK-12733 URL: https://issues.apache.org/jira/browse/FLINK-12733 Project:

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Xiaowei Jiang
Hi Gordon & Seth, this looks like a very useful feature for analyzing and managing state. I agree that using DataSet is probably the most practical choice right now. But in the longer term, adding Table API support for this would be nice. When analyzing the savepoint, I assume that the state backend

[jira] [Created] (FLINK-12732) Add savepoint reader for consuming partitioned operator state

2019-06-04 Thread Seth Wiesman (JIRA)
Seth Wiesman created FLINK-12732: Summary: Add savepoint reader for consuming partitioned operator state Key: FLINK-12732 URL: https://issues.apache.org/jira/browse/FLINK-12732 Project: Flink

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread litree
Hi Vino, I have read your design. Something I want to know is the usage of these new APIs. It looks like when I use localKeyBy, I must then use a window operator to return a DataStream, and then use keyBy and another window operator to get the final result? Thanks, Litree On 06/04/2019 17:22,

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino, It may seem similar to window operator but there are also a few key differences. For example, the local aggregate operator can send out the results at any time and the window operator can only send out the results at the end of window (without early fire). This means that the local

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Aljoscha Krettek
+1 I think this is a very valuable new addition and we should try not to get stuck on trying to design the perfect solution for everything > On 4. Jun 2019, at 13:24, Tzu-Li (Gordon) Tai wrote: > > +1 to renaming it as State Processing API and adding it under the > flink-libraries module. >

[jira] [Created] (FLINK-12731) Load shuffle service implementations from plugin manager

2019-06-04 Thread Andrey Zagrebin (JIRA)
Andrey Zagrebin created FLINK-12731: Summary: Load shuffle service implementations from plugin manager Key: FLINK-12731 URL: https://issues.apache.org/jira/browse/FLINK-12731 Project: Flink

[jira] [Created] (FLINK-12730) Combine BitSet implementations in flink-runtime

2019-06-04 Thread Liya Fan (JIRA)
Liya Fan created FLINK-12730: Summary: Combine BitSet implementations in flink-runtime Key: FLINK-12730 URL: https://issues.apache.org/jira/browse/FLINK-12730 Project: Flink Issue Type:

[jira] [Created] (FLINK-12729) Add savepoint reader for consuming non-partitioned operator state

2019-06-04 Thread Seth Wiesman (JIRA)
Seth Wiesman created FLINK-12729: Summary: Add savepoint reader for consuming non-partitioned operator state Key: FLINK-12729 URL: https://issues.apache.org/jira/browse/FLINK-12729 Project: Flink

Re: What does flink session mean ?

2019-06-04 Thread Till Rohrmann
Yes, interactive programming solves the problem by storing the meta information on the client, whereas in the past we considered keeping the information on the JM. But this would then not allow sharing results between different clusters. Thus, the interactive programming approach is a bit

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Tzu-Li (Gordon) Tai
+1 to renaming it as State Processing API and adding it under the flink-libraries module. I also think we can start with the development of the feature. From the feedback so far, it seems like we're in a good spot to add in at least the initial version of this API, hopefully making it ready for

Re: [Discuss] FLIP-43: Savepoint Connector

2019-06-04 Thread Seth Wiesman
It seems like a recurring piece of feedback was a different name. I’d like to propose moving the functionality to the libraries module and naming this the State Processing API. Seth > On May 31, 2019, at 3:47 PM, Seth Wiesman wrote: > > The SavepointOutputFormat only writes out the

Re: Flink internals

2019-06-04 Thread Piotr Nowojski
Hi, You can also read the FLIP proposals. Unluckily, one that is very internal [1] about credit based flow control [2] was not published as an official FLIP :( Regarding network stack and some of the other topics, there are some blogs/Flink Forward talks as well. Piotrek [1]

[jira] [Created] (FLINK-12728) taskmanager container can't launch on nodemanager machine because of kerberos

2019-06-04 Thread wgcn (JIRA)
wgcn created FLINK-12728: Summary: taskmanager container can't launch on nodemanager machine because of kerberos Key: FLINK-12728 URL: https://issues.apache.org/jira/browse/FLINK-12728 Project: Flink

Re: Flink internals

2019-06-04 Thread John Tipper
Hi Till, Fan & Rong, Thanks for your feedback - I'd seen the Flink internal page but sadly, as Rong pointed out, it's pretty limited and not maintained. I'll ask on the mailing lists, but I think it would be really helpful if there were a guide for Flink developers who want to contribute to

[jira] [Created] (FLINK-12727) Make HiveTableOutputFormat support writing partitioned tables

2019-06-04 Thread Rui Li (JIRA)
Rui Li created FLINK-12727: Summary: Make HiveTableOutputFormat support writing partitioned tables Key: FLINK-12727 URL: https://issues.apache.org/jira/browse/FLINK-12727 Project: Flink Issue

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread vino yang
Hi Dian, Thanks for your reply. I know what you mean. However, if you think deeply, you will find your implementation needs to provide an operator which looks like a window operator. You need to use state, receive an aggregation function, and specify the trigger time. It looks like a lightweight

Re: What does flink session mean ?

2019-06-04 Thread Jeff Zhang
Thanks for the reply, @Till Rohrmann. Regarding reusing computed results, I think the JM keeps all the metadata of intermediate data, and interactive programming is also trying to reuse computed results. It looks like it may not be necessary to introduce the session concept as long as we can achieve

[jira] [Created] (FLINK-12726) Fix ANY type serialization

2019-06-04 Thread Timo Walther (JIRA)
Timo Walther created FLINK-12726: Summary: Fix ANY type serialization Key: FLINK-12726 URL: https://issues.apache.org/jira/browse/FLINK-12726 Project: Flink Issue Type: Sub-task

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Piotr Nowojski
Hi Vino, > So if users want to use local aggregation, they should call the window API > to build a local window that means users should (or say "can") specify the > window length and other information based on their needs. It sounds ok for me. It would have to be run against some API guys from

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino, Thanks a lot for starting this discussion. +1 to this feature as I think it will be very useful. Regarding using a window to buffer the input elements, personally I don't think it's a good solution for the following reasons: 1) As we know that WindowOperator will store the

Re: Flink internals

2019-06-04 Thread Till Rohrmann
Hi John, unfortunately, there are no really good and up to date documents about Flink's internals. There was some discussion about updating the internals [1] but the community did decide against submitting it as a season of docs project. I agree that we should update our documentation about

Re: What does flink session mean ?

2019-06-04 Thread Till Rohrmann
Hi Jeff, the session functionality which you find in Flink's client is the remnant of an uncompleted feature which was abandoned. The idea was that one could submit multiple parts of a job to the same cluster where these parts are added to the same ExecutionGraph. That way we wanted to allow to

Re: [DISCUSS] Features for Apache Flink 1.9.0

2019-06-04 Thread Till Rohrmann
Thanks for starting this discussion Gordon and Kurt. For the development threads I'm involved with here are the updates: * Pluggable scheduler: Good part of the work is completed. Gary now works on the glue code to use the new high level scheduler components. The estimate to finish this work is

[jira] [Created] (FLINK-12725) Need to copy flink-hadoop-compatibility jar explicitly to ${FLINK-HOME}/lib location

2019-06-04 Thread arganzheng (JIRA)
arganzheng created FLINK-12725: Summary: Need to copy flink-hadoop-compatibility jar explicitly to ${FLINK-HOME}/lib location Key: FLINK-12725 URL: https://issues.apache.org/jira/browse/FLINK-12725

[jira] [Created] (FLINK-12724) Add Links to new Concepts Section to Glossary

2019-06-04 Thread Konstantin Knauf (JIRA)
Konstantin Knauf created FLINK-12724: Summary: Add Links to new Concepts Section to Glossary Key: FLINK-12724 URL: https://issues.apache.org/jira/browse/FLINK-12724 Project: Flink Issue