Something wrong with travis?

2019-06-17 Thread Kurt Young
Hi dev, I noticed that all the travis tests triggered by pull request are failed with the same error: "Cached flink dir /home/travis/flink_cache/x/flink does not exist. Exiting build." Anyone have a clue on what happened and how to fix this? Best, Kurt

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread Kurt Young
Yeah, sorry for not expressing myself clearly. I will try to provide more details to make sure we are on the same page. For DataStream API, it shouldn't be optimized automatically. You have to explicitly call API to do local aggregation as well as the trigger policy of the local aggregation. Take

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread Dawid Wysakowicz
Hi all, I think we are getting closer to a consensus. I think most of us already agree that the current behavior is broken. The remaining difference I see is that I think those problems are caused by the design of the split/select method. The current contract of the split method is that it is actu

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread vino yang
Hi Kurt, Thanks for your reply. Actually, I am not against you to raise your design. >From your description before, I just can imagine your high-level implementation is about SQL and the optimization is inner of the API. Is it automatically? how to give the configuration option about trigger pre

[jira] [Created] (FLINK-12879) Improve the performance of AbstractBinaryWriter

2019-06-17 Thread Liya Fan (JIRA)
Liya Fan created FLINK-12879: Summary: Improve the performance of AbstractBinaryWriter Key: FLINK-12879 URL: https://issues.apache.org/jira/browse/FLINK-12879 Project: Flink Issue Type: Improveme

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread Kurt Young
Hi Vino, Now I feel that we may have different understandings about what kind of problems or improvements you want to resolve. Currently, most of the feedback are focusing on *how to do a proper local aggregation to improve performance and maybe solving the data skew issue*. And my gut feeling is

[jira] [Created] (FLINK-12878) Add travis profile for flink-table-planner-blink/flink-table-runtime-blink

2019-06-17 Thread godfrey he (JIRA)
godfrey he created FLINK-12878: -- Summary: Add travis profile for flink-table-planner-blink/flink-table-runtime-blink Key: FLINK-12878 URL: https://issues.apache.org/jira/browse/FLINK-12878 Project: Flink

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread vino yang
Hi Kurt, Thanks for your comments. It seems we both implemented local aggregation feature to optimize the issue of data skew. However, IMHO, the API level of optimizing revenue is different. *Your optimization benefits from Flink SQL and it's not user's faces.(If I understand it incorrectly, ple

[jira] [Created] (FLINK-12877) Unify catalog database implementations and remove CatalogDatabase interfaces

2019-06-17 Thread Bowen Li (JIRA)
Bowen Li created FLINK-12877: Summary: Unify catalog database implementations and remove CatalogDatabase interfaces Key: FLINK-12877 URL: https://issues.apache.org/jira/browse/FLINK-12877 Project: Flink

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread vino yang
Hi Jark, We have done a comparative test. The effect is obvious. >From our observation, the optimized effect mainly depends on two factors: - the degree of the skew: this factor depends on users business ; - the size of the window: localKeyBy support all the type of window which provid

[jira] [Created] (FLINK-12876) Adapt region failover NG for legacy scheduler

2019-06-17 Thread Zhu Zhu (JIRA)
Zhu Zhu created FLINK-12876: --- Summary: Adapt region failover NG for legacy scheduler Key: FLINK-12876 URL: https://issues.apache.org/jira/browse/FLINK-12876 Project: Flink Issue Type: Sub-task

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread Dian Fu
Hi all, Thanks a lot for the discussion. I'm also in favor of rewriting/redesigning the split/select API instead of removing them. It has been a consensus that the side output API can achieve all the functionalities of the split/select API. The problem is whether we should also support some eas

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread Kurt Young
Hi Vino, Thanks for the proposal, I like the general idea and IMO it's very useful feature. But after reading through the document, I feel that we may over design the required operator for proper local aggregation. The main reason is we want to have a clear definition and behavior about the "local

[jira] [Created] (FLINK-12875) support char, varchar, timestamp, date, decimal in input arg conversion for Hive functions

2019-06-17 Thread Bowen Li (JIRA)
Bowen Li created FLINK-12875: Summary: support char, varchar, timestamp, date, decimal in input arg conversion for Hive functions Key: FLINK-12875 URL: https://issues.apache.org/jira/browse/FLINK-12875 Pr

Re: [ANNOUNCEMENT] March 2019 Bay Area Apache Flink Meetup

2019-06-17 Thread Xuefu Zhang
Hi all, The scheduled meetup is only about a week away. Please note that RSVP at meetup.com is required. In order for us to get the actual headcount to prepare for the event, please sign up as soon as possible if you plan to join. Thank you very much for your cooperation. Regards, Xuefu On Thu,

RE: About Deprecating split/select for DataStream API

2019-06-17 Thread xingcanc
Hi all, Thanks for sharing your thoughts on this topic. First, we must admit that the current implementation for split/select is flawed. I roughly went through the source codes, the problem may be that for consecutive select/split(s), the former one will be overridden by the later one during S

[jira] [Created] (FLINK-12874) Improve the semantics of zero length character strings

2019-06-17 Thread Timo Walther (JIRA)
Timo Walther created FLINK-12874: Summary: Improve the semantics of zero length character strings Key: FLINK-12874 URL: https://issues.apache.org/jira/browse/FLINK-12874 Project: Flink Issue

Re: [VOTE] FLIP-41: Unified binary format for keyed state

2019-06-17 Thread Aljoscha Krettek
+1 With the restriction that it should be “canonical format”/“unified format” (or something like it) and not save point format, i.e. not KeyedBackendSavepointStrategyBase in the doc, for example Aljoscha > On 17. Jun 2019, at 14:05, Congxian Qiu wrote: > > +1 from my side. > Best, > Congxian

Re: Sort streams in windows

2019-06-17 Thread Евгений Юшин
Hi Jan Thanks for a quick reply. Doing stateful transformation requires re-writing the same logic which is already defined in Flink by itself. Let's consider example from my original message: There can be out-of-order data -> data should be propagated to next operator only when watermark crosses

[jira] [Created] (FLINK-12873) Create a separate maven module for Shuffle API

2019-06-17 Thread Andrey Zagrebin (JIRA)
Andrey Zagrebin created FLINK-12873: --- Summary: Create a separate maven module for Shuffle API Key: FLINK-12873 URL: https://issues.apache.org/jira/browse/FLINK-12873 Project: Flink Issue Ty

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread SHI Xiaogang
Hi Dawid, Thanks a lot for your example. I think most users will expect splitted1 to be empty in the example. The unexpected results produced, in my opinion, is due to our problematic implementation, instead of the confusing semantics. We can fix the problem if we add a SELECT operator to filter

Re: [VOTE] FLIP-41: Unified binary format for keyed state

2019-06-17 Thread Congxian Qiu
+1 from my side. Best, Congxian Tzu-Li (Gordon) Tai 于2019年6月17日周一 上午10:20写道: > Hi Flink devs, > > I want to officially start a voting thread to formally adopt FLIP-41 [1]. > > There are two relevant discussions threads for this feature [2] [3]. > > The voting time will end on June 19th 17:00 CE

[jira] [Created] (FLINK-12872) WindowOperator may fail with UnsupportedOperationException when merging windows

2019-06-17 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-12872: -- Summary: WindowOperator may fail with UnsupportedOperationException when merging windows Key: FLINK-12872 URL: https://issues.apache.org/jira/browse/FLINK-12872 P

[jira] [Created] (FLINK-12871) Wrong SSL setup examples in docs

2019-06-17 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-12871: --- Summary: Wrong SSL setup examples in docs Key: FLINK-12871 URL: https://issues.apache.org/jira/browse/FLINK-12871 Project: Flink Issue Type: Bug Comp

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread Jark Wu
Hi Vino, Thanks for the proposal. Regarding to the "input.keyBy(0).sum(1)" vs "input.localKeyBy(0).countWindow(5).sum(1).keyBy(0).sum(1)", have you done some benchmark? Because I'm curious about how much performance improvement can we get by using count window as the local operator. Best, Jark

Re: [DISCUSS] Flink client api enhancement for downstream project

2019-06-17 Thread SHI Xiaogang
Hi Jeff and Flavio, Thanks Jeff a lot for proposing the design document. We are also working on refactoring ClusterClient to allow flexible and efficient job management in our real-time platform. We would like to draft a document to share our ideas with you. I think it's a good idea to have some

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread Dawid Wysakowicz
Yes you are correct. The problem I described applies to the split not select as I wrote in the first email. Sorry for that. I will try to prepare a correct example. Let's have a look at this example:     val splitted1 = ds.split(if (1) then "a")     val splitted2 = ds.split(if (!=1) then "a") I

Re: [DISCUSS] Flink client api enhancement for downstream project

2019-06-17 Thread Flavio Pompermaier
Is there any possibility to have something like Apache Livy [1] also for Flink in the future? [1] https://livy.apache.org/ On Tue, Jun 11, 2019 at 5:23 PM Jeff Zhang wrote: > >>> Any API we expose should not have dependencies on the runtime > (flink-runtime) package or other implementation det

RE: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

2019-06-17 Thread Visser, M.J.H. (Martijn)
On a related subject, it would be interesting to have the capability to encrypt savepoints. That would allow processing and storing of sensitive data in Flink. -Original Message- From: Tzu-Li (Gordon) Tai Sent: maandag 17 juni 2019 04:15 To: dev Subject: Re: [DISCUSS] FLIP-41: Unify K

Re: Unit tests consuming a lot of disk

2019-06-17 Thread Piotr Nowojski
Good to know that you have solved your problem :) Piotrek > On 15 Jun 2019, at 00:48, Timothy Farkas wrote: > > Resolved the issue. I was running a very old version of macos, after > upgrading to Mojave the issue disappeared. Disk usage stopped spiking and I > stopped running out of disk space.

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-17 Thread Gen Luo
Hi all, In the review of PR for FLINK-12473, there were a few comments regarding pipeline exportation. We would like to start a follow up discussions to address some related comments. Currently, FLIP-39 proposal gives a way for users to persist a pipeline in JSON format. But it does not specify h

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread vino yang
Hi Hequn, Thanks for your reply. The purpose of localKeyBy API is to provide a tool which can let users do pre-aggregation in the local. The behavior of the pre-aggregation is similar to keyBy API. So the three cases are different, I will describe them one by one: 1. input.keyBy(0).sum(1) *In

Re: [ANNOUNCE] Weekly Community Update 2019/24

2019-06-17 Thread Konstantin Knauf
Hi Zili, thank you for adding these threads :) I would have otherwise picked them up next week, just couldn't put everything into one email. Cheers, Konstantin On Sun, Jun 16, 2019 at 11:07 PM Zili Chen wrote: > Hi Konstantin and all, > > Thank Konstantin very much for reviving this tradition

[jira] [Created] (FLINK-12870) Improve documentation of keys schema evolution

2019-06-17 Thread Alexander Fedulov (JIRA)
Alexander Fedulov created FLINK-12870: - Summary: Improve documentation of keys schema evolution Key: FLINK-12870 URL: https://issues.apache.org/jira/browse/FLINK-12870 Project: Flink Issu

Re: Sort streams in windows

2019-06-17 Thread Jan Lukavský
Hi Eugene, I'd say that what you want essentially is not "sort in windows", because (as you mention), you want to emit elements from windows as soon as watermark passes some timestamp. Maybe a better approach would be to implement this using stateful processing, where you keep a buffer of (un

Re: [DISCUSS] FLIP-44: Support Local Aggregation in Flink

2019-06-17 Thread Hequn Cheng
Hi Vino, Thanks for the proposal, I think it is a very good feature! One thing I want to make sure is the semantics for the `localKeyBy`. From the document, the `localKeyBy` API returns an instance of `KeyedStream` which can also perform sum(), so in this case, what's the semantics for `localKeyB

[jira] [Created] (FLINK-12869) Add yarn acls capability to flink containers

2019-06-17 Thread Nicolas Fraison (JIRA)
Nicolas Fraison created FLINK-12869: --- Summary: Add yarn acls capability to flink containers Key: FLINK-12869 URL: https://issues.apache.org/jira/browse/FLINK-12869 Project: Flink Issue Type

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread SHI Xiaogang
Hi Dawid, As the select method is only allowed on SplitStreams, it's impossible to construct the example ds.split().select("a", "b").select("c", "d"). Are you meaning ds.split().select("a", "b").split().select("c", "d")? If so, then the tagging in the first split operation should not affect the s

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread Dawid Wysakowicz
Hi all, Thank you for starting the discussion. To start with I have to say I am not entirely against leaving them. On the other hand I totally disagree that the semantics are clearly defined. Actually the design is fundamentally flawed. 1. We use String as a selector for elements. This is not th

[jira] [Created] (FLINK-12868) Yarn cluster can not be deployed if plugins dir does not exist

2019-06-17 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-12868: -- Summary: Yarn cluster can not be deployed if plugins dir does not exist Key: FLINK-12868 URL: https://issues.apache.org/jira/browse/FLINK-12868 Project: Flink