Re: [DISCUSS] Introduce SupportsParallelismReport and SupportsStatisticsReport for Hive and Filesystem

2020-07-30 Thread Jingsong Li
Hi, thanks for your responses. To Benchao: Glad to see your work and requirements; they should be Public. To Kurt: 1. Regarding "SupportsXXX" for ScanTableSource or LookupTableSource or DynamicTableSink, I don't think a "SupportsXXX" must work with all three of these types. As Godfrey said, Such

[jira] [Created] (FLINK-18778) Support the SupportsProjectionPushDown interface for LookupTableSource

2020-07-30 Thread Jark Wu (Jira)
Jark Wu created FLINK-18778: --- Summary: Support the SupportsProjectionPushDown interface for LookupTableSource Key: FLINK-18778 URL: https://issues.apache.org/jira/browse/FLINK-18778 Project: Flink

[jira] [Created] (FLINK-18779) Support the SupportsFilterPushDown interface for ScanTableSource.

2020-07-30 Thread Jark Wu (Jira)
Jark Wu created FLINK-18779: --- Summary: Support the SupportsFilterPushDown interface for ScanTableSource. Key: FLINK-18779 URL: https://issues.apache.org/jira/browse/FLINK-18779 Project: Flink

[jira] [Created] (FLINK-18777) Supports schema registry catalog

2020-07-30 Thread Danny Chen (Jira)
Danny Chen created FLINK-18777: -- Summary: Supports schema registry catalog Key: FLINK-18777 URL: https://issues.apache.org/jira/browse/FLINK-18777 Project: Flink Issue Type: Bug

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-30 Thread Xingbo Huang
Hi Jincheng, Thanks a lot for bringing up this discussion and the proposal. Big +1 for improving the structure of the PyFlink docs. It will be very helpful to give PyFlink users a unified entry point for learning from the PyFlink documents. Best, Xingbo On Fri, Jul 31, 2020 at 11:00 AM Dian Fu wrote: > Hi Jincheng, > >

Re: Kinesis Performance Issue (was [VOTE] Release 1.11.0, release candidate #4)

2020-07-30 Thread Thomas Weise
I ran git bisect and the first commit that shows the regression is: https://github.com/apache/flink/commit/355184d69a8519d29937725c8d85e8465d7e3a90 On Thu, Jul 23, 2020 at 6:46 PM Kurt Young wrote: > From my experience, java profilers are sometimes not accurate enough to > find out the

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-30 Thread Dian Fu
Hi Jincheng, Thanks a lot for bringing up this discussion and the proposal. +1 to improve the Python API doc. I have received a lot of feedback from PyFlink beginners about the PyFlink doc, e.g. the materials are too few, the Python doc is mixed with the Java doc and it's not easy to find the

[DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-30 Thread jincheng sun
Hi folks, Since the release of Flink 1.11, the number of PyFlink users has continued to grow. As far as I know, many companies have used PyFlink for data analysis, and operations and maintenance monitoring workloads have been put into production (such as 聚美优品[1] (Jumei), 浙江墨芷[2] (Mozhi), etc.). According

[jira] [Created] (FLINK-18776) "compile_cron_scala212" failed to compile

2020-07-30 Thread Dian Fu (Jira)
Dian Fu created FLINK-18776: --- Summary: "compile_cron_scala212" failed to compile Key: FLINK-18776 URL: https://issues.apache.org/jira/browse/FLINK-18776 Project: Flink Issue Type: Bug Affects

[jira] [Created] (FLINK-18775) Rework PyFlink Documentation

2020-07-30 Thread sunjincheng (Jira)
sunjincheng created FLINK-18775: --- Summary: Rework PyFlink Documentation Key: FLINK-18775 URL: https://issues.apache.org/jira/browse/FLINK-18775 Project: Flink Issue Type: Improvement

Re: [DISCUSS] FLIP-132: Temporal Table DDL

2020-07-30 Thread Seth Wiesman
Hi Leonard, Thank you for pushing this, I think the updated syntax looks really good and the semantics make sense to me. +1 Seth On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu wrote: > Hi, Konstantin > > > > > 1) A "Versioned Temporal Table DDL on source" can only be joined on the > > PRIMARY

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Aljoscha Krettek
I see, we actually have some thoughts along that line as well. We have ideas about adding such functionality for `Transformation`, which is the graph structure that underlies both the DataStream API and the newer Table API Runner/Planner. There is a very rough PoC for that available at [1]. It's

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Flavio Pompermaier
We use runCustomOperation to group a set of operators into a single functional unit, just to make the code more modular. It's very convenient indeed. On Thu, Jul 30, 2020 at 5:20 PM Aljoscha Krettek wrote: > That is good input! I was not aware that anyone was actually using >

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Aljoscha Krettek
That is good input! I was not aware that anyone was actually using `runCustomOperation()`. Out of curiosity, what are you using that for? We have definitely thought about the first two points you mentioned, though. Especially processing-time will make it tricky to define unified execution

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Flavio Pompermaier
I just wanted to be constructive about the missing APIs. :D On Thu, Jul 30, 2020 at 4:29 PM Seth Wiesman wrote: > +1 It's time to drop DataSet > > Flavio, those issues are expected. This FLIP isn't just to drop DataSet > but to also add the necessary enhancements to DataStream such that it works >

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Seth Wiesman
+1 It's time to drop DataSet. Flavio, those issues are expected. This FLIP isn't just to drop DataSet but to also add the necessary enhancements to DataStream such that it works well on bounded input. On Thu, Jul 30, 2020 at 8:49 AM Flavio Pompermaier wrote: > Just to contribute to the

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Flavio Pompermaier
Just to contribute to the discussion, when we tried to do the migration we faced some problems that could make migration quite difficult. 1 - It's difficult to test because of https://issues.apache.org/jira/browse/FLINK-18647 2 - missing mapPartition 3 - missing DataSet

[jira] [Created] (FLINK-18774) Support debezium-avro format

2020-07-30 Thread Jark Wu (Jira)
Jark Wu created FLINK-18774: --- Summary: Support debezium-avro format Key: FLINK-18774 URL: https://issues.apache.org/jira/browse/FLINK-18774 Project: Flink Issue Type: New Feature

[jira] [Created] (FLINK-18773) Enable parallel classloading

2020-07-30 Thread Arvid Heise (Jira)
Arvid Heise created FLINK-18773: --- Summary: Enable parallel classloading Key: FLINK-18773 URL: https://issues.apache.org/jira/browse/FLINK-18773 Project: Flink Issue Type: Improvement

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Till Rohrmann
+1 for this effort. Great to see that we are making progress towards our goal of a truly unified batch and stream processing engine. Cheers, Till On Thu, Jul 30, 2020 at 2:28 PM Kurt Young wrote: > +1, looking forward to the follow up FLIPs. > > Best, > Kurt > > > On Thu, Jul 30, 2020 at 6:40

Re: [DISCUSS] Introduce SupportsParallelismReport and SupportsStatisticsReport for Hive and Filesystem

2020-07-30 Thread godfrey he
Thanks Jingsong for bringing up this discussion, and thanks Kurt for the detailed thoughts. First of all, I also think it's a very useful feature to expose more abilities for table sources. 1) If we want to support [1], it seems that SupportsParallelismReport does not meet the requirement: If

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Kurt Young
+1, looking forward to the follow up FLIPs. Best, Kurt On Thu, Jul 30, 2020 at 6:40 PM Arvid Heise wrote: > +1 of getting rid of the DataSet API. Is DataStream#iterate already > superseding DataSet iterations or would that also need to be accounted for? > > In general, all surviving APIs

[jira] [Created] (FLINK-18772) Hide submit job web ui elements when running in per-job/application mode

2020-07-30 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-18772: - Summary: Hide submit job web ui elements when running in per-job/application mode Key: FLINK-18772 URL: https://issues.apache.org/jira/browse/FLINK-18772 Project:

[jira] [Created] (FLINK-18771) "Kerberized YARN per-job on Docker test" failed with "Client cannot authenticate via:[TOKEN, KERBEROS]"

2020-07-30 Thread Dian Fu (Jira)
Dian Fu created FLINK-18771: --- Summary: "Kerberized YARN per-job on Docker test" failed with "Client cannot authenticate via:[TOKEN, KERBEROS]" Key: FLINK-18771 URL: https://issues.apache.org/jira/browse/FLINK-18771

[jira] [Created] (FLINK-18770) Emitting element fails in KryoSerializer

2020-07-30 Thread Leonid Ilyevsky (Jira)
Leonid Ilyevsky created FLINK-18770: --- Summary: Emitting element fails in KryoSerializer Key: FLINK-18770 URL: https://issues.apache.org/jira/browse/FLINK-18770 Project: Flink Issue Type:

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Arvid Heise
+1 of getting rid of the DataSet API. Is DataStream#iterate already superseding DataSet iterations or would that also need to be accounted for? In general, all surviving APIs should also offer a smooth experience for switching back and forth. On Thu, Jul 30, 2020 at 9:39 AM Márton Balassi

[jira] [Created] (FLINK-18769) Streaming Table job stuck when enabling minibatching

2020-07-30 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-18769: --- Summary: Streaming Table job stuck when enabling minibatching Key: FLINK-18769 URL: https://issues.apache.org/jira/browse/FLINK-18769 Project: Flink Issue

[jira] [Created] (FLINK-18768) Improve SQL kafka connector docs about passing kafka properties

2020-07-30 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-18768: -- Summary: Improve SQL kafka connector docs about passing kafka properties Key: FLINK-18768 URL: https://issues.apache.org/jira/browse/FLINK-18768 Project: Flink

[jira] [Created] (FLINK-18767) Streaming job stuck when disabling operator chaining

2020-07-30 Thread Nico Kruber (Jira)
Nico Kruber created FLINK-18767: --- Summary: Streaming job stuck when disabling operator chaining Key: FLINK-18767 URL: https://issues.apache.org/jira/browse/FLINK-18767 Project: Flink Issue

Re: Checkpointing under backpressure

2020-07-30 Thread Arvid Heise
Dear all, I just wanted to follow up on this long discussion thread by announcing that we implemented unaligned checkpoints in Flink 1.11. If you experience long end-to-end checkpointing durations, you should try out unaligned checkpoints [1] if the following applies: - Checkpointing is not

Re: [DISCUSS] Introduce SupportsParallelismReport and SupportsStatisticsReport for Hive and Filesystem

2020-07-30 Thread Kurt Young
Hi Jingsong, Thanks for bringing up this discussion. In general, I'm +1 to enrich the source ability by the parallelism and stats reporting, but I'm not sure whether introducing such "Supports" interface is a good idea. I will share my thoughts separately. 1) Regarding the interface

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Márton Balassi
Hi All, Thanks for the write up and starting the discussion. I am in favor of unifying the APIs the way described in the FLIP and deprecating the DataSet API. I am looking forward to the detailed discussion of the changes necessary. Best, Marton On Wed, Jul 29, 2020 at 12:46 PM Aljoscha Krettek

[jira] [Created] (FLINK-18766) Support add_sink() for Python DataStream API

2020-07-30 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18766: --- Summary: Support add_sink() for Python DataStream API Key: FLINK-18766 URL: https://issues.apache.org/jira/browse/FLINK-18766 Project: Flink Issue Type:

[jira] [Created] (FLINK-18765) Support map() and flat_map() for Python DataStream API

2020-07-30 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18765: --- Summary: Support map() and flat_map() for Python DataStream API Key: FLINK-18765 URL: https://issues.apache.org/jira/browse/FLINK-18765 Project: Flink Issue

[jira] [Created] (FLINK-18764) Support from_collection for Python DataStream API

2020-07-30 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18764: --- Summary: Support from_collection for Python DataStream API Key: FLINK-18764 URL: https://issues.apache.org/jira/browse/FLINK-18764 Project: Flink Issue Type:

[jira] [Created] (FLINK-18763) Support basic TypeInformation for Python DataStream API

2020-07-30 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18763: --- Summary: Support basic TypeInformation for Python DataStream API Key: FLINK-18763 URL: https://issues.apache.org/jira/browse/FLINK-18763 Project: Flink Issue

[jira] [Created] (FLINK-18762) Make network buffers per incoming/outgoing channel can be configured separately

2020-07-30 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-18762: --- Summary: Make network buffers per incoming/outgoing channel can be configured separately Key: FLINK-18762 URL: https://issues.apache.org/jira/browse/FLINK-18762

[jira] [Created] (FLINK-18761) Support Python DataStream API (Stateless part)

2020-07-30 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-18761: --- Summary: Support Python DataStream API (Stateless part) Key: FLINK-18761 URL: https://issues.apache.org/jira/browse/FLINK-18761 Project: Flink Issue Type: New

Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

2020-07-30 Thread Xuannan Su
Hi folks, It seems that all the raised concerns so far have been resolved. I plan to start a voting thread for FLIP-36 early next week if there are no comments. Thanks, Xuannan On Jul 28, 2020, 7:42 PM +0800, Xuannan Su wrote: > Hi Kurt, > > Thanks for the comments. > > You are right that the