Re: Re: Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2022-01-03 Thread Yun Gao
Hi Fabian, Very thanks for the explanation! Sorry that cascading commits might not be accurate and my initial concern is that if we have cases that the post-committer topology wants to process the committables after they get committed. But since we seems indeed have walkaround for that, thus I

Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2022-01-03 Thread Xuannan Su
Hi Zhipeng and Gen, Thanks for joining the discussion. For Zhipeng: - Can we support side output Caching the side output is indeed a valid use case. However, with the current API, it is not straightforward to cache the side output. You can apply an identity map function to the DataStream

Re: [DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on de-serialized record

2022-01-03 Thread Martijn Visser
Hi Dong, Thanks for writing the FLIP. It focusses only on the KafkaSource, but I would expect that if such a functionality is desired, it should be made available for all unbounded sources (for example, Pulsar and Kinesis). If it's only available for Kafka, I see it as if we're increasing feature

Re: [DISCUSS] Drop Gelly

2022-01-03 Thread Martijn Visser
Hi Zhipeng, Good that you've reached out, I wasn't aware that Gelly is being used in Alink. Are you proposing to write a new graph library as a successor of Gelly and bundle that with Alink? Best regards, Martijn On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang wrote: > Hi everyone, > > Thanks for

[jira] [Created] (FLINK-25510) Add test cases for KafkaPartitionSplitReader

2022-01-03 Thread Zongwen Li (Jira)
Zongwen Li created FLINK-25510: -- Summary: Add test cases for KafkaPartitionSplitReader Key: FLINK-25510 URL: https://issues.apache.org/jira/browse/FLINK-25510 Project: Flink Issue Type:

[DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on de-serialized record

2022-01-03 Thread Dong Lin
Hi all, We created FLIP-208: Update KafkaSource to detect EOF based on de-serialized records. Please find the KIP wiki in the link https://cwiki.apache.org/confluence/display/FLINK/FLIP-208%3A+Update+KafkaSource+to+detect+EOF+based+on+de-serialized+records . This FLIP aims to address the

[jira] [Created] (FLINK-25509) FLIP-208: Update KafkaSource to detect EOF based on de-serialized records

2022-01-03 Thread Dong Lin (Jira)
Dong Lin created FLINK-25509: Summary: FLIP-208: Update KafkaSource to detect EOF based on de-serialized records Key: FLINK-25509 URL: https://issues.apache.org/jira/browse/FLINK-25509 Project: Flink

Re: [DISCUSS] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP)

2022-01-03 Thread Becket Qin
Hi David, Thanks for sharing your thoughts. Some quick reply to your comments: We're still talking about the "web server based" > pattern_processor_discoverer, but what about other use cases? One of my big > concerns is that user's can not really reuse any part of the Flink > ecosystem to

[jira] [Created] (FLINK-25508) Flink Batch mode, cluster shutdown early.

2022-01-03 Thread todd (Jira)
todd created FLINK-25508: Summary: Flink Batch mode, cluster shutdown early. Key: FLINK-25508 URL: https://issues.apache.org/jira/browse/FLINK-25508 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-25507) ElasticsearchWriterITCase#testWriteOnBulkFlush test failed

2022-01-03 Thread ranqiqiang (Jira)
ranqiqiang created FLINK-25507: -- Summary: ElasticsearchWriterITCase#testWriteOnBulkFlush test failed Key: FLINK-25507 URL: https://issues.apache.org/jira/browse/FLINK-25507 Project: Flink

Re: [DISCUSS] Drop Gelly

2022-01-03 Thread Zhipeng Zhang
Hi everyone, Thanks for starting the discussion :) We (Alink team [1]) are actually using part of the Gelly library to support graph algorithms (connected component, single source shortest path, etc.) for users in Alibaba Inc. As DataSet API is going to be dropped, shall we also provide a new

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2022-01-03 Thread Xintong Song
+1 Thanks for driving this, David. Thank you~ Xintong Song On Tue, Jan 4, 2022 at 4:28 AM Thomas Weise wrote: > +1 for bumping minimum supported Hadoop version to 2.8.5 > > On Mon, Jan 3, 2022 at 12:25 AM David Morávek wrote: > > > > As there were no strong objections, we'll proceed with

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2022-01-03 Thread Thomas Weise
+1 for bumping minimum supported Hadoop version to 2.8.5 On Mon, Jan 3, 2022 at 12:25 AM David Morávek wrote: > > As there were no strong objections, we'll proceed with bumping the Hadoop > version to 2.8.5 and removing the safeguards and the CI for any earlier > versions. This will effectively

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-03 Thread Thomas Weise
Interesting discussion. It caught my attention because I was also interested in the Beam fn execution overhead a few years ago. We found back then that while in theory the fn protocol overhead is very significant, for realistic function workloads that overhead was negligible. And of course it all

Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #3

2022-01-03 Thread Dong Lin
Thank you Till for the vote. FYI, if you were not able to use `pip3 install` to verify the python source release, that might be because you are using Python 3.9 (or later versions) with pip3. The issue could be fixed by using e.g. Python 3.8. The supported python versions are documented in

[jira] [Created] (FLINK-25506) Generate HBase Connector Options doc from HBaseConnectorOptions

2022-01-03 Thread Jing Ge (Jira)
Jing Ge created FLINK-25506: --- Summary: Generate HBase Connector Options doc from HBaseConnectorOptions Key: FLINK-25506 URL: https://issues.apache.org/jira/browse/FLINK-25506 Project: Flink Issue

Re: [DISCUSS] Drop Gelly

2022-01-03 Thread David Anderson
Most of the inquiries I've had about Gelly in recent memory have been from folks looking for a streaming solution, and it's only been a handful. +1 for dropping Gelly David On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann wrote: > I haven't seen any changes or requests to/for Gelly in ages.

[DISCUSS] Looking for maintainers for Google PubSub connector or discuss next step

2022-01-03 Thread Martijn Visser
Hi everyone, We're looking for community members, who would like to maintain Flink's Google PubSub connector [1] going forward. There are multiple improvement tickets open and the original contributors are currently unable to work on further improvements. An overview of some of the open tickets:

Re: [VOTE] Apache Flink ML Release 2.0.0, release candidate #3

2022-01-03 Thread Till Rohrmann
+1 (binding) - Checked the checksums and signatures - Built java part from source release - Ran all Java tests - Checked the blog post PR What I did not manage to do is to build the Python part locally. I assume that this was due to my local Python setup. Maybe somebody else can double check

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-03 Thread Till Rohrmann
One more question that came to my mind: How much performance improvement do we gain on a real-world Python use case? Were the measurements more like micro benchmarks where the Python UDF was called w/o the overhead of Flink? I would just be curious how much the Python component contributes to the

Re: [DISCUSS] Drop Gelly

2022-01-03 Thread Till Rohrmann
I haven't seen any changes or requests to/for Gelly in ages. Hence, I would assume that it is not really used and can be removed. +1 for dropping Gelly. Cheers, Till On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser wrote: > Hi everyone, > > Flink is bundled with Gelly, a Graph API library [1].

[DISCUSS] Drop Gelly

2022-01-03 Thread Martijn Visser
Hi everyone, Flink is bundled with Gelly, a Graph API library [1]. This has been marked as approaching end-of-life for quite some time [2]. Gelly is built on top of Flink's DataSet API, which is deprecated and slowly being phased out [3]. It only works on batch jobs. Based on the activity in the

[jira] [Created] (FLINK-25505) Fix NetworkBufferPoolTest, SystemResourcesCounterTest on Apple M1

2022-01-03 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-25505: -- Summary: Fix NetworkBufferPoolTest, SystemResourcesCounterTest on Apple M1 Key: FLINK-25505 URL: https://issues.apache.org/jira/browse/FLINK-25505 Project:

Re: [DISCUSS] FLIP-206: Support PyFlink Runtime Execution in Thread Mode

2022-01-03 Thread Till Rohrmann
Hi Xingbo, Thanks for creating this FLIP. I have two general questions about the motivation for this FLIP because I have only very little exposure to our Python users: Is the slower performance currently the biggest pain point for our Python users? What else are our Python users mainly

Re: Re: [DISCUSS] Introduce Hash Lookup Join

2022-01-03 Thread Francesco Guardiani
Hi Jing, Thanks for the FLIP. I'm not very knowledgeable about the topic, but going through both the FLIP and the discussion here, I wonder, does it makes sense for a lookup join to use hash distribution whenever is possible by default? The point you're explaining here: > Many Lookup table

[jira] [Created] (FLINK-25504) Update and synchronise used versions of Kafka Client and Confluent Platform

2022-01-03 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-25504: -- Summary: Update and synchronise used versions of Kafka Client and Confluent Platform Key: FLINK-25504 URL: https://issues.apache.org/jira/browse/FLINK-25504

[jira] [Created] (FLINK-25503) KafkaDynamicSource: can't use group-offsets startup mode directly

2022-01-03 Thread Zongwen Li (Jira)
Zongwen Li created FLINK-25503: -- Summary: KafkaDynamicSource: can't use group-offsets startup mode directly Key: FLINK-25503 URL: https://issues.apache.org/jira/browse/FLINK-25503 Project: Flink

Re: Re: [DISCUSS] FLIP-191: Extend unified Sink interface to support small file compaction

2022-01-03 Thread Fabian Paul
Hi Yun, Thanks for summarizing the two issues. 1. Losing intermediate shuffle data in batch mode I fully agree with your analysis. We will start to mitigate the problem by introducing the blocking exchanges and documenting that it will not prevent duplication in case a complete taskmanager is

Re: [DISCUSS] Changing the minimal supported version of Hadoop

2022-01-03 Thread David Morávek
As there were no strong objections, we'll proceed with bumping the Hadoop version to 2.8.5 and removing the safeguards and the CI for any earlier versions. This will effectively make the Hadoop 2.8.5 the least supported version in Flink 1.15. Best, D. On Thu, Dec 23, 2021 at 11:03 AM Till

Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2022-01-03 Thread David Morávek
One more question from my side, should we make sure this plays well with the remote shuffle service [1] in case of TM failure? [1] https://github.com/flink-extended/flink-remote-shuffle D. On Thu, Dec 30, 2021 at 11:59 AM Gen Luo wrote: > Hi Xuannan, > > I found FLIP-188[1] that is aiming to

Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-03 Thread David Morávek
Hi Piotr, does this mean that we need to keep the checkpoints compatible across minor versions? Or can we say, that the minor version upgrades are only guaranteed with canonical savepoints? My concern is especially if we'd want to change layout of the checkpoint. D. On Wed, Dec 29, 2021 at